About

Investigation of the 538.imagick benchmark from SPEC CPU 2017 shows that it uses the ceil/floor routines heavily. While the Zfa extension can be used to optimize these calls into simple FP conversion instructions, it is believed that an alternate implementation based on just the F/D conversions is possible and would significantly improve performance on designs that do not implement Zfa. It appears to give roughly a 9-10% improvement in dynamic instruction count and a 17% cycle improvement for 538.imagick.

LLVM already has this optimization in place.
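As a rough illustration of the idea, the C sketch below computes floor and ceil using only a double-to-integer conversion, an integer-to-double conversion, and a sign copy. The helper names floor_via_conversion and ceil_via_conversion are hypothetical and are not taken from the GCC or LLVM sources. Because a C cast can only truncate, the sketch corrects the result with a compare and add; on RISC-V the conversion instruction can carry its own rounding mode, so an actual expansion would not need that step.

```c
#include <math.h>
#include <stdint.h>

/* 2^52: at or above this magnitude every double is already an integer. */
#define TWO_POW_52 4503599627370496.0

/* Hypothetical helper: floor() built from conversions only. */
double floor_via_conversion(double x)
{
    /* Large values, infinities and NaN are returned unchanged
       (the comparison is false for NaN). */
    if (!(fabs(x) < TWO_POW_52))
        return x;

    int64_t i = (int64_t)x;   /* FP -> integer, truncates toward zero */
    double t = (double)i;     /* integer -> FP */
    if (t > x)                /* truncation rounded a negative value up */
        t -= 1.0;
    return copysign(t, x);    /* keep the sign of the input, so -0.0 stays -0.0 */
}

/* Hypothetical helper: ceil() built the same way. */
double ceil_via_conversion(double x)
{
    if (!(fabs(x) < TWO_POW_52))
        return x;

    int64_t i = (int64_t)x;
    double t = (double)i;
    if (t < x)                /* truncation rounded a positive value down */
        t += 1.0;
    return copysign(t, x);    /* e.g. ceil(-0.5) must be -0.0 */
}
```

The magnitude check is what makes the integer round trip safe: any value that passes it fits in a 64-bit integer, and any value that fails it has no fractional part to remove.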

Stakeholders/Partners

RISE:

Ventana: Jivan Hakobyan (under contract via RAU) – lead developer

Ventana: Jeff Law – general oversight

Rivos: ADLR – provided initial hint & data showing the extent of this problem

External:

Dependencies

Status

Development	COMPLETE
Development Timeline	1H2024
Upstreaming	COMPLETE
Upstream Version	gcc-15 (target) Spring 2025
Contacts	Jeff Law (Ventana)
Dependencies	None

Updates

09 May 2024

Jivan's patch has been upstreamed.

20 Mar 2024

Patch posted upstream. There seems to be general consensus to go forward, pending final review once gcc-15 is open for development.

17 Mar 2024

Added note about actual performance improvement seen (17%).
Jivan has been asked to post his patch to the gcc-patches list for review.

13 Mar 2024

An implementation borrowing heavily from LLVM is under evaluation. It provides more efficient versions of ceil, round, nearbyint, rint, etc.
In addition to using existing conversions to implement those functions, the implementation also includes sign extension removal for cases where the ultimate result is a GPR.
Expecting this to save 300-400 billion instructions for the imagick benchmark. That is enough that we expect to see a measurable (perhaps double digit) improvement in the benchmark's overall performance.

29 Jan 2024

Project reported as a priority for 1H2024

CT_00_027 -- Improve ceil/round code generation in GCC