CT_00_027 -- Improve ceil/round code generation in GCC
About
Investigation of the 538.imagick benchmark from SPEC CPU 2017 shows that it uses the ceil/floor routines heavily. While the Zfa extension can be used to optimize these calls into simple FP conversion instructions, it is believed that an alternative implementation based on just the F/D conversions is possible and would significantly improve performance on designs that do not implement Zfa. For 538.imagick this appears to give roughly a 9-10% improvement in dynamic instruction count, but a 17% improvement in cycles.
LLVM already has this optimization in place.
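To illustrate the idea, here is a minimal portable C sketch of ceil built only from float/integer conversions. It is not the GCC patch: the function name ceil_via_convert is hypothetical, and the sketch assumes IEEE 754 doubles and a 64-bit long long. It uses a truncating conversion plus a fix-up, since C cannot select a rounding mode per conversion the way a RISC-V fcvt instruction can.

```c
#include <math.h>

/* Hypothetical helper, for illustration only: ceil() implemented with
 * double <-> integer conversions instead of a library call. */
double ceil_via_convert(double x)
{
    /* 2^52: every double with magnitude at or above this is already
     * integral. The negated comparison also catches NaN, and infinities
     * fail the "<" test, so all of these are returned unchanged. */
    const double two52 = 4503599627370496.0;
    if (!(fabs(x) < two52))
        return x;

    long long i = (long long)x;  /* double -> integer, truncating toward zero */
    double r = (double)i;        /* integer -> double, exact in this range */

    /* Truncation rounds positive non-integers down, so step up by one. */
    if (r < x)
        r += 1.0;

    /* Restore the sign of the input so that, for example, -0.5 yields -0.0. */
    return copysign(r, x);
}
```

The range check comes first because converting an out-of-range double to an integer is undefined behaviour in C. On RISC-V, one would expect the truncate-and-adjust step to collapse into a single conversion with a statically encoded rounding mode, leaving a compare, two conversions and a sign-injection.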
Stakeholders/Partners
RISE:
Ventana: Jivan Hakobyan (under contract via RAU) – lead developer
Ventana: Jeff Law – general oversight
Rivos: ADLR – provided initial hint & data showing the extent of this problem
External:
Dependencies
Status
Updates
- Jivan's patch has been upstreamed.
- Patch posted upstream. There seems to be general consensus to go forward, pending final review when gcc-15 is open for development
- Added note about actual performance improvement seen (17%).
- Jivan has been asked to post his patch to gcc-patches list for review
- An implementation borrowing heavily from LLVM is under evaluation. It provides more efficient versions of ceil, round, nearbyint, rint, etc.
- In addition to using existing conversions to implement those functions, the implementation also includes sign extension removal for cases where the ultimate result is a GPR
- Expecting this to save 300-400 billion instructions for the imagick benchmark, enough that we expect to see a measurable (perhaps double-digit) improvement in the benchmark's overall performance
- Project reported as a priority for 1H2024