CT_00_027 -- Improve ceil/round code generation in GCC
About
Investigation of the 538.imagick benchmark from SPEC CPU 2017 shows that it uses the ceil/floor routines heavily. While the Zfa extension can be used to optimize these calls into simple FP conversion instructions, it is believed that an alternate implementation based on just the F/D conversions is possible and will significantly improve performance on designs that do not implement Zfa. For 538.imagick this appears to be roughly a 9-10% improvement in dynamic instruction count, but a 17% improvement in cycles.
LLVM already has this optimization in place.
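The idea of building ceil from plain conversions can be sketched in portable C. This is a minimal illustration under stated assumptions, not the upstreamed GCC patch: the function name `ceil_via_convert` is hypothetical, and because a C cast always truncates toward zero, the sketch adds an explicit adjustment step. On RISC-V a conversion such as `fcvt.l.d` can carry a static round-up rounding mode, which would make that adjustment unnecessary.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical sketch: ceil() built from FP<->integer conversions.
 * Not the actual GCC implementation. */
static double ceil_via_convert(double x)
{
    /* Doubles with |x| >= 2^52 have no fractional bits, so they are
     * already integral. NaN fails the comparison and is returned as-is,
     * as is +/-infinity. */
    if (!(fabs(x) < 0x1p52))
        return x;

    int64_t i = (int64_t)x;   /* FP -> integer, truncating toward zero */
    double t = (double)i;     /* integer -> FP, exact in this range */

    /* Truncation rounds positive non-integers down, so step up by one. */
    if (t < x)
        t += 1.0;

    /* Restore the sign so that inputs in (-1, 0] produce -0.0. */
    return copysign(t, x);
}
```

The range check matters because converting a large or non-finite double to a 64-bit integer is undefined in C, and the final `copysign` keeps results such as `ceil(-0.5) == -0.0` consistent with the library routine.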
Stakeholders/Partners
RISE:
Ventana: Jivan Hakobyan (under contract via RAU) – lead developer
Ventana: Jeff Law – general oversight
Rivos: ADLR – provided initial hint & data showing the extent of this problem
External:
Dependencies
Status
| Development | COMPLETE |
|---|---|
| Development Timeline | 1H2024 |
| Upstreaming | COMPLETE |
| Upstream Version | gcc-15 (target) Spring 2025 |
| Contacts | Jeff Law (Ventana) |
| Dependencies | None |
Updates
May 9, 2024
Jivan's patch has been upstreamed.
Mar 20, 2024
Patch posted upstream. There seems to be general consensus to go forward, pending final review once gcc-15 is open for development.
Mar 17, 2024
Added note about actual performance improvement seen (17%).
Jivan has been asked to post his patch to the gcc-patches list for review.
Mar 13, 2024
An implementation borrowing heavily from LLVM is under evaluation. It provides more efficient versions of ceil, round, nearbyint, rint, etc.
In addition to using existing conversions to implement those functions, the implementation also includes sign extension removal for cases where the ultimate result is a GPR.
Expecting this to save 300-400 billion instructions for the imagick benchmark, enough that we expect to see a measurable (perhaps double-digit) improvement in the benchmark's overall performance.
Jan 29, 2024
Project reported as a priority for 1H2024