About
The cam4 benchmark in SPEC CPU 2017 makes heavy use of double-precision complex division, which is implemented in the libgcc library. Complex division can be very expensive because it relies on long-latency, non-pipelined division operations and on various special cases that handle boundary conditions.
When the benchmarks are compiled with "-fcx-limited-range", the compiler can open-code the complex division and ignore many of the corner cases, significantly improving performance. This is considered safe for the SPEC CPU 2017 suite, but it still needs to be tested and verified.
My recollection is that this affected only the speed runs of cam4, not the rate runs, but this should be verified.
Note that the scalar RISC-V floating-point extensions do not provide a reciprocal estimate instruction, so we can't turn the divisions into reciprocal multiplications. Even so, open-coding the division should significantly improve performance.
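To make the trade-off concrete, here is a minimal C sketch of the textbook complex division formula, which is roughly what a compiler can emit inline once range reduction and NaN/infinity recovery are not required. The function name div_limited_range is hypothetical and used only for illustration. The code is an assumption about the general shape of the open-coded sequence, not a copy of what GCC generates for cam4.

```c
#include <complex.h>

/* Hypothetical helper, for illustration only: textbook complex division
 * (a + bi) / (c + di) with no scaling step and no NaN/Inf recovery.
 * This is the kind of arithmetic that skipping the corner cases permits. */
static double complex div_limited_range(double complex x, double complex y)
{
    double a = creal(x), b = cimag(x);
    double c = creal(y), d = cimag(y);

    /* The denominator is computed directly; it can overflow or underflow
     * when c or d is very large or very small in magnitude. */
    double denom = c * c + d * d;

    /* Two real divisions by the same denominator, no branches. */
    double re = (a * c + b * d) / denom;
    double im = (b * c - a * d) / denom;

    return re + im * I;
}
```

For example, div_limited_range(1.0 + 2.0 * I, 3.0 + 4.0 * I) evaluates to approximately 0.44 + 0.08i, matching the exact quotient. The cost of the shortcut shows up only at the extremes: with components around 1e200, the intermediate products overflow in IEEE double arithmetic even though the true quotient is representable, which is the kind of case the library routine guards against.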
Stakeholders/Partners
RISE:
Ventana: Jeff Law
External:
Dependencies
Status
Updates
- Project reported as a priority for 1H2024