About

The cam4 benchmark in spec2017 makes heavy use of complex double-precision division, which is implemented in the libgcc library. Complex division can be very expensive because of the long-latency, non-pipelined division operations and the various special cases needed to handle boundary conditions.

By using "-fcx-limited-range" when compiling the benchmarks, the compiler can open-code the complex division and ignore many of the corner cases, significantly improving performance. This is considered safe for the spec2017 suite, but it still needs to be tested and verified.
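The trade-off can be illustrated with a minimal Python sketch of the arithmetic involved. This is not the libgcc code or the compiler's output: `limited_range_div` and `scaled_div` are hypothetical names, and `scaled_div` is only a Smith-style stand-in for a more careful library routine (the real routine also handles further NaN/Inf cases). The first function is the textbook formula that an open-coded division would roughly correspond to.

```python
def limited_range_div(a, b, c, d):
    """(a + bi) / (c + di) by the textbook formula.

    No scaling and no special-case handling: intermediate products
    can overflow or underflow even when the true quotient is representable.
    """
    denom = c * c + d * d
    return ((a * c + b * d) / denom, (b * c - a * d) / denom)


def scaled_div(a, b, c, d):
    """Smith-style scaled division (illustrative stand-in for a careful routine).

    Dividing by the larger component first keeps intermediates in range,
    at the cost of an extra division and a data-dependent branch.
    """
    if abs(c) >= abs(d):
        r = d / c
        denom = c + d * r
        return ((a + b * r) / denom, (b - a * r) / denom)
    r = c / d
    denom = c * r + d
    return ((a * r + b) / denom, (b * r - a) / denom)


# Ordinary operands: both approaches agree.
print(limited_range_div(1.0, 2.0, 3.0, 4.0))
print(scaled_div(1.0, 2.0, 3.0, 4.0))

# Very large operands: c*c overflows in the textbook formula,
# while the scaled version still returns the exact quotient 1 + 0i.
big = 1e200
print(limited_range_div(big, big, big, big))
print(scaled_div(big, big, big, big))
```

The textbook version needs two divisions and no branches, which is why open-coding is attractive; the large-operand case shows the kind of corner case that is given up in exchange.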

My recollection is that this only affected the speed runs of cam4, not the rate runs, but this should be verified.

Note that RISC-V does not have a reciprocal estimator, so we can't turn the divisions into reciprocal multiplications, but even so this should significantly improve performance.

The benchmark is reasonably friendly for vectorization; performance gains relative to a single FPU scalar implementation should be on the order of 17%. Verify the benchmark vectorizes and sees a comparable performance improvement on RISC-V.


Stakeholders/Partners

RISE:

Ventana: Jeff Law


External:



Dependencies


Status

Development

Status: NOT STARTED

Development Timeline: NA

Upstreaming

Status: NOT STARTED

Upstream Version:


Contacts

Jeff Law (Ventana)


Dependencies

None




Updates

 

  • Project reported as a priority for 1H2024

...