About

The WRF benchmark in SPEC CPU 2017 is reasonably friendly to vectorization; performance gains relative to a single-FPU scalar implementation should be on the order of 40%. The goal is to verify that the benchmark vectorizes and sees a comparable performance improvement on RISC-V.

Stakeholders/Partners

RISE:

Ventana: Jeff Law

External:

Dependencies

Status

Development	COMPLETE
Development Timeline	1H2024
Upstreaming	COMPLETE
Upstream Version	gcc-14 Spring 2024
Contacts	Robin Dapp (Ventana), Jeff Law (Ventana)
Dependencies	Closure needs performance testing

Updates

14 Mar 2024

Currently seeing a 46% reduction in dynamic instructions on RISC-V
- Actual improvement from vectorization seen on x86_64 – 37%
- Actual improvement from vectorization seen on aarch64 – 37%
- Note that the two measurements differ: we're counting dynamic instructions on RISC-V, but measuring actual improvement on the competitive architectures
- The dynamic instruction count improvement needs to be at or better than the real improvement seen on the competitive architectures
- Conclusion: Hitting the mark for this phase. Next step is to verify performance on real hardware
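
The comparison in this update can be written out as a small calculation. The following is a minimal sketch only: the instruction counts are hypothetical values chosen to reproduce the 46% figure (the real counts are not given here), and the function names are illustrative rather than part of any project tooling.

```python
def percent_reduction(baseline: int, optimized: int) -> float:
    """Percentage reduction going from baseline to optimized."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - optimized) / baseline


def meets_target(riscv_reduction_pct: float, reference_pct: dict) -> bool:
    """True if the RISC-V dynamic instruction reduction is at or above
    the real improvement measured on every reference architecture."""
    return all(riscv_reduction_pct >= ref for ref in reference_pct.values())


# Hypothetical dynamic instruction counts (assumed, not measured data).
scalar_insns = 1_000_000
vector_insns = 540_000

reduction = percent_reduction(scalar_insns, vector_insns)

# Real improvements from vectorization reported in this update.
reference = {"x86_64": 37.0, "aarch64": 37.0}

print(f"RISC-V dynamic instruction reduction: {reduction:.0f}%")
print(f"Meets target for this phase: {meets_target(reduction, reference)}")
```

Because a drop in dynamic instruction count is only a proxy for run time, passing this check does not replace the performance verification on real hardware.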

29 Dec 2023

Project reported as a priority for 1H2024

CT_00_016 -- Vectorize wrf benchmark from spec2017