CT_00_016 -- Vectorize WRF benchmark from SPEC CPU 2017
About
The WRF benchmark in SPEC CPU 2017 is reasonably amenable to vectorization; performance gains relative to a scalar implementation using a single FPU should be on the order of 40%. Verify that the benchmark vectorizes on RISC-V and sees a comparable performance improvement.
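One way to check that code vectorizes is to have GCC report the loops it vectorized. The sketch below is a hypothetical stand-in, not code from WRF (which is written in Fortran). It is a simple DAXPY-style loop of the kind GCC's auto-vectorizer typically handles. The build command in the comment assumes a cross toolchain named `riscv64-linux-gnu-gcc`. In that command, `-march=rv64gcv` enables the vector extension and `-fopt-info-vec-optimized` prints a note for each loop that was vectorized.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example loop, not taken from WRF.
 * Possible build command (toolchain name is an assumption):
 *   riscv64-linux-gnu-gcc -O3 -march=rv64gcv -mabi=lp64d \
 *       -fopt-info-vec-optimized -c daxpy.c
 * -fopt-info-vec-optimized reports each loop GCC vectorized. */
void daxpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    /* Pointers may only be NULL when there is nothing to do. */
    assert(n == 0 || (x != NULL && y != NULL));

    /* Independent iterations with no aliasing (restrict): a good
     * candidate for auto-vectorization. */
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Calling `daxpy(4, 2.0, x, y)` with `x = {1, 2, 3, 4}` and `y` set to all ones leaves `y = {3, 5, 7, 9}`.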
Stakeholders/Partners
RISE:
Ventana: Robin Dapp – lead developer
Ventana: Jeff Law
External:
Rivai: Juzhe
Dependencies
Status
| Development | COMPLETE |
|---|---|
| Development Timeline | 1H2024 |
| Upstreaming | COMPLETED |
| Upstream Version | gcc-14 (Spring 2024) |
| Contacts | Robin Dapp (Ventana), Jeff Law (Ventana) |
| Dependencies | Closure needs performance testing |
Updates
May 28, 2024
This benchmark performs an unaligned vector access (aligned to less than the element size), which faults on the K1. Work is underway to make the compiler less aggressive about allowing unaligned vector memory accesses. Once that work lands, we will retest WRF with and without vectorization to compare. On the K1 we expect a significant reduction in dynamic instructions, but performance (as measured in cycles) may well regress due to the design of the K1 vector unit.
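For context, the RISC-V vector specification lets an implementation raise an address-misaligned exception when a vector element is not naturally aligned to its size. Code that assumes such accesses are allowed can therefore fault on cores like the K1. The hypothetical helper below only illustrates what "less than element alignment" means: an address is element-aligned when it is a multiple of the element size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if `p` is aligned to `elem_size`
 * bytes (element-aligned). A vector access whose element addresses fail
 * this check is "less than element aligned". */
static bool is_element_aligned(const void *p, size_t elem_size)
{
    assert(elem_size > 0);
    return ((uintptr_t)p % elem_size) == 0;
}
```

For example, in an 8-byte-aligned buffer `buf`, `buf + 4` is 4-byte aligned but not element-aligned for 8-byte `double` elements. A vector load of doubles starting there is the kind of access the update describes.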
Mar 14, 2024
Currently seeing a 46% reduction in dynamic instructions
Actual improvement from vectorization seen on x86_64 – 37%
Actual improvement from vectorization seen on aarch64 – 37%
Note: on RISC-V we are counting dynamic instructions, while for the competitive architectures we measure actual performance improvement
The target is a dynamic instruction count improvement at or above the real improvement seen on the competitive architectures
Conclusion: Hitting the mark for this phase. Next step is to verify performance on real hardware
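The acceptance criterion in this update reduces to a simple comparison, sketched below. The helper names and the instruction counts in the usage note are hypothetical. Only the 46% and 37% figures come from the update, and the update does not define exactly how the 37% improvement was measured, so treat this as a rough illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: percentage reduction going from a scalar
 * count (e.g. dynamic instructions) to a vectorized count. */
static double pct_reduction(double scalar, double vector)
{
    assert(scalar > 0.0);
    return 100.0 * (scalar - vector) / scalar;
}

/* Hypothetical helper for the criterion above: the RISC-V dynamic
 * instruction reduction must be at or above the real improvement
 * measured on the competitive architectures. */
static bool meets_target(double riscv_insn_reduction_pct,
                         double competitor_improvement_pct)
{
    return riscv_insn_reduction_pct >= competitor_improvement_pct;
}
```

A hypothetical run going from 1000 scalar to 540 vector instructions gives `pct_reduction(1000.0, 540.0) == 46.0`, and `meets_target(46.0, 37.0)` is true, consistent with the conclusion that this phase hits the mark.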
Dec 29, 2023
Project reported as a priority for 1H2024