About
The cam4 benchmark in SPEC CPU 2017 is reasonably friendly to vectorization; performance gains relative to a single-FPU scalar implementation should be on the order of 17%. Verify that the benchmark vectorizes and sees a comparable performance improvement on RISC-V.
Stakeholders/Partners
RISE:
Ventana: Robin Dapp – lead developer
Rivos:
External:
Rivai: Juzhe
Dependencies
Status
Updates
- We are currently seeing an 18% reduction in dynamic instruction count with GCC when using vector operations, which is roughly in line with expectations.
- x86 gets an approximate performance improvement of 12% from vectorization
- aarch64 gets an approximate performance improvement of 6% from vectorization
- The 18% reduction for RISC-V doesn't necessarily mean an 18% performance improvement, but in general we should see instruction count improvements at or above the performance improvements seen on competing architectures
- Conclusion: We're in the ballpark. Next steps are to confirm on real vector hardware, keeping in mind that microarchitectural (uarch) issues may come into play
- Project reported as a priority for 1H2024
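
To illustrate why the 18% instruction count reduction noted in the updates does not map directly onto an 18% performance improvement, here is a minimal sketch using the textbook relation run time = instruction count x CPI / clock frequency. The 18% figure comes from the updates above; the CPI ratios are hypothetical placeholders and not measurements, and "performance improvement" is assumed here to mean reduction in run time at a fixed clock frequency.

```python
def exec_time_ratio(ic_reduction, cpi_ratio=1.0):
    """Vector-to-scalar run time ratio, assuming time = IC * CPI / frequency
    and an unchanged clock frequency.

    ic_reduction: fractional drop in dynamic instruction count (0.18 for 18%).
    cpi_ratio:    average CPI of the vector build divided by that of the
                  scalar build (hypothetical; depends on the uarch).
    """
    return (1.0 - ic_reduction) * cpi_ratio


def time_reduction(ic_reduction, cpi_ratio=1.0):
    """Fractional run time reduction implied by the inputs."""
    return 1.0 - exec_time_ratio(ic_reduction, cpi_ratio)


def break_even_cpi_ratio(ic_reduction):
    """CPI ratio at which the instruction count saving is fully cancelled."""
    return 1.0 / (1.0 - ic_reduction)


if __name__ == "__main__":
    ic_reduction = 0.18  # figure reported in the updates above
    # Hypothetical CPI ratios, for illustration only.
    for cpi_ratio in (1.00, 1.05, 1.10, 1.20):
        saved = time_reduction(ic_reduction, cpi_ratio)
        print(f"CPI ratio {cpi_ratio:.2f}: run time reduced by {saved:.1%}")
    print(f"Break-even CPI ratio: {break_even_cpi_ratio(ic_reduction):.3f}")
```

Under these assumptions, the full 18% only shows up as a run time saving if the average CPI is unchanged. Any increase in average CPI from vector instructions erodes the gain, which is why confirmation on real vector hardware is the next step.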