About

The cam4 benchmark in SPEC2017 is reasonably friendly to vectorization; performance gains relative to a single-FPU scalar implementation should be on the order of 17%. The goal is to verify that the benchmark vectorizes and sees a comparable performance improvement on RISC-V.

Stakeholders/Partners

RISE:

Ventana: Robin Dapp – lead developer

Rivos:

External:

Rivai: Juzhe

Dependencies

Status

Development	COMPLETED
Development Timeline	1H2024
Upstreaming	COMPLETED
Upstream Version	gcc-14 Spring 2024
Contacts	Robin Dapp (Ventana) Jeff Law (Ventana)
Dependencies	Closure needs Performance testing

Updates

14 Mar 2024

We are currently seeing an 18% reduction in dynamic instruction counts with GCC when using vector operations, which is roughly in line with expectations.
- x86 gets an approximate performance improvement of 12% from vectorization
- aarch64 gets an approximate performance improvement of 6% from vectorization
- The 18% reduction for RISC-V doesn't necessarily mean an 18% performance improvement, but in general we should be seeing instruction count improvements at or above the performance improvements seen on the competing architectures
- Conclusion: We're in the ballpark. Next steps are to confirm on real vector hardware, keeping in mind that uarch issues may come into play
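
To make the gap between instruction count and performance concrete, here is a minimal sketch in C of the usual first-order model, runtime proportional to instruction count times cycles per instruction (CPI). The function names and the CPI ratio parameter are hypothetical and used only for illustration; the model ignores clock frequency changes and everything else that real vector hardware may introduce.

```c
#include <assert.h>

/* Speedup implied by a dynamic instruction count reduction, ASSUMING the
 * average CPI is identical for the scalar and vector builds.
 * reduction is a fraction, e.g. 0.18 for an 18% reduction. */
double speedup_at_equal_cpi(double reduction)
{
    assert(reduction >= 0.0 && reduction < 1.0);
    return 1.0 / (1.0 - reduction);
}

/* Same estimate when the vector build's average CPI differs.
 * cpi_ratio = CPI(vector build) / CPI(scalar build); this value is an
 * assumption supplied by the caller, not something measured here. */
double speedup_with_cpi_ratio(double reduction, double cpi_ratio)
{
    assert(reduction >= 0.0 && reduction < 1.0);
    assert(cpi_ratio > 0.0);
    return 1.0 / ((1.0 - reduction) * cpi_ratio);
}
```

Under this model, speedup_at_equal_cpi(0.18) is about 1.22, while speedup_with_cpi_ratio(0.18, 1.1) drops to about 1.11, and a CPI ratio of roughly 1.22 would cancel the gain entirely. This is why the instruction count result still needs to be confirmed on real vector hardware.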

29 Dec 2023

Project reported as a priority for 1H2024

CT_00_015 -- Vectorize CAM4 benchmark in spec2017