About

The cam4 benchmark in SPEC CPU 2017 is reasonably friendly to vectorization: performance gains relative to a scalar implementation using a single FPU should be on the order of 17%. The goal is to verify that the benchmark vectorizes on RISC-V and sees a comparable performance improvement.

Stakeholders/Partners

RISE:

Ventana: Robin Dapp – lead developer

Rivos:

External:

Rivai: Juzhe

Dependencies

Status

Development	COMPLETED
Development Timeline	NA
Upstreaming	COMPLETED
Upstream Version
Contacts	Jeff Law (Ventana)
Dependencies	Closure needs performance testing

Updates

14 Mar 2024

With GCC using vector operations, we are currently seeing an 18% reduction in dynamic instruction count, which is roughly in line with expectations.
- x86 gets an approximate performance improvement of 12% from vectorization
- aarch64 gets an approximate performance improvement of 6% from vectorization
- The 18% reduction for RISC-V doesn't necessarily mean an 18% performance improvement, but in general we should see instruction count improvements at least as large as the performance improvements seen on competing architectures
- Conclusion: We're in the ballpark. Next steps are to confirm on real vector hardware, keeping in mind that microarchitectural issues may come into play
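Why an instruction-count reduction need not translate one-for-one into a speedup follows from the standard "iron law" of processor performance, T = IC × CPI / f. The sketch below is illustrative only: it assumes the clock frequency f is the same for the scalar and vector builds and folds all microarchitectural effects into an average CPI. The function names and the CPI ratios used are hypothetical inputs chosen to show the trend, not measurements.

```python
# Illustrative sketch of the "iron law" of performance: T = IC * CPI / f.
# Assumption: the clock frequency f is identical for scalar and vector builds,
# so only instruction count (IC) and average cycles per instruction (CPI) matter.


def time_reduction(ic_reduction: float, cpi_ratio: float) -> float:
    """Fractional drop in execution time for the vectorized build.

    ic_reduction: fractional drop in dynamic instruction count (0.18 for 18%).
    cpi_ratio: CPI_vector / CPI_scalar; values above 1.0 mean the vector code
               spends more cycles per instruction on average (hypothetical input).
    """
    return 1.0 - (1.0 - ic_reduction) * cpi_ratio


def break_even_cpi_ratio(ic_reduction: float) -> float:
    """CPI ratio at which the instruction-count saving is fully cancelled."""
    return 1.0 / (1.0 - ic_reduction)


if __name__ == "__main__":
    # 18% fewer dynamic instructions, as reported in the 14 Mar 2024 update.
    ic_drop = 0.18
    # Hypothetical CPI ratios, chosen only to illustrate the trend.
    for ratio in (1.0, 1.1, 1.2):
        print(f"CPI ratio {ratio:.1f}: time reduction {time_reduction(ic_drop, ratio):.1%}")
    print(f"Break-even CPI ratio: {break_even_cpi_ratio(ic_drop):.3f}")
```

With an unchanged average CPI the full 18% carries over to run time; if the vector code averaged roughly 22% more cycles per instruction (1 / 0.82), the gain would disappear entirely. This is why confirmation on real vector hardware is still needed.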

29 Dec 2023

Project reported as a priority for 1H2024

CT_00_015 -- Vectorize CAM4 benchmark in spec2017