CT_00_020 -- Vectorize roms benchmark from spec2017
About
The roms benchmark in spec2017 is reasonably friendly to vectorization; performance gains relative to a single-FPU scalar implementation should be on the order of 35%. Verify that the benchmark vectorizes and sees a comparable performance improvement on RISC-V.
Data from the k1 shows a 56% reduction in instruction counts, but a 5% regression in cycle counts. While disappointing, this is roughly in line with other benchmarks' performance with vector enabled. Again, this is believed to be a weakness in the k1 vector architecture, not a failing in GCC.
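As a rough illustration of what the k1 figures imply, the sketch below works out the change in instructions per cycle (IPC). It assumes both percentages are measured against the same scalar baseline run, which the data above does not state explicitly; the function name `relative_ipc` is purely illustrative.

```python
# Back-of-the-envelope reading of the k1 numbers quoted above.
# Assumption: the 56% and 5% figures share the same scalar baseline run.

def relative_ipc(instr_reduction: float, cycle_regression: float) -> float:
    """Return vector IPC divided by scalar IPC."""
    instr_ratio = 1.0 - instr_reduction   # vector instructions / scalar instructions
    cycle_ratio = 1.0 + cycle_regression  # vector cycles / scalar cycles
    return instr_ratio / cycle_ratio

ipc_ratio = relative_ipc(0.56, 0.05)  # 56% fewer instructions, 5% more cycles
cpi_ratio = 1.0 / ipc_ratio           # average cycles per instruction, vector / scalar

print(f"IPC ratio (vector / scalar): {ipc_ratio:.2f}")
print(f"CPI ratio (vector / scalar): {cpi_ratio:.2f}")
```

Under that assumption the vector build retires roughly 0.42 times as many instructions per cycle as the scalar build, so the average instruction costs about 2.4 times as many cycles. That is consistent with the reading that the instruction-count savings are being absorbed by the hardware rather than lost in code generation.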
Stakeholders/Partners
RISE:
Ventana: Robin Dapp – lead developer
Ventana: Jeff Law
External:
Dependencies
Status
| Development | COMPLETE |
|---|---|
| Development Timeline | 1H2024 |
| Upstreaming | COMPLETE |
| Upstream Version | gcc-14 Spring 2024 |
| Contacts | Robin Dapp (Ventana), Jeff Law (Ventana) |
| Dependencies | Need performance for closure |
Updates
May 29, 2024
Added data from a spec run on the k1 design.
Mar 14, 2024
Seeing a 33% reduction in dynamic instruction count
x86_64 has a 22% performance improvement
aarch64 has an 8% performance improvement
RVV data looks good so far, but it still needs to be tested for actual improvement on hardware
Dec 29, 2023
Project reported as a priority for 1H2024