CT_00_020 -- Vectorize roms benchmark from spec2017

About

The roms benchmark in spec2017 is reasonably friendly to vectorization; performance gains relative to a single-FPU scalar implementation should be on the order of 35%. Verify that the benchmark vectorizes and sees a comparable performance improvement on RISC-V.

 

Data from the k1 shows a 56% reduction in instruction count but a 5% regression in cycle count. While disappointing, this is roughly in line with the performance of other benchmarks with vector enabled. Again, it's believed this is a weakness in the k1 vector architecture, not a failing in GCC.
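To make the k1 numbers concrete, the sketch below works out what they imply for cycles per instruction (CPI). It assumes "56% reduction" means the vector build retires 0.44x the scalar instruction count and "5% regression" means it takes 1.05x the scalar cycles. The function name relative_cpi is hypothetical and used only for illustration.

```python
def relative_cpi(instr_reduction, cycle_change):
    """Return (vector CPI) / (scalar CPI).

    instr_reduction: fractional drop in dynamic instruction count
                     (0.56 for a 56% reduction).
    cycle_change:    fractional change in cycle count
                     (+0.05 for a 5% regression, negative for a speedup).
    """
    if not 0.0 <= instr_reduction < 1.0:
        raise ValueError("instr_reduction must be in [0, 1)")
    instr_ratio = 1.0 - instr_reduction   # vector instructions / scalar instructions
    cycle_ratio = 1.0 + cycle_change      # vector cycles / scalar cycles
    return cycle_ratio / instr_ratio


# k1 figures as interpreted above (an assumption about how the
# percentages were measured, not a statement from the k1 data itself).
k1_ratio = relative_cpi(0.56, 0.05)
print(f"vector CPI is about {k1_ratio:.2f}x scalar CPI")
```

Under those assumptions each instruction in the vector build costs roughly 2.4 times as many cycles on average as in the scalar build, which is consistent with the view that the instruction-count savings are being absorbed by the hardware rather than lost in code generation.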

 

 

 

Stakeholders/Partners

RISE:

Ventana: Robin Dapp – lead developer

Ventana: Jeff Law

 

External:

 

 

Dependencies

 

Status

Development

COMPLETE

 

Development Timeline

1H2024

 

Upstreaming

COMPLETE

 

Upstream Version

gcc-14

Spring 2024

 

 

Contacts

Robin Dapp (Ventana)

Jeff Law (Ventana)

 

Dependencies

Need performance for closure

 

 

Updates

May 29, 2024 

  • Added data from a spec run on the k1 design.

Mar 14, 2024

  • Seeing a 33% reduction in dynamic instruction count

    • x86_64 has a 22% performance improvement

    • aarch64 has an 8% performance improvement

    • RVV data looks good so far but needs to be tested for actual improvement on hardware

Dec 29, 2023 

  • Project reported as a priority for 1H2024