The roms benchmark in spec2017 is reasonably friendly to vectorization; performance gains relative to a single-FPU scalar implementation should be on the order of 35%. Verify that the benchmark vectorizes and sees a comparable performance improvement on RISC-V.

Data from the k1 shows a 56% reduction in instruction count but a 5% regression in cycle count. While disappointing, this is roughly in line with the performance of other benchmarks with vector enabled. Again, this is believed to be a weakness in the k1 vector architecture, not a failing in GCC.
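To make the k1 numbers concrete, the short sketch below works out what they imply for instructions per cycle (IPC). It assumes "56% reduction" means the vector build retires 44% of the scalar instruction count and "5% regression" means 5% more cycles; the function name `relative_ipc` is purely illustrative and not part of any tool.

```python
def relative_ipc(instr_reduction, cycle_change):
    """Return vector IPC as a fraction of scalar IPC.

    instr_reduction: fractional drop in dynamic instruction count
                     (0.56 means 56% fewer instructions)
    cycle_change:    fractional change in cycle count
                     (+0.05 means 5% more cycles)
    """
    instr_ratio = 1.0 - instr_reduction   # vector instructions / scalar instructions
    cycle_ratio = 1.0 + cycle_change      # vector cycles / scalar cycles
    return instr_ratio / cycle_ratio


# k1 figures quoted above (assumed interpretation: 44% of the instructions, 105% of the cycles)
k1 = relative_ipc(0.56, 0.05)
print(f"vector IPC is {k1:.2f}x the scalar IPC")   # prints "vector IPC is 0.42x the scalar IPC"
```

Under those assumptions the average instruction in the vector build takes roughly 2.4 times as many cycles as in the scalar build, which is consistent with the view that the instruction-count win is being lost in the vector unit rather than in code generation.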

Stakeholders/Partners

RISE:

...

Page Properties

Development

Status: COMPLETE
Development Timeline: 1H2024

Upstreaming

Status: COMPLETE
Upstream Version: gcc-14 (Spring 2024)

Contacts

Robin Dapp (Ventana)

Jeff Law (Ventana)

Dependencies

Need performance for closure

Updates

29 May 2024

Added data from a spec run on the k1 design.

14 Mar 2024

Seeing a 33% reduction in dynamic instruction count
- x86_64 has a 22% performance improvement
- aarch64 has an 8% performance improvement
- RVV data looks good so far, needs to be tested for actual improvement on hardware

...

Version	Old Version 3	New Version Current
Changes made by	Jeff Law	Jeff Law
Saved on	Mar 14, 2024	May 29, 2024
