CT_00_021 -- Vectorize parest benchmark from spec2017
About
The parest benchmark in spec2017 is reasonably friendly to vectorization; performance gains relative to a scalar implementation with a single FPU should be on the order of 35%. The goal is to verify that the benchmark vectorizes and sees a comparable performance improvement on RISC-V.
When run on the k1 chip (BPI-F3 board), parest shows a 51.72% decrease in dynamic instruction count but a 1.94% regression in cycle count. As discussed on the cam4 work item, we believe this is an artifact of weaknesses in the k1's vector unit.
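To make the k1 numbers concrete, the following is a minimal sketch of the arithmetic behind them, assuming the usual relation cycles = instructions / IPC. The function name `implied_ipc_ratio` is hypothetical and only illustrates the calculation; it is not part of any benchmark tooling.

```python
def implied_ipc_ratio(instr_change_pct, cycle_change_pct):
    """Return (vector IPC / scalar IPC) implied by two percentage changes.

    Negative percentages mean a decrease, positive ones an increase.
    Assumes cycles = instructions / IPC for both builds.
    """
    instr_ratio = 1.0 + instr_change_pct / 100.0   # vector instrs / scalar instrs
    cycle_ratio = 1.0 + cycle_change_pct / 100.0   # vector cycles / scalar cycles
    return instr_ratio / cycle_ratio


# k1 figures quoted above: -51.72% instructions, +1.94% cycles.
ratio = implied_ipc_ratio(-51.72, 1.94)
print(f"vector IPC is about {ratio:.2f}x the scalar IPC")
```

Under that assumption the vector build retires a little under half as many instructions per cycle as the scalar build on the k1, which is why the large drop in instruction count does not turn into a cycle-count win.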
Stakeholders/Partners
RISE:
Ventana: Robin Dapp – lead developer
Ventana: Jeff Law
External:
Dependencies
Status
| Item | Value |
|---|---|
| Development | COMPLETE |
| Development Timeline | 1H2024 |
| Upstreaming | COMPLETE |
| Upstream Version | gcc-14 (Spring 2024) |
| Contacts | Robin Dapp (Ventana), Jeff Law (Ventana) |
| Dependencies | None |
Updates
May 28, 2024
Noted data from a run on the k1 design (BPI-F3 board). The 50+% decrease in dynamic instruction count is in the right ballpark.
Mar 14, 2024
Currently seeing a 24% decrease in dynamic instruction count when vector is enabled.
Both x86_64 and aarch64 are showing a 35% cycle improvement.
Clearly there is still work to do. It is highly unlikely we will hit a 35% performance improvement if we are only seeing a 24% improvement in dynamic instruction count.
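The reasoning in this update can be sketched numerically. This is an illustration only: it reads the "35% cycle improvement" as 35% fewer cycles (an assumption) and again assumes cycles = instructions / IPC. The helper name `required_ipc_gain_pct` is hypothetical.

```python
def required_ipc_gain_pct(instr_reduction_pct, target_cycle_reduction_pct):
    """IPC increase (in percent) needed to reach a target cycle reduction.

    Assumes cycles = instructions / IPC, so
    cycle_ratio = instr_ratio / ipc_ratio.
    """
    instr_ratio = 1.0 - instr_reduction_pct / 100.0
    cycle_ratio = 1.0 - target_cycle_reduction_pct / 100.0
    return (instr_ratio / cycle_ratio - 1.0) * 100.0


# 24% fewer instructions, aiming for 35% fewer cycles.
gain = required_ipc_gain_pct(24.0, 35.0)
print(f"IPC would have to rise by about {gain:.1f}%")
```

With IPC unchanged, a 24% drop in instruction count yields at most a 24% drop in cycles. Reaching 35% would require the vector code to also run at a noticeably higher IPC than the scalar code, which is not a safe expectation for vector instructions.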
Dec 29, 2023
Project reported as a priority for 1H2024