The parest benchmark in SPEC CPU 2017 is reasonably friendly to vectorization; performance gains relative to a single-FPU scalar implementation should be on the order of 35%. Verify that the benchmark vectorizes and sees a comparable performance improvement on RISC-V.

When run on the k1 chip (BPI-F3 board), parest shows a 51.72% decrease in dynamic instruction count but a 1.94% regression in cycle count. As discussed on the cam4 work item, we believe this is an artifact of weaknesses in the k1's vector unit.
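To make the gap between the two figures concrete, the short sketch below derives the implied change in instructions per cycle (IPC) from the instruction and cycle deltas quoted above. The helper name `relative_ipc` is hypothetical and introduced only for illustration; it assumes both percentages are measured against the same scalar baseline run.

```python
def relative_ipc(instr_change_pct: float, cycle_change_pct: float) -> float:
    """Return vector IPC divided by scalar IPC.

    Both arguments are signed percent changes relative to the scalar run
    (negative = decrease, positive = increase). Hypothetical helper,
    not part of any benchmark tooling.
    """
    instr_ratio = 1.0 + instr_change_pct / 100.0   # vector instrs / scalar instrs
    cycle_ratio = 1.0 + cycle_change_pct / 100.0   # vector cycles / scalar cycles
    # IPC = instructions / cycles, so the ratio of IPCs is the ratio of ratios.
    return instr_ratio / cycle_ratio


# Figures reported for parest on the k1: -51.72% instructions, +1.94% cycles.
k1_ratio = relative_ipc(-51.72, 1.94)
print(f"vector IPC is {k1_ratio:.2f}x scalar IPC")  # prints "vector IPC is 0.47x scalar IPC"
```

Under these assumptions the vectorized build retires instructions at roughly half the rate of the scalar build, which is consistent with the instruction-count savings being absorbed by the vector unit rather than showing up as fewer cycles.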

Stakeholders/Partners

RISE:

...

Page Properties

Development

Status


COMPLETE

Development Timeline

1H2024

Upstreaming

Status


COMPLETE

Upstream Version

gcc-14

Spring 2024

Contacts

Robin Dapp (Ventana)

Jeff Law (Ventana)

Dependencies

None

Updates

28 May 2024

Noted data from a run on the k1 design (BPI-F3 board). The 50+% decrease in instruction counts is in the right ballpark.

14 Mar 2024

Currently seeing a 24% decrease in instruction counts when vector is enabled
- Both x86_64 and aarch64 are showing a 35% cycle improvement
- Clearly there is still work to do. It is highly unlikely we're going to hit a 35% performance improvement if we're only seeing a 24% improvement in dynamic instruction counts.

29 Dec 2023

Project reported as a priority for 1H2024

...

Version	Old Version 2	New Version Current
Changes made by	Jeff Law	Jeff Law
Saved on	Mar 14, 2024	May 29, 2024
