CT_00_021 -- Vectorize parest benchmark from spec2017

About

The parest benchmark in spec2017 is reasonably friendly to vectorization; performance gains relative to a single-FPU scalar implementation should be on the order of 35%. Verify that the benchmark vectorizes and sees a comparable performance improvement on RISC-V.

 

When run on the k1 chip (BPI-F3), parest shows a 51.72% decrease in dynamic instruction count but a 1.94% regression in cycle count. As discussed on the cam4 work item, we believe this is an artifact of weaknesses in the k1's vector unit.
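
To make the k1 numbers concrete, the sketch below uses the identity cycles = instructions × CPI to work out how much the average cycles per instruction must have risen for a 51.72% instruction reduction to yield no cycle gain. It assumes the 1.94% regression means 1.94% more cycles; the function and variable names are illustrative only and do not come from any benchmark tooling.

```python
# Reported k1 (BPI-F3) figures for parest, vector build vs. scalar build.
instr_decrease = 0.5172    # 51.72% fewer dynamic instructions
cycle_regression = 0.0194  # assumption: "1.94% regression" = 1.94% more cycles


def implied_cpi_ratio(instr_decrease, cycle_increase):
    """Return (vector CPI) / (scalar CPI), from cycles = instructions * CPI."""
    instr_ratio = 1.0 - instr_decrease  # vector instructions / scalar instructions
    cycle_ratio = 1.0 + cycle_increase  # vector cycles / scalar cycles
    return cycle_ratio / instr_ratio


ratio = implied_cpi_ratio(instr_decrease, cycle_regression)
print(f"implied CPI ratio (vector / scalar): {ratio:.2f}")
```

Under that assumption the average instruction in the vector build costs roughly 2.1 times as many cycles as in the scalar build, which is consistent with the bottleneck being the vector unit rather than the generated code.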

 

 

 

Stakeholders/Partners

RISE:

Ventana: Robin Dapp – lead developer

Ventana: Jeff Law

 

External:

 

 

Status

Development

COMPLETE

 

Development Timeline

1H2024

 

Upstreaming

COMPLETE

 

Upstream Version

gcc-14

Spring 2024

 

 

 

Contacts

Robin Dapp (Ventana)

Jeff Law (Ventana)

 

Dependencies

None

 

 

Updates

May 28, 2024 

  • Added data from a run on the k1 design (BPI-F3 board).  The 50+% decrease in instruction counts is in the right ballpark.

Mar 14, 2024 

  • Currently seeing a 24% decrease in instruction counts when vector is enabled

    • Both x86_64 and aarch64 are showing a 35% cycle improvement

    • Clearly there is still work to do.  It is highly unlikely that we will hit a 35% performance improvement while seeing only a 24% improvement in dynamic instruction counts.
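
A rough way to quantify the gap in the Mar 14 entry is to ask how much the average cycles per instruction (CPI) would have to improve on top of a 24% instruction reduction to reach the 35% target, using cycles = instructions × CPI. The source does not say whether 35% means 35% fewer cycles or a 1.35x speedup, so this sketch computes both readings; the names are illustrative only.

```python
def required_cpi_ratio(instr_decrease, cycle_ratio_target):
    """Return the (vector CPI) / (scalar CPI) needed to hit a target cycle ratio."""
    instr_ratio = 1.0 - instr_decrease  # vector instructions / scalar instructions
    return cycle_ratio_target / instr_ratio


instr_decrease = 0.24  # 24% fewer dynamic instructions with vector enabled

# Reading 1 (assumption): 35% fewer cycles, so the cycle ratio is 0.65.
cpi_if_fewer_cycles = required_cpi_ratio(instr_decrease, 1.0 - 0.35)

# Reading 2 (assumption): 1.35x speedup, so the cycle ratio is 1 / 1.35.
cpi_if_speedup = required_cpi_ratio(instr_decrease, 1.0 / 1.35)

print(f"35% fewer cycles needs CPI ratio {cpi_if_fewer_cycles:.3f}")
print(f"1.35x speedup needs CPI ratio    {cpi_if_speedup:.3f}")
```

Either reading requires the vector build to have a lower CPI than the scalar build. Vector instructions generally do more work per instruction and tend to cost at least as many cycles each, so a lower CPI is not something to count on.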

Dec 29, 2023 

  • Project reported as a priority for 1H2024