About

The cam4 benchmark in SPEC CPU 2017 is reasonably friendly to vectorization; performance gains relative to a single-FPU scalar implementation should be on the order of 17%. The goal is to verify that the benchmark vectorizes and sees a comparable performance improvement on RISC-V.


Stakeholders/Partners

RISE:

Ventana: Robin Dapp – lead developer

Rivos:


External:

Rivai: Juzhe




Dependencies


Status

Development

COMPLETED


Development Timeline

NA

Upstreaming

COMPLETED


Upstream Version





Contacts

Jeff Law (Ventana)


Dependencies

Closure needs

Performance testing



Updates

 

  • With GCC using vector operations, we are currently seeing an 18% reduction in dynamic instruction count, which is roughly in line with expectations.
    • x86 gets an approximate performance improvement of 12% from vectorization.
    • aarch64 gets an approximate performance improvement of 6% from vectorization.
    • The 18% reduction for RISC-V doesn't necessarily mean an 18% performance improvement, but in general we should see instruction count improvements at or above the performance improvements seen on the competing architectures.
    • Conclusion: We're in the ballpark. Next steps are to confirm on real vector hardware, keeping in mind that microarchitectural issues may come into play.
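
The caveat above, that an instruction count reduction is not the same as a performance improvement, can be illustrated with the usual "iron law" model (run time = instruction count × cycles per instruction ÷ clock frequency). The following is a minimal sketch, not a measurement: the function names and the CPI ratios are hypothetical, clock frequency is assumed unchanged, and only the 18% figure comes from the update above.

```python
def runtime_reduction(ic_reduction, cpi_ratio=1.0):
    """Fractional run-time reduction under the iron law, at fixed clock frequency.

    ic_reduction: fractional drop in dynamic instruction count (0.18 for 18%).
    cpi_ratio:    CPI of the vectorized build divided by CPI of the scalar
                  build. This is a hypothetical input; it is NOT measured here.
    """
    if not 0.0 <= ic_reduction < 1.0:
        raise ValueError("ic_reduction must be in [0, 1)")
    if cpi_ratio <= 0.0:
        raise ValueError("cpi_ratio must be positive")
    # new_time / old_time = (new_count / old_count) * (new_cpi / old_cpi)
    return 1.0 - (1.0 - ic_reduction) * cpi_ratio


def speedup(ic_reduction, cpi_ratio=1.0):
    """Speedup (old_time / new_time) for the same inputs."""
    return 1.0 / (1.0 - runtime_reduction(ic_reduction, cpi_ratio))


if __name__ == "__main__":
    # Assumed CPI ratios, chosen only to show how sensitive the result is.
    for ratio in (1.0, 1.05, 1.10, 1.20):
        print(f"cpi_ratio={ratio:.2f}  "
              f"runtime reduction={runtime_reduction(0.18, ratio):.1%}  "
              f"speedup={speedup(0.18, ratio):.3f}x")
```

In this model the 18% instruction count reduction translates into an 18% run-time reduction only if the vectorized code sustains the same CPI as the scalar code. If vector instructions cost more cycles on average on a given microarchitecture, the realized gain shrinks accordingly, which is why confirmation on real vector hardware is the next step.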

 

  • Project reported as a priority for 1H2024

