About
Of the FP benchmarks within SPEC CPU 2017, lbm has the highest potential for vectorization. On other architectures, improvements of greater than 2x can be seen when autovectorization is enabled. The key routine to vectorize is "LBM_performStreamCollideTRT", and I don't think it is being vectorized at all at this time.

The WRF benchmark in SPEC CPU 2017 is reasonably friendly to vectorization; performance gains relative to a single-FPU scalar implementation should be on the order of 40%. Verify that the benchmark vectorizes and sees a comparable performance improvement on RISC-V.
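One way to check whether a loop of this kind vectorizes is to compile a small stand-in kernel with GCC's vectorizer reports turned on. The sketch below is only an illustration: `relax_cells` is a hypothetical toy loop, not the actual "LBM_performStreamCollideTRT" source, and the compiler invocation in the comment assumes a GCC toolchain targeting RISC-V with the vector extension.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a stream-collide style loop. This is NOT the
 * benchmark's code; it only has the same general shape (one independent
 * floating-point update per cell).
 *
 * Assumed invocation to get vectorizer reports from GCC:
 *   gcc -O3 -march=rv64gcv -fopt-info-vec -fopt-info-vec-missed -c kernel.c
 * -fopt-info-vec reports loops that were vectorized, and
 * -fopt-info-vec-missed reports loops that were not, with a reason.
 */
void relax_cells(const double *restrict src, double *restrict dst,
                 double omega, size_t n)
{
    /* restrict promises the arrays do not overlap; check the obvious case. */
    assert(src != dst || n == 0);

    for (size_t i = 0; i < n; i++) {
        double rho = src[i];
        double eq = 0.5 * rho;             /* toy "equilibrium" value */
        dst[i] = rho + omega * (eq - rho); /* relax toward equilibrium */
    }
}
```

If the report lists the loop as missed, the accompanying message usually points at the blocker (for example possible aliasing or control flow inside the loop), which is the kind of information needed to diagnose the real routine.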
Stakeholders/Partners
RISE:
Ventana: Jeff Law
External:
Dependencies
Status
Updates
- Project reported as a priority for 1H2024
...