...
The bwaves benchmark has some vector opportunities. x86_64 is seeing roughly a 10% runtime improvement due to vectorization, but aarch64 is getting no measurable improvement. This may point to a problem with VLA-style vectorization.
Things have gone a bit backwards over the last several months. We're now seeing a 5% regression in dynamic instruction counts with vector enabled, and even larger regressions in cycle counts when run on the k1 board. The most pressing need here is to figure out what's going on with the instruction counts. Odds are we're not going to see an improvement on the k1 design due to weakness in its vector architecture.
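To keep the sign convention straight when comparing counts, here is a minimal sketch of how the instruction-count delta could be computed. It assumes the baseline is a build without vector and the candidate is the same benchmark built with vector. The function name and the counts are hypothetical placeholders, not measured bwaves data.

```python
def percent_change(baseline: int, candidate: int) -> float:
    """Signed percent change of candidate relative to baseline.

    Positive means the candidate executed more instructions (a regression);
    negative means it executed fewer (an improvement).
    """
    if baseline <= 0:
        raise ValueError("baseline must be a positive instruction count")
    return (candidate - baseline) * 100.0 / baseline


# Hypothetical dynamic instruction counts, for illustration only.
scalar_insns = 1_000_000   # assumed: build without vector
vector_insns = 1_050_000   # assumed: build with vector

change = percent_change(scalar_insns, vector_insns)
label = "regression" if change > 0 else "improvement"
print(f"{change:+.1f}% ({label})")
```

The same helper works for cycle counts from the k1 board, which makes it easier to see whether the cycle regression tracks the instruction-count regression or exceeds it.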
Stakeholders/Partners
RISE:
...
Updates
- Data from k1. Moving to 2H2024.
- Seeing a roughly 5% improvement in dynamic instruction counts
- x86_64 sees a 10% runtime improvement from vectorization
- aarch64 sees no improvement from vectorization
- Suspect there's a problem with VLA-style vectorization. Happy we're seeing a 5% instruction count improvement, but it's not enough to get us on par with x86
...