The roms benchmark in spec2017 is reasonably friendly to vectorization; performance gains relative to a single-FPU scalar implementation should be on the order of 35%. Verify that the benchmark vectorizes and sees a comparable performance improvement on RISC-V.
Data from the k1 shows a 56% reduction in instruction counts, but a 5% regression in cycle counts. While disappointing, this is roughly in line with the performance of other benchmarks with vector enabled. Again, it's believed this is a weakness in the k1 vector architecture, not a failing in GCC.
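As a rough sanity check on the figures above, the two percentages can be combined into an implied change in instructions per cycle (IPC). This is only a sketch: it assumes the 56% and 5% figures are both measured against the same scalar baseline, and `relative_ipc` is a hypothetical helper written for illustration, not part of any benchmark tooling.

```python
def relative_ipc(instr_reduction: float, cycle_increase: float) -> float:
    """Return (vector IPC) / (scalar IPC) from two relative changes.

    instr_reduction: fractional drop in instruction count (0.56 for 56%).
    cycle_increase:  fractional rise in cycle count (0.05 for 5%).
    Both are assumed to be relative to the same scalar run.
    """
    instr_ratio = 1.0 - instr_reduction  # vector instructions / scalar instructions
    cycle_ratio = 1.0 + cycle_increase   # vector cycles / scalar cycles
    return instr_ratio / cycle_ratio


ratio = relative_ipc(0.56, 0.05)
print(f"vector IPC is {ratio:.2f}x the scalar IPC")
```

Under those assumptions the vector build retires instructions at well under half the scalar rate, so each vector instruction costs roughly 2.4 times as many cycles on average as a scalar one. That matches the picture of the instruction savings being absorbed by the hardware.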
Ventana: Robin Dapp – lead developer
Ventana: Jeff Law