...
The SAD routines are somewhat notorious for having low trip counts on their loops. As a result poor vector setup can significantly reduce the benefits from autovectorization. Using masked loads and/or strided loads can help widen the vectorization factor. and improve performance. Improvements to tree-ssa-forwprop.cc can eliminate the various VIEW_CONVERT_EXPR statements, collapse permutations, simplify bit insertion/extraction, etc. The goal being to hand off nearly optimal code to the RTL phase of the compiler.
It is believed that some work on finding a way to encourage unrolling an outer loop to enable wider vectorization of an inner loop would help the SATD routines. Neither GCC nor LLVM do a good job at this.
The SATD routines may have a loop which is not currently vectorized. We need to perform variable expansion before vectorization to have any chance of vectorizing the first part of the SATD routines.
...
Stakeholders/Partners
RISE:
Ventana: Robin Dapp
Ventana: Jeff Law
External:
Dependencies
...
Page Properties | |||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
Updates
- Project reported as a priority for 1H2024
...