CT_00_050 -- Improve x264 vectorization
About
x264 is a critical benchmark for vectorization in the SPEC suite, showing a roughly 2x improvement across many architectures once vectorization is enabled. This work item is meant to track further improvements that may be possible in the benchmark through compiler improvements.
- SAD optimization
  - Shorter SADs (sad_x3_8x8) can benefit from strided loads
- SATD
  - Use strided loads to avoid permutations in the first SATD loop
  - Revisit the profitability of deriving permutation constants from each other using vadd.vi; this may not be needed anymore
  - Use wider vectors in the second loop. Smart unrolling seems to be the key here
- vaaddu
  - Designs which flush the pipeline on VXRM assignment may be better off using (a + b + 1) >> 1
    - Implies the expander should probably be conditional on a suitable uarch flag
  - Designs with good VXRM behavior could probably be using vaaddu more
- Revisit conservative vsetvl elimination
  - Jeff's patch is a reasonable start
  - Needs to be re-benchmarked
  - Rather than swapping elements, shift them in the array to perturb the schedule less
  - May want some degree of freedom, particularly if the uarch doesn't handle vsetvl efficiently
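For the SAD item, a scalar reference may help show why short SADs are awkward for the vectorizer. The sketch below is a simplified stand-in for a sad_x3_8x8 kernel, not x264's actual source: the function names, the `FENC_STRIDE_SKETCH` constant, and the exact signature are assumptions made for illustration. Each row contributes only 8 contiguous bytes, and consecutive rows are a full stride apart, so a unit-stride vector load covers a single row at best. One plausible reading of the note is that a strided load treating each 8-byte row as one element could pack several rows into a single vector register.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed row stride of the encoder block; hypothetical name for this sketch. */
#define FENC_STRIDE_SKETCH 16

/* SAD of one 8x8 block. Only 8 contiguous bytes are read per row, and the
 * next row starts a full stride later, which is the access pattern a
 * strided vector load could cover in one instruction. */
static int sad_8x8(const uint8_t *fenc, const uint8_t *ref, intptr_t ref_stride)
{
    int sum = 0;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            sum += abs(fenc[x] - ref[x]);
        fenc += FENC_STRIDE_SKETCH;
        ref += ref_stride;
    }
    return sum;
}

/* Compare one source block against three reference candidates. */
static void sad_x3_8x8(const uint8_t *fenc, const uint8_t *ref0,
                       const uint8_t *ref1, const uint8_t *ref2,
                       intptr_t ref_stride, int scores[3])
{
    scores[0] = sad_8x8(fenc, ref0, ref_stride);
    scores[1] = sad_8x8(fenc, ref1, ref_stride);
    scores[2] = sad_8x8(fenc, ref2, ref_stride);
}
```

Whether the strided form is actually faster depends on how a given design implements strided loads, so this is a candidate to measure rather than a guaranteed win.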
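For the SATD items, the two loops can be pictured with a plain 4x4 SATD. This is a minimal scalar sketch under assumptions, not x264's implementation, which is organized differently in detail; the name `satd_4x4_sketch` is hypothetical. The first loop runs butterflies across the elements of each row, which is where a vectorizer tends to need permutations unless the operands arrive already separated (for example via strided loads). The second loop works down the columns of the intermediate array, where every column is independent and wider vectors or unrolling could apply.

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified 4x4 SATD: Hadamard transform of the pixel differences,
 * followed by a sum of absolute values, halved. */
static int satd_4x4_sketch(const uint8_t *pix1, intptr_t stride1,
                           const uint8_t *pix2, intptr_t stride2)
{
    int tmp[4][4];
    int sum = 0;

    /* First loop: horizontal butterflies within each row of differences. */
    for (int i = 0; i < 4; i++, pix1 += stride1, pix2 += stride2) {
        int d0 = pix1[0] - pix2[0];
        int d1 = pix1[1] - pix2[1];
        int d2 = pix1[2] - pix2[2];
        int d3 = pix1[3] - pix2[3];
        int a0 = d0 + d1, a1 = d0 - d1;
        int a2 = d2 + d3, a3 = d2 - d3;
        tmp[i][0] = a0 + a2;
        tmp[i][1] = a1 + a3;
        tmp[i][2] = a0 - a2;
        tmp[i][3] = a1 - a3;
    }

    /* Second loop: vertical butterflies down each column, then accumulate
     * absolute values. Columns do not depend on each other. */
    for (int j = 0; j < 4; j++) {
        int a0 = tmp[0][j] + tmp[1][j], a1 = tmp[0][j] - tmp[1][j];
        int a2 = tmp[2][j] + tmp[3][j], a3 = tmp[2][j] - tmp[3][j];
        sum += abs(a0 + a2) + abs(a1 + a3) + abs(a0 - a2) + abs(a1 - a3);
    }
    return sum >> 1;
}
```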
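For the vaaddu item, the scalar idiom in question is the rounding average (a + b + 1) >> 1, which matches what vaaddu computes when vxrm selects round-to-nearest-up. The sketch below shows the idiom as a source-level loop plus an overflow-free variant that stays in 8 bits. The narrow variant and all function names are illustrative assumptions, not something taken from the x264 source or from the compiler patches under discussion; it is included only to show that the rounding average can be expressed without touching vxrm or widening.

```c
#include <stdint.h>

/* Rounding average as written in the note; widened so a + b + 1 cannot wrap. */
static uint8_t avg_round_widened(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Same value without widening, using a + b == 2 * (a | b) - (a ^ b). */
static uint8_t avg_round_narrow(uint8_t a, uint8_t b)
{
    return (uint8_t)((a | b) - ((a ^ b) >> 1));
}

/* The kind of loop a vectorizer could map to either vaaddu or a plain
 * add-and-shift sequence, depending on the target's VXRM cost. */
static void pixel_avg_row(uint8_t *dst, const uint8_t *src1,
                          const uint8_t *src2, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = avg_round_widened(src1[i], src2[i]);
}

/* Exhaustively compare both forms over all 8-bit inputs. */
static int count_avg_mismatches(void)
{
    int mismatches = 0;
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++)
            if (avg_round_widened((uint8_t)a, (uint8_t)b) !=
                avg_round_narrow((uint8_t)a, (uint8_t)b))
                mismatches++;
    return mismatches;
}
```

Which form the expander should emit is the uarch-dependent choice the list describes: vaaddu is a single instruction but requires the rounding mode to be set, while the open-coded forms avoid the VXRM write at the cost of extra operations.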
Stakeholders/Partners
RISE:
Ventana: Robin Dapp – Cost model, permutation improvements, etc. Overall lead
Ventana: Jeff Law – everything scheduling related
External:
Dependencies
Status
Updates
- Project reported as a priority for 1H2025, broken out from original effort