About

x264 should see roughly a 2X performance improvement from autovectorization, based on data from other architectures. We need to verify that we see similar improvements on RISC-V and, if not, address any shortcomings in the code generation.

The SAD routines are somewhat notorious for having low trip counts on their loops. As a result, poor vector setup can significantly reduce the benefits of autovectorization. Using masked loads and/or strided loads can help widen the vectorization factor and improve performance. Improvements to tree-ssa-forwprop.cc can eliminate the various VIEW_CONVERT_EXPR statements, collapse permutations, simplify bit insertion/extraction, etc. The goal is to hand off nearly optimal code to the RTL phase of the compiler.
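To illustrate why trip counts are a problem, here is a minimal sketch of the general shape of a scalar SAD (sum of absolute differences) loop over a w x h block of 8-bit pixels. It is modeled loosely on the kind of generic C reference code x264 carries; the function name `sad_wxh` and its parameter order are hypothetical and are not x264's actual API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar SAD over a w x h block of 8-bit pixels.
 * stride1/stride2 are the distances (in pixels) between rows of each block.
 * Shape: an outer loop over rows, a short inner loop over columns. */
static int sad_wxh(const uint8_t *pix1, intptr_t stride1,
                   const uint8_t *pix2, intptr_t stride2,
                   int w, int h)
{
    assert(w > 0 && h > 0);
    int sum = 0;
    for (int y = 0; y < h; y++) {
        /* For 4x4 or 8x8 blocks this inner loop runs only 4 or 8 times,
         * so vectorizing it alone yields a small vectorization factor. */
        for (int x = 0; x < w; x++)
            sum += abs(pix1[x] - pix2[x]); /* uint8_t operands promote to int */
        /* Rows are not contiguous in memory; advance by each stride. */
        pix1 += stride1;
        pix2 += stride2;
    }
    return sum;
}
```

With w = 4, each row contributes only four pixels, so a vector loop built around the inner loop alone processes at most four elements at a time. Covering several rows in one vector, for example with strided loads (one row per segment) or masked loads for partial rows, is one way to get the wider vectorization factor described above.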

The SATD routines may have a loop which is not currently vectorized. We need to perform variable expansion before vectorization to have any chance of vectorizing the first part of the SATD routines.

The get_ref, sub_dct, and other routines also provide some vectorization opportunities and need to be investigated.

Stakeholders/Partners

RISE:

Ventana: Jeff Law

External:

Dependencies

Status

Development	NOT STARTED
Development Timeline	NA
Upstreaming	NOT STARTED
Upstream Version
Contacts	Jeff Law (Ventana)
Dependencies	None

Updates

29 Dec 2023

Project reported as a priority for 1H2024

CT_00_018 -- Evaluate and potentially improve x264 vectorization