About

x264 is a critical benchmark for vectorization in the SPEC suite, showing a roughly 2x improvement across many architectures once vectorization is enabled. This work item is meant to track further improvements that may be possible in the benchmark through compiler improvements.

~~There is an arithmetic right shift followed by a masking operation in quant_4x4 that can be simplified into a logical right shift eliminating a small amount of code on a critical path.~~
Vector setup and element extraction are likely sub-optimal in the SAD/SATD routines.
  1. Removal of VIEW_CONVERT_EXPR nodes is likely important
  2. Optimization of BITFIELD_INSERT_EXPR
Rearranging SLP nodes that occur multiple times in the same statement to avoid duplicates, with a vec_perm to restore the original ordering, may yield as much as a 10% benefit for vectorized x264.
  1. Additional information from GCC's bug database
  2. Proposed patch, probably won't go in as-is, but can be used for experimentation
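
For reference, the shift-and-mask simplification mentioned in the struck-through item can be sketched in C as follows. This is a generic illustration of the identity, not the actual quant_4x4 source: the shift amount of 16, the mask 0xFFFF, and all function names are assumptions chosen for the example. The point is that when the mask keeps exactly the bits that a logical shift would leave, the sign bits brought in by the arithmetic shift are discarded anyway, so a single logical shift produces the same result.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic right shift followed by a mask. The copies of the sign bit
   shifted in at the top are cleared by the mask. Right-shifting a negative
   signed value is implementation-defined in C; GCC documents it as an
   arithmetic shift. (Hypothetical helper, not x264 code.) */
static uint32_t shift_then_mask(int32_t x)
{
    return (uint32_t)(x >> 16) & 0xFFFFu;
}

/* Single logical right shift. Zeros are shifted in at the top, so no
   mask is needed. (Hypothetical helper, not x264 code.) */
static uint32_t logical_shift(int32_t x)
{
    return (uint32_t)x >> 16;
}

/* Checks that both forms agree on a few sample values, including
   negative inputs where the two kinds of shift differ before masking. */
static void check_forms_agree(void)
{
    static const int32_t samples[] = {
        0, 1, -1, 65535, 65536, -65536, INT32_MAX, INT32_MIN
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(shift_then_mask(samples[i]) == logical_shift(samples[i]));
}
```

The equivalence only holds when the mask width matches the shift (here 16 low bits kept after a 16-bit shift of a 32-bit value); a wider mask would let sign bits through and the two forms would differ for negative inputs.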

Stakeholders/Partners

RISE:

Ventana: Robin Dapp

Ventana: Jeff Law

External:

VRULL: Manolis Tsamis

Dependencies

Status

Development	IN PROGRESS
Development Timeline	2H2024
Upstreaming	IN PROGRESS
Upstream Version	gcc-15 Spring 2025
Contacts	Jeff Law (Ventana)
Dependencies

Updates

05 Jun 2024

Project reported as a priority for 2H2024, broken out from original effort

CT_00_035 -- Improve x264 vectorization