...

  1. In quant_4x4, an arithmetic right shift followed by a masking operation can be simplified into a single logical right shift, eliminating a small amount of code on a critical path.
  2. Vector setup and element extraction are likely sub-optimal in the SAD/SATD routines.
    1. Removal of VIEW_CONVERT_EXPR nodes is likely important.
    2. So is optimization of BITFIELD_INSERT_EXPR.
  3. When an SLP node occurs multiple times in the same statement, rearranging the nodes to avoid duplicates, with a vec_perm to restore the original ordering, may have as much as a 10% benefit for vectorized x264.
    1. Additional information from GCC's bug database
    2. Proposed patch; it probably won't go in as-is, but can be used for experimentation
  4. GCC does not make good use of widening vector ops that overlap source/destination registers. The expectation is that this is another 1-2% improvement.
  5. GCC does not hoist vxrm assignments aggressively, which can significantly impact performance if the uarch does not provide fast vxrm access. This is about 2% on the BPI.
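
The shift simplification in item 1 can be illustrated with a small sketch. The shift count of 16, the 0xFFFF mask, and the function names below are assumptions chosen for illustration; they are not taken from the quant_4x4 source. The idea is that when the mask keeps exactly the bits that a logical shift would leave non-zero, the sign-extended bits produced by the arithmetic shift are discarded anyway, so a single logical shift gives the same result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical form matching the description: an arithmetic right shift
   followed by a mask that keeps only the low 16 bits of the result. */
static inline uint32_t shift_then_mask(int32_t x)
{
    /* Right-shifting a negative signed value is implementation-defined in
       ISO C; GCC documents it as an arithmetic (sign-extending) shift. */
    return (uint32_t)(x >> 16) & 0xFFFFu;
}

/* Simplified form: a logical shift already leaves zeros in the high bits,
   so no separate masking step is needed. */
static inline uint32_t logical_shift(int32_t x)
{
    return (uint32_t)x >> 16;
}

/* Spot-check the equivalence over a spread of inputs, negatives included. */
static void check_shift_equivalence(void)
{
    for (int64_t v = INT32_MIN; v <= INT32_MAX; v += 65537)
        assert(shift_then_mask((int32_t)v) == logical_shift((int32_t)v));
    assert(shift_then_mask(INT32_MAX) == logical_shift(INT32_MAX));
}
```

The equivalence only holds when the mask width plus the shift count equals the operand width, which is the assumption made here.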

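To make the vxrm point in item 5 more concrete, here is a scalar model of one element of an RVV averaging add (vaaddu) under each fixed-point rounding mode, following the rounding rules in the RISC-V vector specification. The function names and the pixel-averaging loop are hypothetical and only sketch the situation; they are not GCC or x264 code. With the round-to-nearest-up mode the result matches the common `(a + b + 1) >> 1` pixel average, and because the mode is the same on every iteration, a single vxrm write before the loop would be enough.

```c
#include <assert.h>
#include <stdint.h>

/* vxrm encodings as defined by the RISC-V vector specification. */
enum vxrm_mode { VXRM_RNU = 0, VXRM_RNE = 1, VXRM_RDN = 2, VXRM_ROD = 3 };

/* Scalar model of one vaaddu element: (a + b) >> 1 plus a rounding
   increment r selected by the vxrm mode. */
static inline uint8_t avg_u8(uint8_t a, uint8_t b, enum vxrm_mode mode)
{
    unsigned sum  = (unsigned)a + b;
    unsigned lsb  = sum & 1u;         /* the bit shifted out */
    unsigned bit1 = (sum >> 1) & 1u;  /* the lowest bit that is kept */
    unsigned r;

    switch (mode) {
    case VXRM_RNU: r = lsb;               break; /* round to nearest, ties up */
    case VXRM_RNE: r = lsb & bit1;        break; /* round to nearest, ties to even */
    case VXRM_RDN: r = 0;                 break; /* truncate */
    default:       r = lsb & (bit1 ^ 1u); break; /* ROD: round to odd */
    }
    return (uint8_t)((sum >> 1) + r);
}

/* Hypothetical pixel-averaging loop. The rounding mode never changes inside
   the loop, so the corresponding vxrm assignment is loop-invariant. */
static void pixel_avg_row(uint8_t *dst, const uint8_t *a, const uint8_t *b, int n)
{
    const enum vxrm_mode mode = VXRM_RNU;
    for (int i = 0; i < n; i++)
        dst[i] = avg_u8(a[i], b[i], mode);
}

/* Exhaustively confirm that RNU matches the (a + b + 1) >> 1 average. */
static void check_rnu_average(void)
{
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++)
            assert(avg_u8((uint8_t)a, (uint8_t)b, VXRM_RNU) == ((a + b + 1) >> 1));
}
```

If the vxrm write were instead emitted before every averaging operation, a uarch with slow vxrm access would pay that cost on each iteration, which is the overhead hoisting is meant to avoid.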

Stakeholders/Partners

RISE:

Ventana: Robin Dapp

Ventana: Jeff Law – currently looking at vxrm hoisting


External:

VRULL: Manolis Tsamis

...

Page Properties


Development: IN PROGRESS

Development Timeline: 2H2024

Upstreaming: IN PROGRESS

Upstream Version: gcc-15 (Spring 2025)




Contacts

Jeff Law (Ventana)


Dependencies





Updates

  • VRULL's patch has been upstreamed, and we're seeing the desired vectorization for the other loop in the SATD routines
  • Noted two further improvement opportunities: widening ops with overlapping source/destination registers, and vxrm hoisting

 

  • Project reported as a priority for 2H2024, broken out from original effort

...