Page Comparison

...

~~There is an arithmetic right shift followed by a masking operation in quant_4x4 that can be simplified into a logical right shift, eliminating a small amount of code on a critical path.~~
Vector setup and element extraction are likely sub-optimal in the SAD/SATD routines.
Removal of VIEW_CONVERT_EXPR nodes is likely important

Optimization of BITFIELD_INSERT_EXPR is likely important as well

SATD routine, particularly for zvl512b
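
For orientation, the kind of kernel the SAD/SATD items refer to can be sketched in scalar C. This is a hypothetical reference version written for illustration (the name `sad_block` and its parameters are assumptions, not x264's actual code): each 8-bit pixel is widened, the absolute differences are accumulated, and the result is reduced to a single sum.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference kernel, not taken from x264.
 * Sum of absolute differences over a width x height block.
 * Each uint8_t pixel is promoted to int before the subtraction,
 * so the difference cannot wrap around. */
static int sad_block(const uint8_t *pix1, ptrdiff_t stride1,
                     const uint8_t *pix2, ptrdiff_t stride2,
                     int width, int height)
{
    int sum = 0;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            sum += abs(pix1[x] - pix2[x]);
        pix1 += stride1;  /* advance to the next row of each block */
        pix2 += stride2;
    }
    return sum;
}
```

When a loop of this shape is vectorized, the compiler has to build the input vectors from the pixel rows and extract the final scalar sum, which is where setup and extraction overhead can show up.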

Rearrangement of SLP nodes with multiple occurrences in the same statement to avoid duplicates, with a vec_perm to restore the original ordering, may have as much as a 10% benefit for vectorized x264.
1. ~~Additional information from GCC's bug database~~
2. ~~Proposed patch, probably won't go in as-is, but can be used for experimentation~~
GCC does not make good use of widening vector ops that overlap source/destination registers. Fixing this is expected to give another 1-2% improvement.
GCC does not hoist vxrm assignments aggressively, which can significantly impact performance if the uarch does not provide fast vxrm access. The cost is about 2% on the BPI.
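
To make the vxrm item more concrete, here is a rough scalar sketch of a rounding-average loop. The function names are hypothetical and the code is not taken from x264. The assumption, not confirmed by this page, is that a vectorizing compiler may map `(a + b + 1) >> 1` to a fixed-point averaging add such as `vaaddu`, which takes its rounding mode from the vxrm CSR. The rounding mode is the same on every iteration, so the vxrm write is loop-invariant and could in principle be placed once before the outer loop instead of being repeated per row.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rounding average of two rows of bytes.
 * (a + b + 1) >> 1 rounds halves upward. */
static void avg_row(uint8_t *dst, const uint8_t *a, const uint8_t *b, int width)
{
    for (int x = 0; x < width; x++)
        dst[x] = (uint8_t)((a[x] + b[x] + 1) >> 1);
}

/* Hypothetical caller: every row uses the same rounding behavior,
 * so any rounding-mode setup needed by the vectorized inner loop
 * does not depend on y. */
static void avg_block(uint8_t *dst, ptrdiff_t dst_stride,
                      const uint8_t *a, ptrdiff_t a_stride,
                      const uint8_t *b, ptrdiff_t b_stride,
                      int width, int height)
{
    for (int y = 0; y < height; y++) {
        avg_row(dst, a, b, width);
        dst += dst_stride;
        a += a_stride;
        b += b_stride;
    }
}
```

On hardware where writing vxrm is slow, leaving the write inside such a loop nest is the kind of overhead this item describes.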

...
