About

Based on data from other architectures, x264 should see roughly a 2x performance improvement from autovectorization. We need to verify that RISC-V sees a similar improvement and, if it does not, address any shortcomings in code generation.


The SAD routines are somewhat notorious for the low trip counts of their loops. As a result, poor vector setup can significantly reduce the benefits of autovectorization. Using masked loads and/or strided loads can help widen the vectorization factor and improve performance. Improvements to tree-ssa-forwprop.cc can eliminate the various VIEW_CONVERT_EXPR statements, collapse permutations, simplify bit insertion/extraction, etc. The goal is to hand off nearly optimal code to the RTL phase of the compiler.
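
To make the low-trip-count concern concrete, here is a minimal sketch of a fixed-size block SAD in plain C. The function name `sad_8x8` and its signature are hypothetical and only follow the general shape of such routines; this is not x264's actual source. The inner loop runs just 8 times over 8-bit pixels, so a single row may not fill a wide vector register, which is where combining rows through strided or masked loads could help.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 8x8 sum of absolute differences (SAD).
 * stride1 and stride2 are the distances in bytes between consecutive
 * rows of each block. */
int sad_8x8(const uint8_t *pix1, intptr_t stride1,
            const uint8_t *pix2, intptr_t stride2)
{
    int sum = 0;

    for (int y = 0; y < 8; y++) {
        /* Low trip count: only 8 one-byte elements per row. */
        for (int x = 0; x < 8; x++)
            sum += abs(pix1[x] - pix2[x]);

        /* Rows are not contiguous in general, so a vectorizer that wants
         * to process several rows at once needs strided accesses. */
        pix1 += stride1;
        pix2 += stride2;
    }
    return sum;
}
```

With contiguous 8x8 blocks (stride 8), two blocks whose pixels differ by 2 everywhere give a SAD of 128.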


The SATD routines may have a loop that is not currently vectorized. We need to perform variable expansion before vectorization to have any chance of vectorizing the first part of the SATD routines.
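
For orientation, the two-part structure of a SATD routine can be sketched as below. This is a generic textbook 4x4 formulation, not x264's implementation: the name `satd_4x4_ref` is hypothetical, the result is left unnormalized, and the layout of the intermediate array is an assumption. The first loop (per-row differences followed by a horizontal butterfly) corresponds to the "first part" discussed above; its scalar temporaries are the kind of values that would need to be expanded before a vectorizer could handle the loop.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, unnormalized 4x4 SATD: the sum of absolute values of the
 * 2-D Hadamard transform of the pixel differences. */
int satd_4x4_ref(const uint8_t *pix1, intptr_t stride1,
                 const uint8_t *pix2, intptr_t stride2)
{
    int tmp[4][4];
    int sum = 0;

    /* First part: row differences and horizontal 4-point butterfly. */
    for (int i = 0; i < 4; i++) {
        int d0 = pix1[0] - pix2[0];
        int d1 = pix1[1] - pix2[1];
        int d2 = pix1[2] - pix2[2];
        int d3 = pix1[3] - pix2[3];
        int s01 = d0 + d1, t01 = d0 - d1;
        int s23 = d2 + d3, t23 = d2 - d3;

        tmp[i][0] = s01 + s23;
        tmp[i][1] = t01 + t23;
        tmp[i][2] = s01 - s23;
        tmp[i][3] = t01 - t23;

        pix1 += stride1;
        pix2 += stride2;
    }

    /* Second part: vertical butterfly per column, then accumulate the
     * absolute values of the transformed coefficients. */
    for (int j = 0; j < 4; j++) {
        int s01 = tmp[0][j] + tmp[1][j], t01 = tmp[0][j] - tmp[1][j];
        int s23 = tmp[2][j] + tmp[3][j], t23 = tmp[2][j] - tmp[3][j];

        sum += abs(s01 + s23) + abs(t01 + t23)
             + abs(s01 - s23) + abs(t01 - t23);
    }
    return sum;
}
```

For two contiguous 4x4 blocks that differ by exactly 1 at every pixel, only the DC coefficient is nonzero and this sketch returns 16.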


get_ref, sub_dct, and other routines also provide some vectorization opportunities and need to be investigated.




Stakeholders/Partners

RISE:

Ventana: Jeff Law


External:



Dependencies


Status

Development

NOT STARTED


Development Timeline: NA

Upstreaming

NOT STARTED


Upstream Version





Contacts

Jeff Law (Ventana)


Dependencies

None



Updates

 

  • Project reported as a priority for 1H2024
