CT_00_018 -- Evaluate and potentially improve x264 vectorization

About

Based on data from other architectures, x264 should see roughly a 2X performance improvement from autovectorization. Extrapolating from data gathered on the k230 board, a uarch that can do 128-bit vector ALU ops in a single cycle is expected to see a 50% runtime reduction for this benchmark. That translates into a 2X improvement in the SPEC2017 score for 525.x264_r.

Given that's precisely the goal we were shooting for, this is considered done.

Note that Robin is investigating improving the generated code for the SATD routines. Essentially, we're doing a lot of byte loads when we should be loading larger values. This may provide another small improvement on top of the basic vectorization.
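To illustrate the kind of access pattern in question, here is a minimal sketch of a 4x4 SATD (sum of absolute transformed differences) in C. It is a simplified stand-in, not the x264 source: the function name `satd_4x4`, its signature, and the final halving are assumptions made for illustration only.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified 4x4 SATD; NOT the actual x264 routine. */
static int satd_4x4(const uint8_t *a, int stride_a,
                    const uint8_t *b, int stride_b)
{
    int d[4][4];

    /* Pixel differences. Written this way, every pixel is a separate
     * byte load, even though each row of 4 pixels is contiguous. */
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[y][x] = a[y * stride_a + x] - b[y * stride_b + x];

    /* Horizontal 4-point Hadamard transform on each row. */
    for (int y = 0; y < 4; y++) {
        int s0 = d[y][0] + d[y][1], s1 = d[y][0] - d[y][1];
        int s2 = d[y][2] + d[y][3], s3 = d[y][2] - d[y][3];
        d[y][0] = s0 + s2;
        d[y][1] = s1 + s3;
        d[y][2] = s0 - s2;
        d[y][3] = s1 - s3;
    }

    /* Vertical 4-point Hadamard transform on each column, accumulating
     * the absolute values of the transformed coefficients. */
    int sum = 0;
    for (int x = 0; x < 4; x++) {
        int s0 = d[0][x] + d[1][x], s1 = d[0][x] - d[1][x];
        int s2 = d[2][x] + d[3][x], s3 = d[2][x] - d[3][x];
        sum += abs(s0 + s2) + abs(s1 + s3) + abs(s0 - s2) + abs(s1 - s3);
    }

    /* Halving the total is an assumed normalization for this sketch. */
    return sum / 2;
}
```

Because the four pixels of a row sit next to each other in memory, a row could in principle be fetched with one wider load rather than four byte loads. Whether and how the compiler achieves that is the subject of the investigation, and this sketch does not settle it.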

 

 

Stakeholders/Partners

RISE:

Ventana: Robin Dapp

Ventana: Jeff Law

 

External:

 

 

Dependencies

 

Status

Development

COMPLETE

 

Development Timeline

1H2024

 

Upstreaming

COMPLETE

 

Upstream Version

gcc-14

Spring 2024

 

 

 

Contacts

Jeff Law (Ventana)

 

 

 

 

Updates

May 9, 2024 

  • Note additional opportunities for improvement.

Apr 25, 2024 

  • Testing on the k230 board shows "only" a 17% runtime improvement, whereas the target for x264 vectorization is a 50% runtime improvement (which would double the SPEC score).

  • However, it looks like the cost of a vector ALU op on the k230 is at least 3X LMUL.

  • So a performant uarch where vector ALU ops of reasonable size (128 bits) are 1c would see the expected 50% runtime improvement.

  • Considering this resolved.

Mar 14, 2024 

  • Dynamic instruction rates cut by 47%, so in the right ballpark for a 2X performance improvement

    • x86 shows a roughly 88% improvement (i.e., runtime nearly cut in half)

    • aarch64 shows roughly a 104% improvement (i.e., runtime cut by more than 50%)

    • 47% reduction in dynamic cycle counts is in the right ballpark

    • Need to do performance testing to reach closure
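The percentages in these updates mix two conventions: "runtime reduction" and "performance improvement". The short C sketch below shows how one converts to the other. The helper names are hypothetical. Treating a reduction in dynamic instruction count as a runtime reduction also assumes that throughput per instruction stays unchanged, which is exactly what the pending performance testing would need to confirm.

```c
/* Hypothetical helpers for converting between the two conventions.
 * All fractions are expressed as 0.50 for 50%, 1.04 for 104%, etc. */

/* Speedup factor implied by a fractional runtime reduction:
 * new_time = old_time * (1 - reduction), so speedup = 1 / (1 - reduction). */
static double speedup_from_runtime_reduction(double reduction)
{
    return 1.0 / (1.0 - reduction);
}

/* Fraction of the original runtime that remains after a given fractional
 * performance improvement: new_time / old_time = 1 / (1 + improvement). */
static double runtime_ratio_from_improvement(double improvement)
{
    return 1.0 / (1.0 + improvement);
}
```

With the figures reported here, `speedup_from_runtime_reduction(0.50)` gives 2.0 (the 2X target) and `speedup_from_runtime_reduction(0.47)` gives about 1.89. `runtime_ratio_from_improvement(0.88)` gives about 0.53 for x86, and `runtime_ratio_from_improvement(1.04)` gives about 0.49 for aarch64, consistent with "nearly cut in half" and "cut by more than 50%".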

Dec 29, 2023 

  • Project reported as a priority for 1H2024