...

Given that's precisely the goal we were shooting for, this is considered done.

Note that Robin is investigating improvements to the generated code for the SATD routines. Essentially, we're doing a lot of byte loads when we should be loading larger values. This may provide another small improvement on top of the basic vectorization.
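To illustrate the byte-load concern in general terms, here is a minimal scalar sketch in C. It is not the x264 SATD routine and not the code GCC actually generates; the helper names sum4_byte_loads and sum4_wide_load are hypothetical and exist only for this example. It shows the same four pixels being consumed with four one-byte loads versus a single 32-bit load.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: sums four 8-bit pixels using one byte load each. */
static uint32_t sum4_byte_loads(const uint8_t *p)
{
    return (uint32_t)p[0] + p[1] + p[2] + p[3];
}

/* Hypothetical helper: same result from a single 32-bit load.
   memcpy keeps the access alignment-safe; compilers typically
   lower it to one word load. Summing all four bytes makes the
   result independent of endianness. */
static uint32_t sum4_wide_load(const uint8_t *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return (w & 0xffu) + ((w >> 8) & 0xffu) + ((w >> 16) & 0xffu) + (w >> 24);
}
```

Both helpers return the same value for any four input bytes; the difference is only in how many memory accesses are issued, which is the kind of saving the investigation is after.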



Stakeholders/Partners

RISE:

...

Page Properties


Development

Status
colourGreen
titleCOMPLETE


Development Timeline

1H2024
Upstreaming

Status
colourGreen
titleCOMPLETE


Upstream Version

gcc-14

Spring 2024




Contacts

Jeff Law (Ventana)


Dependencies





Updates

 

  • Note additional opportunities for improvement.

 

  • Testing on the k230 board shows "only" a 17% runtime improvement, whereas the target for x264 vectorization is a 50% runtime improvement (which would double the SPEC score).
  • However, it looks like the cost of a vector ALU op on that board is at least 3X LMUL.
  • So a performant uarch where vector ALU ops of reasonable size (128 bits) take 1c should see the expected 50% runtime improvement.
  • Considering this resolved.
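
The relationship between runtime improvement and score in the bullets above can be checked with a small sketch. It assumes a SPEC-style ratio that is inversely proportional to runtime (reference time divided by measured time); the function name score_ratio is hypothetical.

```c
/* Hypothetical helper: factor by which a runtime-inverse score grows
   when runtime shrinks by the given fraction (0.50 means 50% less time).
   new_score / old_score = old_time / new_time = 1 / (1 - reduction). */
static double score_ratio(double runtime_reduction)
{
    return 1.0 / (1.0 - runtime_reduction);
}
```

Under that assumption, score_ratio(0.50) is 2.0, matching the "double the score" target, while the 17% measured on the k230 corresponds to a factor of roughly 1.2.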

...