
Page Properties


Development

COMPLETE


Development Timeline

1H2024

Upstreaming

COMPLETE


Upstream Version

gcc-14

Spring 2024




Contacts

Jeff Law (Ventana)


Dependencies

Closure needs

performance testing





Updates

 

  • Testing on the k230 board shows "only" a 17% runtime improvement, whereas the target for x264 vectorization is a 50% runtime improvement (which would double the SPEC score).
  • However, it looks like the cost of a vector ALU op on that board is at least 3X LMUL.
  • A performant uarch where vector ALU ops of reasonable size (128 bits) are 1c should therefore see the expected 50% runtime improvement.
  • Considering this resolved.
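
As a rough illustration of the arithmetic behind these figures, the sketch below converts a fractional runtime reduction into a speedup factor. The function name is hypothetical and not part of any tool used for this work. It assumes "runtime improvement" means the fraction by which run time shrinks, which is consistent with the note that a 50% improvement doubles the score.

```python
def speedup_from_runtime_reduction(reduction: float) -> float:
    """Return the speedup factor for a fractional runtime reduction.

    Assumption: `reduction` is the fraction of the original run time
    that was eliminated (0.50 means the run now takes half as long).
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


# The measured k230 figure and the target quoted in the bullets above.
measured = speedup_from_runtime_reduction(0.17)
target = speedup_from_runtime_reduction(0.50)
print(f"measured: {measured:.2f}x, target: {target:.2f}x")
```

Under that assumption, the 17% measurement corresponds to roughly a 1.2x speedup against a 2x target.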

 

  • Dynamic instruction rates cut by 47%, which is in the right ballpark for a 2X performance improvement.
    • x86 shows a roughly 88% improvement (i.e., run time nearly cut in half).
    • aarch64 shows a roughly 104% improvement (i.e., run time cut by more than 50%).
    • A 47% reduction in dynamic cycle counts is in the right ballpark.
    • Performance testing is still needed to reach closure.
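
To make the comparison between these percentages concrete, here is a minimal sketch of the conversions involved. Both helper names are hypothetical. It assumes the x86 and aarch64 "improvement" figures are throughput (score) gains, as their parenthetical notes suggest, and that a reduction in dynamic counts translates to run time only if the per-instruction cost stays the same, which is exactly what the performance testing would need to confirm.

```python
def runtime_cut_from_improvement(improvement: float) -> float:
    """Fraction of run time removed by a fractional throughput gain.

    Assumption: an improvement of 0.88 means the score is 1.88x the
    baseline, so run time is 1/1.88 of the baseline.
    """
    return 1.0 - 1.0 / (1.0 + improvement)


def speedup_from_count_reduction(reduction: float) -> float:
    """Speedup implied by a fractional reduction in dynamic counts.

    Assumption: cost per remaining instruction is unchanged.
    """
    return 1.0 / (1.0 - reduction)


print(f"x86 run time cut:     {runtime_cut_from_improvement(0.88):.1%}")
print(f"aarch64 run time cut: {runtime_cut_from_improvement(1.04):.1%}")
print(f"speedup from 47% cut: {speedup_from_count_reduction(0.47):.2f}x")
```

With these assumptions the 47% reduction implies a speedup a little under 2x, close to the x86 figure.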
