Given that's precisely the goal we were shooting for, this is considered done.

Note that Robin is investigating improvements to the generated code for the SATD routines. Essentially, we're doing a lot of byte loads when we should be loading larger values. This may provide another small improvement on top of the basic vectorization.
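
To illustrate the byte-load concern, here is a minimal scalar sketch of a 4x4 SATD (sum of absolute transformed differences). It is not x264's source and not the vector code GCC generates; the function names (`diff4_bytes`, `diff4_word`, `hadamard4`, `satd_4x4`) are hypothetical, and the wide-load variant assumes a little-endian target. It only shows that fetching a row with one 32-bit load gives the same result as four separate byte loads.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Row difference using four separate byte loads from each source. */
static void diff4_bytes(const uint8_t *a, const uint8_t *b, int d[4])
{
    for (int i = 0; i < 4; i++)
        d[i] = (int)a[i] - (int)b[i];
}

/* Same result, but each source row is fetched with a single 32-bit load
   and unpacked with shifts. Assumption: little-endian byte order. */
static void diff4_word(const uint8_t *a, const uint8_t *b, int d[4])
{
    uint32_t wa, wb;
    memcpy(&wa, a, sizeof wa);
    memcpy(&wb, b, sizeof wb);
    for (int i = 0; i < 4; i++)
        d[i] = (int)((wa >> (8 * i)) & 0xff) - (int)((wb >> (8 * i)) & 0xff);
}

/* 4-point Hadamard transform, in place, over v[0], v[step], v[2*step], v[3*step]. */
static void hadamard4(int *v, int step)
{
    int a0 = v[0] + v[step];
    int a1 = v[0] - v[step];
    int a2 = v[2 * step] + v[3 * step];
    int a3 = v[2 * step] - v[3 * step];
    v[0]        = a0 + a2;
    v[step]     = a1 + a3;
    v[2 * step] = a0 - a2;
    v[3 * step] = a1 - a3;
}

/* 4x4 SATD of two pixel blocks; wide_loads selects the load strategy. */
static int satd_4x4(const uint8_t *p1, int stride1,
                    const uint8_t *p2, int stride2, int wide_loads)
{
    int d[16];
    for (int y = 0; y < 4; y++) {
        if (wide_loads)
            diff4_word(p1 + y * stride1, p2 + y * stride2, &d[4 * y]);
        else
            diff4_bytes(p1 + y * stride1, p2 + y * stride2, &d[4 * y]);
    }
    for (int y = 0; y < 4; y++)
        hadamard4(&d[4 * y], 1);   /* transform each row */
    for (int x = 0; x < 4; x++)
        hadamard4(&d[x], 4);       /* transform each column */

    int sum = 0;
    for (int i = 0; i < 16; i++)
        sum += abs(d[i]);
    return sum / 2;
}
```

Both load strategies produce the same SATD value; the difference is only in how many memory accesses are issued per row.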

Stakeholders/Partners

RISE:

...

Page Properties

Development

Status


colour	Green
title	COMPLETE

Development Timeline

1H2024

Upstreaming

Status


colour	Green
title	COMPLETE

Upstream Version

gcc-14

Spring 2024

Contacts

Jeff Law (Ventana)

Dependencies

Updates

09 May 2024

Note additional opportunities for improvement.

25 Apr 2024

Testing on the k230 board shows "only" a 17% runtime improvement, whereas the target for x264 vectorization is a 50% runtime improvement (which would double the SPEC score).
However, it looks like the cost of a vector ALU op is at least 3X LMUL.
So a performant uarch where vector ALU ops of reasonable size (128 bits) take 1 cycle would see the expected 50% runtime improvement.
Considering this resolved.
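
The relationship between runtime improvement and score in the 25 Apr update can be checked with a small calculation. This is a sketch that assumes the score is inversely proportional to runtime; the function name `speedup_from_runtime_reduction` is hypothetical.

```c
/* Speedup factor implied by a fractional runtime reduction.
   Assumption: score scales as 1 / runtime, so cutting runtime by
   `reduction` (0.0 to just under 1.0) multiplies the score by 1 / (1 - reduction). */
static double speedup_from_runtime_reduction(double reduction)
{
    return 1.0 / (1.0 - reduction);
}
```

Under that assumption, a 50% runtime reduction gives a factor of 2.0 (a doubled score), while the 17% measured on the k230 gives a factor of roughly 1.20.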

...
