This is an open collaboration. All ideas and contributions are valuable as we work together to enhance RISC-V's video codec capabilities.

Vector transpose instructions

Intro

In x264, transpose instructions serve two main purposes: transposing matrices and permuting elements between vectors. Both uses are frequent.

Implementation in other ISAs

Other ISAs usually implement matrix transposition in one of two ways. We introduce them below using aarch64 and loongarch as examples. The x86 implementation is similar to loongarch, while the ARM implementation is similar to aarch64.

Aarch64

Aarch64 provides the trn1 and trn2 instructions. Combining one trn1 with one trn2 transposes multiple 2x2 matrices held across two vector registers. Larger matrix transpositions are built by repeating this 2x2 step at different scales, that is, with different element sizes. The aarch64 transpose macro implementation in x264 is as follows:
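Before the macro itself, a scalar model of the two instructions may help. The C sketch below is illustrative only and is not taken from x264: the function names `trn` and `transpose4x4`, the 16-bit element type, and the 4-lane vector length are all assumptions made for the example. It models trn1/trn2 on plain arrays and shows how a 4x4 transpose can be composed from a 2x2 step at two scales.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of trn1 (odd = 0) and trn2 (odd = 1).
 * n: lanes per vector.
 * w: lanes per element (1 = single lane, 2 = a lane pair treated as
 *    one wider element, and so on).
 * trn1 interleaves the even-numbered elements of a and b,
 * trn2 interleaves the odd-numbered ones.
 * dst must not alias a or b. */
static void trn(int16_t *dst, const int16_t *a, const int16_t *b,
                size_t n, size_t w, int odd)
{
    assert(w > 0 && n % (2 * w) == 0);
    for (size_t p = 0; p < n / (2 * w); p++) {
        size_t src = (2 * p + (odd ? 1 : 0)) * w; /* source element start */
        for (size_t k = 0; k < w; k++) {
            dst[2 * p * w + k]       = a[src + k];
            dst[(2 * p + 1) * w + k] = b[src + k];
        }
    }
}

/* 4x4 transpose built from 2x2 transposes at two scales. */
static void transpose4x4(int16_t out[4][4], int16_t in[4][4])
{
    int16_t t[4][4];

    /* Scale 1: transpose every 2x2 block of single lanes. */
    trn(t[0], in[0], in[1], 4, 1, 0);
    trn(t[1], in[0], in[1], 4, 1, 1);
    trn(t[2], in[2], in[3], 4, 1, 0);
    trn(t[3], in[2], in[3], 4, 1, 1);

    /* Scale 2: treat lane pairs as elements and transpose again. */
    trn(out[0], t[0], t[2], 4, 2, 0);
    trn(out[2], t[0], t[2], 4, 2, 1);
    trn(out[1], t[1], t[3], 4, 2, 0);
    trn(out[3], t[1], t[3], 4, 2, 1);
}
```

In this model, calling `transpose4x4(out, in)` leaves `out[r][c] == in[c][r]` for every row and column. On real hardware the second stage would correspond to the same instruction pair issued with a wider element arrangement.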