About
This project enables auto-vectorization in LLVM for RISC-V, targeting version 1.0 of the V extension. The long-term goal is to focus on vector-length-agnostic (VLA) approaches to vectorization, but parts of LLVM's vectorizers may still be biased towards fixed vector sizes. We therefore expect to find cases that VLA approaches do not handle well, and we expect to support vector-length-specific (VLS) approaches to vectorization as a stop-gap alternative.
...
LLVM's support for auto-vectorization on RISC-V appears to be improving regularly, but it is sensitive to having reasonable micro-architectural data available. When enabling auto-vectorization on a new micro-architecture, it may therefore be necessary to stub out values for these key parameters, or to disable the cost model.
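To make the VLA/VLS distinction concrete, the following is a minimal sketch in portable C that emulates the two loop shapes with scalar code. It is not vectorizer output: `VLS_LANES`, `add_vls`, `add_vla`, and the `vlmax` parameter are hypothetical names chosen for illustration, and the inner `lane` loops stand in for a single vector instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed vector width, in elements, for the VLS variant. */
#define VLS_LANES 4

/* VLS shape: full fixed-width blocks, then a scalar remainder loop. */
void add_vls(int *dst, const int *a, const int *b, size_t n) {
    size_t i = 0;
    for (; i + VLS_LANES <= n; i += VLS_LANES) {
        /* Stands in for one fixed-width vector add. */
        for (size_t lane = 0; lane < VLS_LANES; ++lane)
            dst[i + lane] = a[i + lane] + b[i + lane];
    }
    /* Scalar epilogue for the last n % VLS_LANES elements. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}

/* VLA shape: the element count per iteration is only known at run time.
   vlmax models the hardware's maximum vector length in elements. */
void add_vla(int *dst, const int *a, const int *b, size_t n, size_t vlmax) {
    assert(vlmax > 0);
    for (size_t i = 0; i < n; ) {
        /* Stands in for asking the hardware how many elements to process. */
        size_t vl = (n - i < vlmax) ? (n - i) : vlmax;
        for (size_t lane = 0; lane < vl; ++lane)
            dst[i + lane] = a[i + lane] + b[i + lane];
        i += vl;
    }
}
```

The VLA shape needs no separate remainder loop and produces the same result for any `vlmax`, which is what lets one binary run on implementations with different vector lengths. The VLS shape bakes the width into the code.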
Stakeholders/Partners
RISE:
Ventana: 1 FTE focused on getting necessary uarch data ready
Ventana: 1 FTE on reference/target implementations of key x264 loops and a breakdown of the tasks that need to be solved to achieve the desired code generation
SiFive: Craig Topper, Alexey Bataev, Kolya Panchenko
Rivos:
External:
Alex Bradbury
Dependencies
The most pressing upstream dependencies are:
- PSABI specification for vector argument passing and return values
- Kernel support to enable discovery of the V extension
- glibc support for libmvec to enable a vector API for key math library functions such as sin, cos, sqrt, etc. (Open question: does LLVM support libmvec calls?)
Status
Updates
- Evaluation of LLVM trunk indicates ongoing improvements for key routines in x264
- Notable lack of strided memory accesses will impact x264 performance
- SLP (superword-level parallelism, also called straight-line parallelism) vectorization enabled upstream for short vectors
- Ventana has enabled auto-vectorization in its internal LLVM tree after adding uarch details to that tree
- One of the simpler routines now generates the reference/target code
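As an illustration of why strided accesses matter for x264-style kernels, here is a minimal sketch of a sum-of-absolute-differences loop over a pixel block. It is not taken from x264: the function name `sad_block` and its signature are assumptions for illustration. The point is that consecutive rows are `stride` bytes apart, so for narrow blocks, packing several rows into one vector register plausibly requires a strided load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences over a width x height block of pixels.
   stride_a and stride_b are the distances in bytes between the starts
   of consecutive rows, which is usually larger than width. */
unsigned sad_block(const uint8_t *a, ptrdiff_t stride_a,
                   const uint8_t *b, ptrdiff_t stride_b,
                   int width, int height) {
    assert(width >= 0 && height >= 0);
    unsigned sum = 0;
    for (int y = 0; y < height; ++y) {
        /* Within a row the accesses are contiguous (unit stride). */
        for (int x = 0; x < width; ++x)
            sum += (unsigned)abs(a[x] - b[x]);
        /* Moving to the next row is a jump of stride bytes. */
        a += stride_a;
        b += stride_b;
    }
    return sum;
}
```

With a width of only 4 or 8 pixels, vectorizing the inner loop alone uses a small fraction of a vector register. Vectorizing across rows is what would call for strided memory operations.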
...
- Moved to 2H2024
- Patch for using VP intrinsics for unary and binary operators https://github.com/llvm/llvm-project/pull/93854
- First VP intrinsic vectorizer patch merged: https://github.com/llvm/llvm-project/pull/76172. This is a first step towards vectorizing with strip-mined loops like the examples in the vector spec.
- Improving vectorization split off as a distinct project
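For readers unfamiliar with VP (vector-predicated) operations, the sketch below emulates the semantics of a predicated add in scalar C: each operation takes a per-lane mask and an explicit vector length (EVL), and only lanes that are both below the EVL and enabled by the mask are computed. The function name `vp_add_i32` and its signature are hypothetical, and this is an emulation rather than LLVM's implementation. Disabled lanes are left untouched here, whereas in LLVM IR their values are poison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Emulates a predicated add over a vector of `lanes` elements.
   A lane i is computed only if i < evl and mask[i] is true. */
void vp_add_i32(int32_t *dst, const int32_t *a, const int32_t *b,
                const bool *mask, size_t lanes, size_t evl) {
    /* The explicit vector length must not exceed the lane count. */
    assert(evl <= lanes);
    for (size_t i = 0; i < evl; ++i) {
        if (mask[i])
            dst[i] = a[i] + b[i];
    }
}
```

In a strip-mined loop, the EVL for each iteration would be the smaller of the remaining element count and the hardware maximum, so the final partial iteration needs no separate scalar tail.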