About
Enablement of auto-vectorization in GCC for RISC-V, targeting version 1.0 of the V extension. The initial focus is to implement the RISC-V target-specific code that wires up the existing intrinsics to the basic vectorizer primitives. This enables basic vectorization of both integer and floating-point code, various reductions, etc. While the long-term goal is to focus on vector length agnostic (VLA) approaches to vectorization, much of GCC's vectorizer was built assuming static vector lengths and only recently started supporting VLA styles. Thus we expect to find cases that VLA approaches do not handle well, and we expect to support vector length specific (VLS) approaches to vectorization as stop-gap alternatives.
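To make the VLA/VLS distinction concrete, here is a portable C sketch that emulates both styles with scalar loops; it is not RVV code and not what GCC emits. `sim_vsetvl` is a hypothetical stand-in for the `vsetvli` instruction that simply grants `min(avl, vlmax)` elements (the RVV spec also permits smaller, still valid, choices for some AVL values), and `VLS_WIDTH` is an assumed fixed width.

```c
#include <stddef.h>

/* Hypothetical stand-in for vsetvli: grant min(avl, vlmax) elements.
   This is the simplest legal policy; real hardware may choose differently. */
static size_t sim_vsetvl(size_t avl, size_t vlmax)
{
    return avl < vlmax ? avl : vlmax;
}

/* VLA style: vlmax is only known at run time, and one strip-mined loop
   handles every n with no separate tail loop. */
void saxpy_vla(size_t n, float a, const float *x, float *y, size_t vlmax)
{
    for (size_t i = 0; i < n;) {
        size_t vl = sim_vsetvl(n - i, vlmax);  /* elements processed this pass */
        for (size_t j = 0; j < vl; j++)        /* emulates one vector operation */
            y[i + j] = a * x[i + j] + y[i + j];
        i += vl;
    }
}

/* VLS style: the width is fixed at compile time (assumed value), so the
   leftover elements need a scalar epilogue. */
#define VLS_WIDTH 4
void saxpy_vls(size_t n, float a, const float *x, float *y)
{
    size_t i = 0;
    for (; i + VLS_WIDTH <= n; i += VLS_WIDTH)
        for (size_t j = 0; j < VLS_WIDTH; j++) /* emulates one fixed-width op */
            y[i + j] = a * x[i + j] + y[i + j];
    for (; i < n; i++)                         /* scalar epilogue */
        y[i] = a * x[i] + y[i];
}
```

The VLA loop runs unchanged for whatever `vlmax` the hardware reports, whereas the VLS loop bakes in its width and needs a tail loop; that trade-off is why VLS is treated here as a fallback rather than the primary path.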
Basic functionality is complete, and we have moved our focus to performance evaluation/improvement and to identifying gaps.
- Given the generic vectorizer implementation and V extension capabilities, are we vectorizing the loops we should?
- Are we avoiding vectorization of loops where it is not profitable?
- Are we exploiting vector length agnostic approaches, or falling back to static vector lengths?
- When we vectorize, do we do so efficiently?
Vector implementations of key library routines such as mem*, str*, and vector math (sqrt, sin, cos, etc.) are primarily an issue for core libraries such as glibc. However, glibc and GCC need to coordinate so that the compiler can make use of these routines, for example through libmvec.
Stakeholders/Partners
RISE:
Ventana: Robin Dapp (full time) + Jeff Law for oversight/review
Rivos: Palmer Dabbelt for oversight/review
Joern Rennecke (Embecosm contractor for Rivos): some initial vector math library implementations and compiler expansion of memcpy
SiFive: Kito Cheng for oversight/review
External:
RiVAI: Much of the initial work was done by Juzhe. This has included the basic design/implementation, ABI work, etc. Juzhe continues to play a major role in design/implementation going forward.
SUSE/IBM/ARM/Linaro: Some of the work in this space has touched generic parts of GCC. Various engineers from SUSE, IBM, ARM and Linaro, including Richard Sandiford and Richard Biener, have been involved on an as-needed basis.
Dependencies
The most pressing upstream dependencies are:
- PSABI specification for vector argument passing and return values
- Kernel support to enable discovery of the V extension; initial downstream uses of these capabilities are starting to appear
- glibc support for libmvec to provide vector APIs for key math library functions such as sin, cos, sqrt, etc.
Status
Updates
- Functionally complete.
- Interfaces between the generic vectorizer code and the RISC-V target are implemented
- This does not mean all the work is complete; focus has moved to optimization:
  - Given the generic vectorizer's capabilities and the RISC-V V ISA, do we vectorize all the loops we should?
  - Do we avoid vectorizing when it is not profitable?
  - Do we vectorize using VLA, or are we falling back to VLS?
  - When we vectorize, do we do so efficiently?
- Additional conditional (masked) vector operations landing
- Optimized rounding mode switching progressing well for vector code that needs control over rounding modes
- Generic scheduling model submitted, not yet approved
- Support for "load/store lanes" with length and mask control integrated
- More rounding mode intrinsics API support landing
- Vectorized cpymem approved; will integrate once some testsuite infrastructure issues are resolved
- Remaining chunks of work:
  - VEC_EXTRACT/EXTRACT_LAST, FOLD_EXTRACT_LAST
  - fmac with length control
  - Strided memory access
  - Scheduler models
  - libmvec
  - Vectorization of loops with control flow via masking
- More VLS bits falling into place
- Rounding mode intrinsics API and RVV floating point dynamic rounding support
- VLS as a static vector length fallback path when VLA vectorization fails or when the loop iteration count is known
- Averaging synthesis
- General agreement on annotation of functions with vector ABI
- In-order and out-of-order FP reductions
- Refactoring done; shaves maybe 10% off bootstrap times
- Generic vectorizer work significantly helped a key loop in ImageMagick: 11% on Altra and 17% on Zen3
- Ju-Zhe and Robin appointed as reviewers for the RISC-V port
  - Recognizes their contributions to date
  - Speeds up cycle time for patch review & integration
- Reimplementation of one low-level concept (not user visible)
  - Less confusion for developers
  - Easier to extend for certain cases
  - Hoping it will help with the build scaling issues we've recently seen (untested)
- Seeing some movement on functions that should likely land in libmvec
- Scatter/gather support, cond_len_* landing
- Strided loads/stores temporarily deferred to take a different approach
- Narrowing and widening vector operations in place, plus int↔fp conversions
- LTO issues are supposed to be fixed now
- Generic improvements for VLA scatter/gather with masking
- float16 tuple types
- Coordination branch not updated yet due to US holidays (perhaps 7/6 or 7/7)
- Expecting to have automated testing of the coordination branch in place this week
- Integer and FP ternary (multiply-accumulate) operations are approved and partially integrated
- Optimization of widening ternary operations in progress/under review upstream
- Reductions under development
- Basic FP (unary/binary) supported on trunk and coordination branch
- Ternary (fmac) in progress, but not yet integrated.
- Note: dates on or before June 15 are only approximate
- Basic integer, data movement, select, insert and extract supported on trunk and coordination branch
- Project reported as priority for 2H23
- Coordination branch created in the upstream GCC repository's vendor namespace: riscv/gcc-13-with-riscv-opts
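Vectorizing loops with control flow via masking relies on if-conversion: the branch in the loop body becomes a mask computation followed by masked operations. The plain-C sketch below emulates that shape for one simple loop; `SIM_VLMAX` and the function names are hypothetical, and real RVV code would use a vector compare into a mask register plus masked loads/stores rather than these scalar inner loops.

```c
#include <stdbool.h>
#include <stddef.h>

/* Scalar reference: the loop body contains a branch. */
void cond_add_scalar(size_t n, int *a, const int *b, const int *c)
{
    for (size_t i = 0; i < n; i++)
        if (b[i] > 0)
            a[i] += c[i];
}

#define SIM_VLMAX 8  /* hypothetical hardware vector length, in elements */

/* If-converted form: no branch decides *which code runs*; the condition
   only decides which lanes are active. */
void cond_add_masked(size_t n, int *a, const int *b, const int *c)
{
    for (size_t i = 0; i < n;) {
        size_t vl = (n - i) < SIM_VLMAX ? (n - i) : SIM_VLMAX;
        bool mask[SIM_VLMAX];
        int tmp[SIM_VLMAX];

        for (size_t j = 0; j < vl; j++)   /* like a vector compare into a mask */
            mask[j] = b[i + j] > 0;

        for (size_t j = 0; j < vl; j++)   /* like masked loads + add; inactive */
            tmp[j] = mask[j] ? a[i + j] + c[i + j] : 0;  /* lanes touch no memory */

        for (size_t j = 0; j < vl; j++)   /* like a masked store: only active */
            if (mask[j])                  /* lanes are written back */
                a[i + j] = tmp[j];

        i += vl;
    }
}
```

Keeping inactive lanes away from memory matters: if-conversion must not introduce loads or stores that the original loop would never have performed, which is why masked memory operations (and the cond_len_* patterns mentioned in the list) are needed rather than plain unconditional ones.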
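The need for both in-order and out-of-order FP reductions comes from floating-point addition not being associative. RVV provides an ordered (`vfredosum`) and an unordered (`vfredusum`) floating-point sum reduction; the C sketch below emulates the two summation orders with scalar code (the lane count `LANES` and the function names are assumptions for illustration) so the difference in results is visible.

```c
#include <stddef.h>

/* In-order (strict) reduction: same association order as the source loop,
   so it is always a valid transformation of that loop. */
float sum_in_order(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc = acc + x[i];
    return acc;
}

/* Out-of-order reduction: per-lane partial sums combined at the end.
   Maps well to vector hardware, but reassociates the additions. */
#define LANES 2
float sum_by_lanes(const float *x, size_t n)
{
    float part[LANES] = {0.0f, 0.0f};
    for (size_t i = 0; i < n; i++)
        part[i % LANES] = part[i % LANES] + x[i];
    float acc = 0.0f;
    for (size_t l = 0; l < LANES; l++)
        acc = acc + part[l];
    return acc;
}
```

With the input {1e20f, 1.0f, -1e20f, 1.0f} in IEEE single precision, `sum_in_order` returns 1.0f (the first 1.0f is absorbed by 1e20f) while `sum_by_lanes` returns 2.0f. This is why GCC only uses the reassociated form when reassociation is permitted (for example with `-fassociative-math`, which `-ffast-math` enables) and otherwise needs the strict, in-order form.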