About

Enablement of auto-vectorization in GCC for RISC-V, targeting version 1.0 of the V extension. The initial focus is to implement the RISC-V target-specific code that wires up the existing intrinsics to the basic vectorizer primitives. This enables basic vectorization of both integer and floating point code, various reductions, etc. While the long-term goal is to focus on vector length agnostic (VLA) approaches to vectorization, much of GCC's vectorizer was built assuming static vector lengths and only recently started supporting VLA styles. Thus we expect to find cases that are not handled well by VLA approaches, and we expect to support vector length specific (VLS) approaches to vectorization as stop-gap alternatives.
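
As a rough illustration of the kind of loops in scope, the sketch below shows an element-wise floating point loop and an integer reduction. The function names are hypothetical and are not taken from GCC or its testsuite. With a GCC build that has RISC-V auto-vectorization enabled, compiling with something like `-O3 -march=rv64gcv` and reading the `-fopt-info-vec` output is one way to check whether such loops were vectorized.

```c
#include <stddef.h>
#include <stdint.h>

/* Element-wise floating point loop: y[i] = a * x[i] + y[i].
   `restrict` tells the compiler the arrays do not overlap, which
   removes the need for runtime alias checks before vectorizing. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Integer sum reduction: integer addition is associative, so a
   vectorizer is free to keep partial sums per vector lane and
   combine them after the loop. */
int64_t sum_i32(size_t n, const int32_t *x)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum;
}
```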

Basic functionality is complete and we have moved focus to performance evaluation/improvement and identifying gaps:

Given the generic vectorizer implementation and V extension capabilities, are we vectorizing the loops we should?
Are we avoiding vectorizing loops we should not due to profitability concerns?
Are we exploiting vector length agnostic approaches or falling back to static vector lengths?
When we vectorize, do we do so efficiently?

Vector implementations of key library routines such as mem*, str* and vector math (sqrt, sin, cos, etc.) are primarily an issue for core libraries such as glibc. However, coordination between glibc and GCC is needed to utilize libmvec, for example.

Stakeholders/Partners

RISE:

Ventana: Robin Dapp (full time) + Jeff Law for oversight/review

Rivos: Palmer Dabbelt for oversight/review

Joern Rennecke (Embecosm contractor for Rivos): Some initial vector math library implementations, compiler expansion of memcpy

SiFive: Kito Cheng for oversight/review

External:

RiVAI: Much of the initial work was done by Juzhe. This has included the basic design/implementation, ABI work, etc. Juzhe continues to play a major role in design/implementation going forward.

SUSE/IBM/ARM/Linaro: Some of the work in this space has touched on generic parts of GCC. Various engineers from SUSE, IBM, ARM and Linaro, including Richard Sandiford, Richard Biener and others, have been involved on an as-needed basis.

Dependencies

The most pressing upstream dependencies are:

PSABI specification for vector argument passing and return values
Kernel support to enable discovery of the V extension – starting to see initial downstream uses of these capabilities
glibc support for libmvec to enable vector API for key math library functions such as sin, cos, sqrt, etc

Status

Development	COMPLETE
Development Timeline	Basic functionality: 2H2023; Optimization: 2H2023 - 2H2024
Upstreaming
Upstream Version	Development trunk: will turn into gcc-14, spring 2024; gcc-13 RISC-V coordination branch: available to all to use, but not an official release from the GCC project
Contacts	Robin Dapp (Ventana), Kito Cheng (SiFive), Palmer Dabbelt (Rivos), Jeff Law (Ventana)
Dependencies	PSABI for vector; kernel discovery; glibc for libmvec

Updates

30 Aug 2023

Functionally complete.
- Interfaces between generic vectorizer code and RISC-V target are implemented
- Does not mean all the work is complete, focus has moved to optimization:
  - Given the generic vectorizer's capabilities and RISC-V V ISA, do we vectorize all the loops we should
  - Do we avoid vectorizing when it is not profitable
  - Do we vectorize using VLA or are we falling back to VLS
  - When we vectorize, do we do so efficiently
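
To make the VLA versus VLS question above more concrete, here is a scalar C model of the two loop shapes. It is only a sketch: `VLS_LANES`, `vlmax` and both function names are hypothetical, the inner `lane` loops stand in for single vector operations, and the code GCC actually generates will differ.

```c
#include <stddef.h>
#include <stdint.h>

#define VLS_LANES 4  /* hypothetical fixed lane count known at compile time */

/* VLS-style model: the main loop handles full vectors of a fixed
   width and a scalar epilogue handles the leftover elements. */
void add_vls_model(size_t n, const int32_t *a, const int32_t *b, int32_t *out)
{
    size_t i = 0;
    for (; i + VLS_LANES <= n; i += VLS_LANES)
        for (size_t lane = 0; lane < VLS_LANES; lane++)  /* one "vector" add */
            out[i + lane] = a[i + lane] + b[i + lane];
    for (; i < n; i++)                                   /* scalar epilogue */
        out[i] = a[i] + b[i];
}

/* VLA-style model: the lane count (vlmax, must be nonzero) is only
   known at run time. Each iteration processes min(remaining, vlmax)
   elements, similar in spirit to a vsetvli-controlled loop, so no
   separate epilogue is needed. */
void add_vla_model(size_t n, size_t vlmax,
                   const int32_t *a, const int32_t *b, int32_t *out)
{
    for (size_t i = 0; i < n; ) {
        size_t remaining = n - i;
        size_t vl = remaining < vlmax ? remaining : vlmax;
        for (size_t lane = 0; lane < vl; lane++)         /* one "vector" add */
            out[i + lane] = a[i + lane] + b[i + lane];
        i += vl;
    }
}
```

Both models compute the same result; the difference is whether the vector length is baked into the code or supplied at run time.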

23 Aug 2023

Additional conditional vector operations via masking landing
Optimized rounding mode switching progressing well for vector code which wants control over rounding modes
Generic scheduling model submitted, not yet approved

16 Aug 2023

Support for "load/store lanes" with length and mask support integrated
More rounding mode intrinsics API support landing
Vectorized cpymem approved, will integrate once some testsuite infrastructure issues are resolved
Remaining chunks of work:
- VEC_EXTRACT/EXTRACT_LAST, FOLD_EXTRACT_LAST
- fmac with length control
- Strided memory access
- Scheduler models
- libmvec
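
For context on the "load/store lanes" item in this update: the term is generally used for interleaved (array-of-structures) memory accesses, which RVV can serve with segment loads and stores. The loops below are a hedged sketch of source code with that access pattern; the `complex_f32` type and the function names are hypothetical.

```c
#include <stddef.h>

typedef struct { float re, im; } complex_f32;  /* hypothetical AoS layout */

/* Load side: re and im are interleaved in memory, so a vectorized
   version needs to split them into separate vectors (load lanes). */
void norm_sq(size_t n, const complex_f32 *restrict in, float *restrict out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i].re * in[i].re + in[i].im * in[i].im;
}

/* Store side: two separate arrays are written back as interleaved
   pairs (store lanes). */
void interleave(size_t n, const float *restrict re, const float *restrict im,
                complex_f32 *restrict out)
{
    for (size_t i = 0; i < n; i++) {
        out[i].re = re[i];
        out[i].im = im[i];
    }
}
```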

09 Aug 2023

Vectorization of loops with control flow via masking
More VLS bits falling into place
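
As an illustration of vectorizing control flow via masking, the sketch below pairs a loop containing a branch with a scalar model of its if-converted form, where the branch becomes a per-element mask and a select. The function names are hypothetical and the model is not the code GCC emits.

```c
#include <stddef.h>
#include <stdint.h>

/* Loop with control flow: only negative elements are overwritten. */
void clamp_negatives(size_t n, int32_t *x)
{
    for (size_t i = 0; i < n; i++) {
        if (x[i] < 0)
            x[i] = 0;
    }
}

/* Scalar model of the if-converted form: the condition becomes a
   mask and the branch becomes a select. A vectorizer can compute
   the mask for a whole vector at once and apply it to the store. */
void clamp_negatives_masked_model(size_t n, int32_t *x)
{
    for (size_t i = 0; i < n; i++) {
        int mask = x[i] < 0;       /* 1 where the branch would be taken */
        x[i] = mask ? 0 : x[i];    /* select between new and old value */
    }
}
```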

02 Aug 2023

Rounding mode intrinsics API and RVV floating point dynamic rounding support
VLS for static vector length fallback path when VLA vectorization fails or when loop iterations are known
Averaging synthesis
General agreement on annotation of functions with vector ABI
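
Assuming "averaging synthesis" refers to recognizing integer averaging idioms in source code, the loops below show the two usual forms, rounding down and rounding up. This is a sketch with hypothetical function names; how the operation is synthesized for RVV is not described in this update.

```c
#include <stddef.h>
#include <stdint.h>

/* Rounding-down average. The uint8_t operands are promoted to int,
   so the addition cannot overflow before the shift. */
void avg_floor_u8(size_t n, const uint8_t *restrict a,
                  const uint8_t *restrict b, uint8_t *restrict out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)((a[i] + b[i]) >> 1);
}

/* Rounding-up average: the +1 rounds halves upward. */
void avg_ceil_u8(size_t n, const uint8_t *restrict a,
                 const uint8_t *restrict b, uint8_t *restrict out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}
```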

26 Jul 2023

In and out of order FP reductions
Refactoring done – shaves maybe 10% off the bootstrap times
Generic work on the vectorizer significantly helped a key loop in ImageMagick: 11% and 17% for Altra and Zen3 respectively
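
The difference between in-order and out-of-order FP reductions can be sketched in scalar C. Floating point addition is not associative, so an in-order reduction keeps the source order of additions, while an out-of-order one keeps per-lane partial sums and is typically only permitted under relaxed options such as `-ffast-math` or `-fassociative-math`. The two-lane model and the function names below are hypothetical.

```c
#include <stddef.h>

/* In-order reduction: additions happen strictly left to right,
   matching the source semantics. */
float sum_in_order(size_t n, const float *x)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum;
}

/* Scalar model of an out-of-order reduction with two lanes: each
   lane accumulates every other element and the partial sums are
   combined at the end. This changes the association of the
   additions, so results may differ from the in-order version when
   the inputs have very different magnitudes. */
float sum_two_lanes_model(size_t n, const float *x)
{
    float lane0 = 0.0f, lane1 = 0.0f;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        lane0 += x[i];
        lane1 += x[i + 1];
    }
    if (i < n)              /* leftover element for odd n */
        lane0 += x[i];
    return lane0 + lane1;
}
```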

19 Jul 2023

Juzhe and Robin appointed as reviewers for RISC-V port
- Recognize their contributions to date
- Speed up cycle time for patch review & integration
Reimplementation of one low level concept (not user visible)
- Less confusion for developers
- Easier to extend for certain cases
- Hoping it will help scaling issues we've recently seen with builds (untested)
Seeing some movement on functions that should likely land in libmvec

13 Jul 2023

Scatter/gather support, cond_len_* landing
Strided loads/stores temporarily deferred to take a different approach
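
For reference, scatter/gather support concerns loops whose memory accesses go through an index array, as in the sketch below. The function names are hypothetical, and the indices are assumed to be in bounds.

```c
#include <stddef.h>
#include <stdint.h>

/* Gather: each output element is loaded from a data-dependent
   position in `table` (an indexed load). */
void gather_i32(size_t n, const int32_t *restrict table,
                const uint32_t *restrict idx, int32_t *restrict out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = table[idx[i]];
}

/* Scatter: each input element is stored to a data-dependent
   position in `out` (an indexed store). */
void scatter_i32(size_t n, const int32_t *restrict in,
                 const uint32_t *restrict idx, int32_t *restrict out)
{
    for (size_t i = 0; i < n; i++)
        out[idx[i]] = in[i];
}
```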

05 Jul 2023

Narrowing and widening vector operations in place, int↔fp conversions
LTO issues are supposed to be fixed now
Generic improvements for VLA scatter/gather with masking
float16 tuple types
Coordination branch not updated yet due to US holidays (perhaps 7/6 or 7/7)
Expecting to have automated testing of the coordination branch in place this week
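
The widening, narrowing and conversion operations mentioned in this update correspond to loops where the element type changes between input and output. The following is a minimal sketch with hypothetical function names.

```c
#include <stddef.h>
#include <stdint.h>

/* Widening: 16-bit inputs produce a 32-bit product. */
void widen_mul_i16(size_t n, const int16_t *restrict a,
                   const int16_t *restrict b, int32_t *restrict out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (int32_t)a[i] * (int32_t)b[i];
}

/* Narrowing: keep the upper 16 bits of each 32-bit input. */
void narrow_hi_u32(size_t n, const uint32_t *restrict in,
                   uint16_t *restrict out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (uint16_t)(in[i] >> 16);
}

/* Integer to floating point conversion. */
void i32_to_f32(size_t n, const int32_t *restrict in, float *restrict out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (float)in[i];
}
```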

28 Jun 2023

Integer and FP ternary (multiply accumulate) are approved and partially integrated
Optimization of widening ternary operations in progress/under review upstream
Reductions under development
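
The ternary (multiply accumulate) operations above take three inputs per element. The sketch below shows a floating point form and a widening integer form; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Floating point multiply accumulate: acc[i] = acc[i] + a[i] * b[i]. */
void fmac_f32(size_t n, const float *restrict a, const float *restrict b,
              float *restrict acc)
{
    for (size_t i = 0; i < n; i++)
        acc[i] += a[i] * b[i];
}

/* Widening integer multiply accumulate: 16-bit inputs are multiplied
   and accumulated into 32-bit elements. */
void widen_mac_i16(size_t n, const int16_t *restrict a,
                   const int16_t *restrict b, int32_t *restrict acc)
{
    for (size_t i = 0; i < n; i++)
        acc[i] += (int32_t)a[i] * (int32_t)b[i];
}
```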

21 Jun 2023

Basic FP (unary/binary) supported on trunk and coordination branch
Ternary (fmac) in progress, but not yet integrated.

15 Jun 2023 – Note: dates on or before June 15 are only approximate

Basic integer, data movement, select, insert and extract supported on trunk and coordination branch

01 Jun 2023

Project reported as priority for 2H23
Coordination branch created in upstream GCC repository vendor namespace: riscv/gcc-13-with-riscv-opts

PLACEHOLDER as we split this project into two...Copy of CT_00_001 - Autovectorization -- Basic Functionality (GCC)