Project RP013: Optimizing PyTorch ATen Operators for High-Performance RISC-V Hardware

Bidding Starts: 4/9/2025

Bidding Ends: 5/8/2025

 

SUMMARY:

To stay competitive, RISC-V hardware must not only support but also deliver high-performance, out-of-the-box compatibility with essential AI tools. To advance this goal, RISE is funding the optimization of PyTorch ATen operators for RISC-V.

This RFP aims to optimize PyTorch ATen operators for RISC-V, specifically leveraging the RISC-V Vector (RVV) architecture. The goal is to enhance performance and compatibility of PyTorch on RISC-V hardware, ensuring it runs efficiently out of the box. The work involves optimizing PyTorch ATen operators for RVV, implementing vector-length agnostic (VLA) support, and improving OpenBLAS for common matrix shapes used in machine learning models. Contributions will be upstreamed to the PyTorch community, targeting the BPI-F3 board and using only ratified RISC-V extensions.

Milestones to Deliver:

  1. Optimize PyTorch ATen operators on CPU for RVV

  • Optimize PyTorch ATen operators for RVV architecture to leverage its vector processing capabilities.

    • Optimize for Vector Length Agnostic (VLA).

    • We focus on the top operators, those taking at least 5% of CPU time (highlighted in green in the spreadsheet below)

  • Building on pytorch/pytorch Pull Request #135570 ("vec: support RVV" by zhangfeiv0), add a VLA implementation of the aten/vec library

  • Ensure compatibility and performance across various PyTorch models and workloads on RVV-enabled CPUs.

  • Contribute optimizations and bug fixes back to the PyTorch upstream community.

  • Current plan: target the BPI-F3 board. Only ratified extensions may be used (RVA23 mandatory extensions; no AME/IME currently)

  • The torch.compile feature is out-of-scope of this milestone

  • Single-core vs multi-core: initial optimization targets single-core performance, with the expectation that it will scale sufficiently well to multiple cores.

  2. Optimize OpenBLAS for various matrix shapes for PyTorch ATen operators

  • Identify common matrix shapes (rectangular, tall-and-skinny, short-and-wide) used in machine learning models.

  • Develop and upstream a kernel-selection algorithm based on input matrix shapes.

  • Develop and upstream specialized matrix multiplication algorithms and optimizations for each matrix shape.

  • (The assumption here is that OpenBLAS will be used by the PyTorch.ATen operators/kernels mentioned above)

  • Success criteria: once these changes land upstream, PyTorch will run with the optimized kernels developed in this phase.

 

The models listed in PyTorch Aten Ops Profiling on CPU are used to measure performance uplift. The reproduction steps are documented in https://gitlab.com/riseproject/torchperf.

List of Operators and Shapes

The specific list of PyTorch operators and shapes used for measuring performance are listed in https://gitlab.com/riseproject/torchperf/-/issues/?milestone_title=RP013%3A%20Optimizing%20PyTorch%20ATen%20Operators%20for%20High-Performance%20RISC-V%20Hardware.

To measure performance, you must use https://gitlab.com/riseproject/torchperf/-/blob/main/torchbench.py. A sample output of that script can be found at https://gitlab.com/-/snippets/4836684.

Upstreaming

It is important to RISE that patches be submitted to the upstream project(s), and preferably merged by them, as a condition of completing this contract. RISE also expects the work to be done in alignment with upstream maintainer expectations. Therefore, after posting patches to the relevant project mailing lists and/or opening pull requests against the upstream project, all maintainer feedback provided within 3 weeks of the posting must be addressed and the patches updated as needed. This time limit resets with each maintainer comment that requires changes.
However, RISE also understands that long periods may elapse between the point at which all maintainer feedback has been resolved and when the patches themselves are merged. If no resolvable maintainer feedback is posted within 3 weeks of the most recent revision of the patches, RISE will consider the work complete, contingent on RISE's approval of the work.

Proposals

Interested vendors should submit their proposals including:

  1. Technical approach and implementation plan.

  2. A breakdown of the total cost, along with individual costs and durations for each milestone.

  3. Relevant language capabilities: proficiency in Chinese/Mandarin is highly desirable, as the work requires regular collaboration with Chinese-speaking stakeholders.


Please read the RISE RFP instructions PRIOR to bidding.

Some things to note include:

  • Contracts will be written using the Standard Linux Foundation Europe Paper with the SOW and payment schedule added as an addendum. 

    • Please review prior to your bid submission to address any concerns.

    • Contract Language is not negotiable as Linux Foundation will be contracting the work and paying the invoices.

  • Contracts are milestone based, not hourly.

  • Biweekly progress reporting is a requirement of this contract.


