CT_00_033 -- New instruction fusions
About
Instruction fusion is an important technique to improve performance of RISC-V systems. While GCC has support for a common set of instruction fusions, it's believed that additional cases will become more important as new designs come to market.
- Store pair fusions. Right now GCC only supports fusion of two 64-bit stores to aligned addresses. Newer designs are expected to be able to fuse pairs of same-sized stores fairly aggressively.
- Address calculations with memory references. Right now GCC supports a limited set of fusions of address arithmetic with a memory reference. Newer designs are expected to be able to fuse nearly every add/shadd with a subsequent memory reference.
- Zero-extended bitfield extractions. Right now GCC supports a small subset of bitfield extractions when implemented via shifts, and the supported shift counts are fairly restrictive. Newer designs are expected to fuse these operations very aggressively.
- For bitfield extractions, the compiler should rewrite a right shift followed by masking off the upper bits as a left shift followed by a logical right shift to facilitate fusion.
- Left shift + add sequences with a shift count greater than 3 are expected to be fusible in the near future, as are shifts followed by other non-shift ALU ops, including the 'w' variants.
- Others may show up over time.
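The bitfield-extraction rewrite mentioned in the list above can be checked at the source level. The following is a minimal sketch in C, not GCC code: the function names are hypothetical, and it assumes a 64-bit register width with `width >= 1` and `lsb + width <= 64`. It shows that a right shift plus mask yields the same value as a left shift followed by a logical right shift, which is the shift-pair form a core might fuse.

```c
#include <assert.h>
#include <stdint.h>

/* Extract `width` bits starting at bit `lsb` using right shift + mask.
   Assumes 1 <= width and lsb + width <= 64 (illustrative helper, not GCC API). */
static uint64_t extract_shift_mask(uint64_t x, unsigned lsb, unsigned width)
{
    uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
    return (x >> lsb) & mask;
}

/* Same extraction as a left shift followed by a logical right shift.
   On RV64 this shape corresponds to an slli/srli pair. */
static uint64_t extract_shift_pair(uint64_t x, unsigned lsb, unsigned width)
{
    unsigned left = 64 - lsb - width;  /* push bits above the field out the top */
    unsigned right = 64 - width;       /* bring the field down to bit 0, zero-filled */
    return (x << left) >> right;
}

/* Exhaustively compare both forms for every valid (lsb, width) on one input. */
static void check_equivalence(uint64_t x)
{
    for (unsigned width = 1; width <= 64; width++)
        for (unsigned lsb = 0; lsb + width <= 64; lsb++)
            assert(extract_shift_mask(x, lsb, width) ==
                   extract_shift_pair(x, lsb, width));
}
```

Calling `check_equivalence` on a few bit patterns exercises every valid field position. Whether a given compiler version actually emits the shift pair for the first form depends on the target tuning and is not something this sketch demonstrates.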
Stakeholders/Partners
RISE:
Ventana: Jeff Law: Oversight/guidance
Ventana: Daniel Barboza: Implementation
External:
Samsung: Artemiy Volkov
Dependencies
Status
Updates
- Artemiy and Jeff have been discussing a proposed new approach for at least parts of the instruction fusion flow
- Handle load-store pairs independently of fusions where the result of one instruction is used as an input and output of a subsequent instruction
- load-store pairs should be handled fairly generically, possibly using the infrastructure being put in place by the AArch64 GCC team
- Dependent instruction fusion would be the focus of Artemiy's work
- Core issue: if instructions are not consecutive in the insn stream within the scheduler passes, no fusion is even attempted
- Use the rtl-ssa framework to identify cases where the result of one instruction is used once and only once in a subsequent instruction
- Feed those pairs into the pre-existing function checks
- On success, bring the instructions together in the IL and mark them as fused (SCHED_GROUP_P)
- It's believed that register allocation and renaming shouldn't be a major issue once the basic IL is better
- Daniel's work on fusion is probably ready for upstreaming
- Daniel has the core work completed. It needs to be benchmarked on design, adjusted as needed, then submitted upstream.
- De-fusion was investigated, but it appears not to be necessary
- Daniel Barboza is working part-time on implementing the new fusions along with testcases
- There are some technical hurdles, but they're being worked through. Nothing appears to be a showstopper
- Project added as priority for 2H 2024.
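The dependent-fusion flow described in the updates above (find a result with exactly one use, run the pair through a fusion check, then bring the two instructions together and mark them) can be illustrated with a toy model. This is only a sketch under stated assumptions: it does not use the real rtl-ssa API or GCC's insn representation, every type and function name is hypothetical, the fusion predicate is a made-up stand-in, and it assumes a single basic block in which every result is dead at the end of the block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum op { OP_SLLI, OP_ADD, OP_LOAD, OP_OTHER };

/* Toy instruction: one destination register, up to two sources (-1 = none). */
struct insn {
    enum op op;
    int dest;
    int src[2];
    bool sched_group;  /* toy stand-in for SCHED_GROUP_P: keep with previous insn */
};

static bool reads(const struct insn *i, int reg)
{
    return reg >= 0 && (i->src[0] == reg || i->src[1] == reg);
}

/* Hypothetical fusion check: shift+add and add+load are treated as fusible. */
static bool fusible_pair(const struct insn *a, const struct insn *b)
{
    return (a->op == OP_SLLI && b->op == OP_ADD) ||
           (a->op == OP_ADD && b->op == OP_LOAD);
}

/* Index of the only later insn reading insns[p].dest, or -1 if the value
   has zero or several users before it is redefined. */
static int single_use(const struct insn *insns, size_t n, size_t p)
{
    int user = -1;
    if (insns[p].dest < 0)
        return -1;
    for (size_t j = p + 1; j < n; j++) {
        if (reads(&insns[j], insns[p].dest)) {
            if (user >= 0)
                return -1;
            user = (int)j;
        }
        if (insns[j].dest == insns[p].dest)
            break;  /* value redefined; later reads see a different def */
    }
    return user;
}

/* The producer may sink to just before insn c only if nothing in between
   overwrites one of the producer's source registers. */
static bool can_sink(const struct insn *insns, size_t p, size_t c)
{
    for (size_t k = p + 1; k < c; k++)
        if (reads(&insns[p], insns[k].dest))
            return false;
    return true;
}

/* Pair single-use producers with their consumers; returns the pair count. */
static size_t fuse_pass(struct insn *insns, size_t n)
{
    size_t fused = 0, p = 0;
    while (p < n) {
        int c = single_use(insns, n, p);
        bool ok = c >= 0 && !insns[c].sched_group &&
                  (!insns[p].sched_group || (size_t)c == p + 1) &&
                  fusible_pair(&insns[p], &insns[c]) &&
                  can_sink(insns, p, (size_t)c);
        if (!ok) {
            p++;
            continue;
        }
        struct insn producer = insns[p];
        for (size_t k = p; k + 1 < (size_t)c; k++)
            insns[k] = insns[k + 1];       /* close the gap left by the producer */
        insns[c - 1] = producer;           /* producer now directly precedes consumer */
        insns[c].sched_group = true;       /* mark the pair as fused */
        fused++;                           /* slot p is re-examined next iteration */
    }
    return fused;
}
```

The model sinks the producer toward its consumer rather than hoisting the consumer, which keeps the legality check to a single loop; a real pass would also have to consider memory ordering, liveness across blocks, and which direction of motion is cheaper.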