CT_00_003 -- Redundant Extension Elimination (GCC)
About
The RISC-V ABI specifies that 32-bit values on rv64 are held in registers as sign-extended values. This simplifies calling conventions and makes 64-bit comparison instructions work as expected when operating on 32-bit values. However, this convention can result in extraneous extensions, which in turn hurt code performance and density.
By fully expressing the sign-extending nature of the "w" variant instructions such as addw, GCC can use the implicit sign extension from the 32-bit "w" instruction to eliminate a subsequent explicit sign extension. Instruction count data for cpu2017 shows that up to 2% of the dynamic instructions can be eliminated in the xz portion of the benchmark and about 0.5% in the gcc portion. This phase of redundant extension elimination is complete. Further improvements are believed to be possible, and those may (over time) show up as RISE initiatives.
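The redundancy described above can be illustrated with a small C model. This is only a sketch: model_addw, model_sext_w and sext32 are hypothetical helper names that mimic the documented behaviour of the rv64 "w" instructions, not anything taken from GCC. It shows that extending the result of a "w" add a second time never changes the value, which is the property the compiler relies on to drop the explicit extension.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 32-bit pattern to 64 bits without relying on
   implementation-defined signed conversions. */
static int64_t sext32(uint32_t low)
{
    return (int64_t)(low ^ UINT32_C(0x80000000)) - INT64_C(0x80000000);
}

/* Model of rv64 "addw": add the low 32 bits of both operands, then
   sign-extend the 32-bit result into the full 64-bit register. */
static int64_t model_addw(int64_t a, int64_t b)
{
    uint32_t low = (uint32_t)a + (uint32_t)b;   /* wraps modulo 2^32 */
    return sext32(low);
}

/* Model of an explicit 32->64 sign extension (sext.w). */
static int64_t model_sext_w(int64_t x)
{
    return sext32((uint32_t)x);
}

/* The explicit extension after addw is a no-op for every sample. */
static void check_addw_needs_no_extension(void)
{
    static const int64_t samples[] = {
        0, 1, -1, 0x7fffffff, -2147483647 - 1, INT64_C(0x123456789)
    };
    const int n = (int)(sizeof samples / sizeof samples[0]);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            int64_t r = model_addw(samples[i], samples[j]);
            assert(model_sext_w(r) == r);
        }
    }
}
```

Calling check_addw_needs_no_extension() exercises wrap-around cases such as 0x7fffffff + 1, where the 32-bit result becomes negative and the upper 32 bits are already filled in by the add itself.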
It appears that a change from circa 1994 results in a significant number of redundant sign extensions for incoming function arguments, because the extension state of the arguments is lost. Vineet's analysis shows that fixing this issue can reduce dynamic instruction counts by as much as 0.7% on some SPEC workloads (omnetpp) and about 0.25% on others (gcc, x264). After much analysis and testing, Vineet's patch has been integrated.
Vineet has discovered another case where we are extending values that we know are already extended. Essentially the compiler is failing to look at the extension status on some paths. So far the improvements for this have been very small. While further improvements are expected in this code, those improvements are not expected to show significant benefits.
Vineet is currently looking at cases where REE (redundant extension elimination) finds an extension but fails to eliminate it. This effort is what led to finding the issue from the change back in 1994. The general idea is to identify whether there are any systemic failures in REE. Vineet is also digging into patches from Ajit (IBM) to see if they can be cleaned up and integrated. That work from Ajit is meant to exploit ABI guarantees as well as implicit zero extensions to eliminate unnecessary extensions. The "insert dummy extension" for REE and CSE/PRE falls into this general space as well.
Jivan (RAU, contractor for Ventana) is digging into Joern's (Embecosm, contractor for Rivos) code for extending DSE to track sub-word objects and use DCE approaches to eliminate extensions that set bits that are never used. That code shows roughly a 1% improvement for x264, but currently fails correctness tests in various cases. The x264 improvement looks real, so we'll continue to analyze the failures in the hope this can be wrapped up in the next couple of weeks, before gcc-14's feature freeze deadline. If it doesn't make it, we'll push it into CY24/gcc-15.
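To make the "extensions that set bits that are never used" idea concrete, here is a toy backward liveness walk in C. It is a minimal sketch under stated assumptions: the three-opcode instruction set, the struct insn layout and find_dead_extensions are all invented for illustration and do not reflect Joern's patch or GCC's RTL data structures. The point is only that an extension becomes removable when none of the upper 32 bits it defines are live afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical three-opcode straight-line IR. */
enum op {
    OP_SEXT32,   /* dst = sign-extension of the low 32 bits of src */
    OP_STORE32,  /* store the low 32 bits of src to memory */
    OP_RET64     /* return all 64 bits of src */
};

struct insn { enum op op; int dst; int src; };

#define NREGS 4
#define LOW32 UINT64_C(0x00000000ffffffff)
#define ALL64 UINT64_C(0xffffffffffffffff)

/* Walk the code backwards, tracking which bits of each register are
   live. removable[i] is set to 1 when insn i is an extension whose
   upper 32 result bits are never read. Returns the number found. */
static int find_dead_extensions(const struct insn *code, size_t n,
                                int *removable)
{
    uint64_t live[NREGS] = {0};
    int count = 0;

    for (size_t i = n; i-- > 0; ) {
        const struct insn *in = &code[i];
        assert(in->src >= 0 && in->src < NREGS);
        removable[i] = 0;
        switch (in->op) {
        case OP_STORE32:
            live[in->src] |= LOW32;
            break;
        case OP_RET64:
            live[in->src] |= ALL64;
            break;
        case OP_SEXT32: {
            assert(in->dst >= 0 && in->dst < NREGS);
            uint64_t used = live[in->dst];
            live[in->dst] = 0;               /* dst is fully redefined */
            if ((used & ~LOW32) == 0) {
                /* Upper bits are dead: the extension is just a copy. */
                removable[i] = 1;
                count++;
                live[in->src] |= used;
            } else {
                /* The extension itself reads only the low 32 bits. */
                live[in->src] |= LOW32;
            }
            break;
        }
        }
    }
    return count;
}
```

With this model, an extension feeding only a 32-bit store is reported as removable, while the same extension feeding a 64-bit return is kept.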
The variable bit manipulation problem is likely going to be deferred to 2024. To get the semantics correct, we must insert a sign extension after the bit manipulation, which makes the transformation in and of itself performance neutral. The hope is that as we improve the ability to identify unnecessary extensions, the inserted sign extensions can often be eliminated, resulting in a performance improvement.
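The need for the inserted sign extension can be seen in a small C model of a variable-position bset. This is a sketch, not compiler code: model_bset follows the Zbs definition (set bit rs2 & 63 of a 64-bit register), while is_sign_extended32 and model_sext_w are hypothetical helpers added for the illustration. Setting any bit below 31 keeps a sign-extended 32-bit value sign-extended, but setting bit 31 does not, which is why the position must be proven safe or an extension must follow. The same reasoning applies to bclr and binv.

```c
#include <assert.h>
#include <stdint.h>

/* Model of rv64 Zbs "bset": set bit (pos & 63) of a 64-bit register. */
static uint64_t model_bset(uint64_t reg, unsigned pos)
{
    return reg | (UINT64_C(1) << (pos & 63));
}

/* Model of an explicit 32->64 sign extension (sext.w). */
static uint64_t model_sext_w(uint64_t reg)
{
    uint32_t low = (uint32_t)reg;
    int64_t ext = (int64_t)(low ^ UINT32_C(0x80000000)) - INT64_C(0x80000000);
    return (uint64_t)ext;
}

/* A register holds a valid 32-bit value under the ABI convention when
   bits 63..31 are all zero or all one. */
static int is_sign_extended32(uint64_t reg)
{
    uint64_t top = reg >> 31;               /* the upper 33 bits */
    return top == 0 || top == UINT64_C(0x1ffffffff);
}

static void check_variable_bset(void)
{
    /* Positions 0..30 never disturb the sign-extended form. */
    for (unsigned pos = 0; pos < 31; pos++) {
        assert(is_sign_extended32(model_bset(0, pos)));
        assert(is_sign_extended32(
            model_bset(UINT64_C(0xffffffff80000000), pos)));
    }

    /* Position 31 sets the 32-bit sign bit without touching bits 63..32,
       so an explicit extension is needed to restore the convention. */
    uint64_t r = model_bset(0, 31);
    assert(!is_sign_extended32(r));
    assert(is_sign_extended32(model_sext_w(r)));
}
```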
Stakeholders/Partners
RISE:
Ventana: 1 FTE for a few weeks. Jivan Hakobyan at the CAST (Center of Advanced Software Technologies – RAU). Jeff Law provides oversight.
Embecosm: Joern Rennecke (contractor) has a pass which appears to target #4 – needs evaluation
Rivos: Vineet Gupta is looking to explore the profitability of some of the other options.
External:
Dependencies
None
Status
Updates
- Remaining work on ext-dce has moved into a new task for 1H2024 and will be tracked there.
- Vineet's patch to remove the old 1994 bugfix has been integrated. It elides clearing the subreg promoted note early in expansion, which prevents Expand from subsequently generating a superfluous sign extend.
- Vineet's patch to avoid some redundant extensions in compare and jump expansion codepath has been approved along with follow-ups that will improve it slightly
- Jivan has confirmed the benefit seen from Joern's ext-dce patch on x264 is real. We just need to fix the various bugs in Joern's code
- Analysis continues using hacks to REE to identify more cases where we may have redundant extensions.
- Nothing so far indicates we need that old 1994 code. Plan is to remove it this week
- Even with that issue resolved, Vineet still sees significant complaints from REE about being unable to find definitions, which he's chased back primarily to other gimple→RTL issues that we'll need to tackle individually
- Jivan has confirmed that we can find more cases with dummy insn insertion in REE and/or combine. These may be picking up the same things as Vineet's work. Unsure right now
- Jeff has provided Jivan a good testcase for evaluating Joern's DCE related work in this space. It seems to be working on a test that we had identified as particularly well suited for this approach. We still need to evaluate effectiveness on a wider scale.
- Updates on various subprojects in this space. Mostly to help coordination across multiple organizations/engineers.
- Joern's bit tracking DCE patch looks viable. Needs lots of testing, but overall structure is reasonable
- Some evidence we might not be capturing redundant extensions in CSE/GCSE, which definitely needs investigation
- Hack for REE isn't working yet, but hopefully will be soon so we can evaluate profitability
- Vineet is poking at Joern's old code as well as Ajit's work (IBM)
- My (Jeff) sense is that Joern's approach is better in general, though some aspects of Ajit's work might be useful
- Joern's code is an extension to DCE (dead code elimination).
- His work tracks "liveness" of chunks of a word.
- If (for example) the high 32 bits of a 64-bit object are not live and there is an explicit extension from 32→64 bits, that extension sets bits that are never used; the extension is thus dead code and can be removed
- Nice solution to the problem. Definitely seems worth a deeper investigation
- Jivan has done some analysis of 32-bit opcodes followed by zero/sign extension
- deepsjeng and leela seem like the best places to evaluate
- Probably on the order of 0.25% of the dynamic instruction stream, so small, but large enough to be worth deeper investigation
- Joern has posted his patch to improve sign extension removal. He's noted it was a work in progress and is not complete. This work needs evaluation.
- Jivan is on PTO. His work on exploiting variable bit position bset, bclr and binv is temporarily on hold until he returns
- Initial design for how to utilize bset, bclr, binv with variable bit positions done
- Relies on proving they will not set the 32-bit sign bit
- Unclear if it'll be all that helpful in practice
- Clearly next step is to evaluate and decide if this design should go forward or not
- Queuing up some exploratory work
- Note stakeholders/partners in a consistent way
- Dates on or before June 15 are approximate
- Work upstreamed to GCC development trunk
- Work backported to gcc-13 with RISC-V optimization coordination branch
- This phase considered complete
- Project reported as priority for 2H23