About

Certain address computations are poorly optimized by GCC. While the scenarios where this happens are relatively limited, they do show up in cpu2017's mcf, deepsjeng and cactubssn benchmarks. By re-associating the address arithmetic and combining constants we can eliminate 1-2% of the dynamic instruction counts in these benchmarks, improving both performance and code density. This re-association is done after register allocation in a target independent pass and has been integrated into upstream gcc.
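As a rough illustration of the arithmetic involved, the C sketch below models the two forms at the source level. The real transformation operates on RTL after register allocation, not on C, and the `struct arc` type and function names are hypothetical, chosen only for this example. The first function forms each full address separately. The second computes the shared part once and leaves each constant as the displacement of the memory access.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type, used only for illustration. */
struct arc {
    int64_t cost;
    int64_t flow;
    int64_t ident;
};

/* Naive form: every access builds (base + scaled_index) + field_offset
   as a complete address and then loads from it with no displacement. */
int64_t sum_naive(const char *base, size_t scaled_index)
{
    const char *p_cost  = (base + scaled_index) + offsetof(struct arc, cost);
    const char *p_flow  = (base + scaled_index) + offsetof(struct arc, flow);
    const char *p_ident = (base + scaled_index) + offsetof(struct arc, ident);
    return *(const int64_t *)p_cost
         + *(const int64_t *)p_flow
         + *(const int64_t *)p_ident;
}

/* Re-associated form: (base + scaled_index) is computed once and the
   constant field offsets are folded into the individual accesses. */
int64_t sum_reassociated(const char *base, size_t scaled_index)
{
    const struct arc *a = (const struct arc *)(base + scaled_index);
    return a->cost + a->flow + a->ident;
}
```

Both functions return the same value. The point is only that the second form needs fewer explicit additions when the target's load instructions accept a constant displacement.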

Additionally, certain addresses can be re-associated to facilitate loop invariant code motion, particularly address computations for objects stored on the stack. This rewriting early in the RTL optimizer pipeline can in turn also expose additional opportunities for the post register allocation pass mentioned above. This early RTL address rewriting is expected to improve leela's instruction count in cpu2017 by 0.4%. These items are complete and integrated.
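The effect of re-association on loop invariant code motion can be sketched in C as follows. This is a source-level model under assumed names and constants (`BUF_OFFSET`, `N`, and a `frame` pointer standing in for the frame base), not the RTL the pass actually rewrites. A compiler may well optimize both functions identically at this level; the sketch only shows which association exposes the invariant.

```c
#include <stddef.h>

/* Hypothetical layout: a buffer living BUF_OFFSET bytes into a stack frame. */
enum { FRAME_BYTES = 256, BUF_OFFSET = 64, N = 16 };

/* Association that hides the invariant: (frame + i) + BUF_OFFSET.
   The varying index is added first, so no subexpression is invariant
   and two additions are needed per iteration to form the address. */
unsigned sum_unhoisted(const unsigned char *frame)
{
    unsigned total = 0;
    for (size_t i = 0; i < N; i++)
        total += *((frame + i) + BUF_OFFSET);
    return total;
}

/* Re-associated: (frame + BUF_OFFSET) + i. The parenthesized part does
   not depend on i, so it can be computed once before the loop. */
unsigned sum_hoisted(const unsigned char *frame)
{
    const unsigned char *buf = frame + BUF_OFFSET;  /* loop invariant */
    unsigned total = 0;
    for (size_t i = 0; i < N; i++)
        total += buf[i];
    return total;
}
```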

Jivan has also identified problems with frame pointer (fp) elimination which result in unnecessary address arithmetic in some cases, as well as unprofitable rematerialization of constant addresses in the register allocator. These issues have been addressed in the register allocator. No benchmarking has been done to evaluate the impact; it's expected to be relatively small.

~~Symbolic address reassociation for deepsjeng was ultimately concluded to be non-viable and has been dropped.~~

Finally, hooks also exist to allow target dependent address rewriting during LRA/reload. This is typically used to adjust memory references with out of range constants. Careful rewriting of such addresses can result in sharing the "highpart" of the address calculation. This is not expected to be a major improvement, but it is closely related to work Jivan has already been doing and is being included for completeness.
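A minimal sketch of the highpart idea, assuming a signed 12-bit displacement range of [-2048, 2047] as on RISC-V loads and stores. The `split_offset` and `shares_highpart` helpers are hypothetical names for this example and are not GCC hooks. An out of range offset is split into a high part and an in-range low part. Two offsets with the same high part can reuse a single register holding base + hi.

```c
#include <stdint.h>

/* Assumed displacement range: signed 12 bits, i.e. [-2048, 2047]. */
struct split {
    int64_t hi;  /* multiple of 4096, added to the base register once */
    int64_t lo;  /* fits the displacement field of the memory access  */
};

/* Split offset so that offset == hi + lo with lo in [-2048, 2047]. */
struct split split_offset(int64_t offset)
{
    struct split s;
    int64_t lo = ((offset % 4096) + 4096) % 4096;  /* now in [0, 4095] */
    if (lo >= 2048)
        lo -= 4096;                                /* now in [-2048, 2047] */
    s.lo = lo;
    s.hi = offset - lo;
    return s;
}

/* Two accesses can share one "base + hi" computation when their
   high parts are equal. */
int shares_highpart(int64_t a, int64_t b)
{
    return split_offset(a).hi == split_offset(b).hi;
}
```

For example, offsets 4104 and 4200 both split with a high part of 4096, so one addition to the base serves both accesses, while 6144 rounds up to a high part of 8192 with a low part of -2048 and cannot share it.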

Stakeholders/Partners

RISE:

Ventana: 1 FTE for a few months (Manolis Tsamis with VRULL) on the post register allocation address rewriting

Ventana: 1 FTE for a couple weeks (Jivan Hakobyan with CAST at RAU) for early RTL address rewriting to improve LICM

Ventana: Jeff Law, review/oversight, debugging/testing on other architectures

Ventana: Raphael Zinsly: Exploring rewriting symbolic addresses to improve deepsjeng

External:

VRULL: Manolis Tsamis, Philipp Tomsich, primary authors, ongoing development/improvements

Dependencies

None

Status

Development	COMPLETED
Development Timeline	2H2023
Upstreaming	COMPLETED
Upstream Version	gcc-14 (Spring 2024), gcc-13 RISC-V Coordination Branch
Contacts	Jeff Law (Ventana), Manolis Tsamis (VRULL)
Dependencies	None

Updates

10 Nov 2023

Final update on items completed and dropping symbolic address reassociation.

31 Oct 2023

Manolis's patch has been integrated upstream.
Vlad's fixes to the register allocator have been installed upstream.

04 Oct 2023

Vlad's change had to be reverted. Unsure if it'll be refined and resubmitted or is going to be dropped
V6 of Manolis's (VRULL) rewriting patch posted. One minor x86 issue to resolve, but it's otherwise ready to go.

27 Sep 2023

Vlad's work seems to be making things worse. Testcase extracted and passed along to Vlad
Raphael's improvements for symbolic address rewriting still failing deepsjeng, no luck yet with finding a shorter path to debugging the failure

14 Sep 2023

Vlad has pushed a patch which should fix the problems we've seen with the register allocator pushing invariants back into loops (particularly fp+offset addresses)
Planning to A/B a test around that to see if we can measure the dynamic instruction impact

13 Sep 2023

Jeff has reached out to Vlad@Red Hat on issues that touch on the register allocator
- Conceptually Vlad agrees there's a problem. Solutions are being discussed
Manolis@VRULL has posted a V5 of his post-allocation address rewriting patch
- Not seeing any correctness regressions being reported
- Just a few minor problems to work through – expect v6 will be ready for integration.

06 Sep 2023

Analysis of LICM still "failing" for certain fp+offset addresses
- The address is runtime invariant, which is good. Computation is hoisted out of loops
- But because it's a runtime invariant, its priority for a register is lowered by IRA
- Which can result in the value being spilled and rematerialized at use site (inside loop)
- Need to get Vlad@RedHat involved as he knows this code better than anyone

30 Aug 2023

Register pressure sensitive scheduling enabled by default (primarily to help x264), but also reduces some of the unprofitable address rematerialization seen in IRA
First patch from Raphael under evaluation
- Many small (< 0.1%) improvements. A few notable regressions as well. Hoping that as we fix the regressions, the overall trend will improve as well
- Deepsjeng failed, so no data yet for the key target – Raphael is actively debugging correctness issues
Jivan's work to improve LICM on hold until he returns from PTO

23 Aug 2023

Improvements for address generation in functions with large frames landed
Still evaluating workaround for unprofitable address rematerialization
- Workaround will be used irrespective of this evaluation as it significantly helps performance of x264 by avoiding spill code

16 Aug 2023

Potential workaround for unprofitable address rematerializations in register allocator. Not catching all the cases we want yet
- Interestingly enough this work is intersecting with some issues the Rivos team is tackling in constant synthesis
Prototype patch for optimizing away unnecessary arithmetic after frame pointer elimination
- Looks like it should reduce the dynamic instruction counts in deepsjeng by roughly 1%. Smaller improvements elsewhere
Waiting on V4 post-allocation pass to rewrite address computations from Manolis.

09 Aug 2023

Jivan's early RTL address rewriting targeting leela has been merged
- Identified unprofitable address rematerialization in register allocator
- Identified missed optimizations during frame pointer elimination
Deficiency in Manolis's post-allocation register propagation identified & fixed
Still iterating on Manolis's new post-allocation pass to rewrite address computations

02 Aug 2023

Vineet@Rivos has done some analysis. Conclusion is we're no longer propagating away sp→gpr copies which makes the f-m-o patch less effective
Jivan's work helps leela, independent of f-m-o, giving it a path forward
Raphael & Jeff working on infrastructure to allow for rewriting symbolic addresses to allow folding the lo-sum into the memory reference more often
Jivan has analyzed potential benefits from adjusting how we handle large offsets during register allocation. Doesn't seem to be beneficial

26 Jul 2023

Significant overlap between codes improved by Jivan's work and Manolis's work, so may not see the expected synergy
- Still expect both patches to go forward though, for various good reasons
Minimal progress on symbolic address rewriting. Not getting control to rewrite the address when we want

19 Jul 2023

Vineet from Rivos has isolated and analyzed the x264 issue with Manolis's patch. Jeff L. will likely own fixing.
Testing Jivan's work in combination with Manolis's work not producing expected results. Waiting on Jivan to return from PTO for deeper analysis.
Added note about symbolic address rewriting affecting deepsjeng under investigation by Raphael (Ventana)

13 Jul 2023

V3 of Manolis's patch posted for review today. May fix m68k issue that was blocking as well as x264 compilation failure
Jivan has posted his early RTL address rewriting patch which looks pretty good
Note Jivan exploring LRA/reload time address rewriting as well

05 Jul 2023

Fixes for stack pointer propagation in generic post register allocation pass integrated. Still waiting on new drop to fix issue exposed on m68k port
Added Jivan's work on address rewriting to facilitate LICM. It's complementary to Manolis's work

28 Jun 2023

Improved code is causing regressions when enabled across all targets. Contractor is working to address those issues
Rivos engineer exploring if this code can be repurposed to fix a problem they're tracking
Report stakeholders/partners in consistent manner

15 Jun 2023 (dates on or before June 15 are approximate)

Work posted for upstream review. Adjustments/Updates in progress

01 Jun 2023

Project reported as priority for 2H23


CT_00_004 -- Address rewriting (GCC)