CT_00_004 -- Address rewriting (GCC)

About

Certain address computations are poorly optimized by GCC.  While the scenarios where this happens are relatively limited, they do show up in the mcf, deepsjeng and cactuBSSN benchmarks from cpu2017.  By re-associating the address arithmetic and combining constants we can eliminate 1-2% of the dynamic instruction count in these benchmarks, improving both performance and code density.  This re-association is done after register allocation in a target-independent pass and has been integrated into upstream GCC.
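To make "combining constants" concrete, here is a minimal sketch in C of the idea, not the pass itself.  The real work operates on GCC's RTL; the `addr_calc` type and the function names below are hypothetical, and the signed 12-bit limit assumes a RISC-V style load/store offset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one memory access: a separate add instruction
   contributes add_imm to the base register, and the load/store itself
   encodes mem_off. */
typedef struct {
    int64_t add_imm;
    int64_t mem_off;
} addr_calc;

/* Assumption: offsets must fit a signed 12-bit immediate, as on RISC-V. */
static bool fits_simm12(int64_t v)
{
    return v >= -2048 && v <= 2047;
}

/* Fold the add's constant into the memory offset when the combined
   constant is still encodable.  On success the add contributes nothing
   and could be deleted. */
static bool fold_offset(addr_calc *a)
{
    int64_t combined = a->add_imm + a->mem_off;
    if (!fits_simm12(combined))
        return false;       /* leave the access untouched */
    a->mem_off = combined;
    a->add_imm = 0;
    return true;
}

/* Address the access touches for a given base register value. */
static int64_t effective_address(int64_t base, const addr_calc *a)
{
    return base + a->add_imm + a->mem_off;
}
```

In this model an access such as "add 16 to the base, then load at offset 4" becomes a single load at offset 20, which is where the saving in dynamic instruction count comes from.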


Additionally, certain addresses can be re-associated to facilitate loop-invariant code motion (LICM), particularly address computations for objects stored on the stack.  This rewriting early in the RTL optimizer pipeline can in turn expose additional opportunities for the post register allocation pass mentioned above.  This early RTL address rewriting is expected to improve leela's instruction count in cpu2017 by 0.4%.  These items are complete and integrated.
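The effect of re-association on LICM can be sketched with a small C model.  This assumes the address has the shape frame pointer + constant offset + scaled index; the exact RTL shapes the patch handles are not described here, and the `trace` type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Records the last address formed and how many additions ran inside the
   loop, as a stand-in for dynamic instruction count. */
typedef struct {
    uintptr_t last_addr;
    unsigned adds_in_loop;
} trace;

/* Original association: (fp + i*scale) + offset.  Both additions depend
   on i, so neither can be hoisted. */
static trace walk_original(uintptr_t fp, uintptr_t offset, size_t n, size_t scale)
{
    trace t = {0, 0};
    for (size_t i = 0; i < n; i++) {
        uintptr_t tmp = fp + i * scale;
        t.adds_in_loop++;
        t.last_addr = tmp + offset;
        t.adds_in_loop++;
    }
    return t;
}

/* Re-associated: (fp + offset) + i*scale.  fp + offset is loop invariant
   and is computed once, outside the loop. */
static trace walk_reassociated(uintptr_t fp, uintptr_t offset, size_t n, size_t scale)
{
    trace t = {0, 0};
    uintptr_t base = fp + offset;   /* hoisted invariant */
    for (size_t i = 0; i < n; i++) {
        t.last_addr = base + i * scale;
        t.adds_in_loop++;
    }
    return t;
}
```

Both functions form the same addresses; the re-associated form performs one addition per iteration instead of two.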


Jivan has also identified problems with frame pointer (fp) elimination that result in unnecessary address arithmetic in some cases, as well as unprofitable rematerialization of constant addresses in the register allocator.  These issues have been addressed in the register allocator.  No benchmarking has been done to evaluate the impact; it is expected to be relatively small.


Symbolic address reassociation for deepsjeng was ultimately concluded to be non-viable and has been dropped.


Finally, hooks also exist to allow target-dependent address rewriting during LRA/reload.  This is typically used to adjust memory references with out-of-range constants; careful rewriting of such addresses can result in sharing the "highpart" of the address calculation.  This is not expected to be a major improvement, but it is closely related to work Jivan has already been doing and is included for completeness.
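As an illustration of sharing the "highpart", the following C sketch splits an out-of-range offset into a high part and a low part.  It assumes a RISC-V style signed 12-bit memory offset; the `split_offset` type and function names are hypothetical and do not correspond to any GCC hook.

```c
#include <stdbool.h>
#include <stdint.h>

/* An offset split so that off == hi + lo, with lo in [-2048, 2047]. */
typedef struct {
    int64_t hi;
    int64_t lo;
} split_offset;

/* Round the offset to the nearest multiple of 4096 for the high part;
   the remainder then fits a signed 12-bit immediate. */
static split_offset split(int64_t off)
{
    split_offset s;
    s.hi = (off + 0x800) & ~(int64_t)0xfff;
    s.lo = off - s.hi;
    return s;
}

/* Two accesses can reuse one "base + hi" computation when their high
   parts match. */
static bool shares_high_part(int64_t off_a, int64_t off_b)
{
    return split(off_a).hi == split(off_b).hi;
}
```

For example, offsets 0x12340 and 0x12348 both split to a high part of 0x12000, so a single addition of the high part to the base register can serve both memory references.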


Stakeholders/Partners

RISE:

Ventana: 1 FTE for a few months (Manolis Tsamis with VRULL) on the post register allocation address rewriting

Ventana: 1 FTE for a couple weeks (Jivan Hakobyan with CAST at RAU) for early RTL address rewriting to improve LICM

Ventana: Jeff Law, review/oversight, debugging/testing on other architectures

Ventana: Raphael Zinsly: Exploring rewriting symbolic addresses to improve deepsjeng

External:

VRULL: Manolis Tsamis, Philipp Tomsich, primary authors, ongoing development/improvements


Dependencies

None

Status

Development

COMPLETED


Development Timeline

2H2023

Upstreaming

COMPLETED


Upstream Version

gcc-14 (Spring 2024)

gcc-13 RISC-V Coordination Branch




Contacts

Jeff Law (Ventana)

Manolis Tsamis (VRULL)




Updates

 

  • Final (?) update on completed items and the decision to drop symbolic address reassociation.

  • Manolis's patch has been integrated upstream.
  • Vlad's fixes to the register allocator have been installed upstream. 

 

  • Vlad's change had to be reverted.  Unsure if it'll be refined and resubmitted or is going to be dropped
  • V6 of Manolis's (VRULL) rewriting patch posted.  One minor x86 issue to resolve, but it's otherwise ready to go.

 

  • Vlad's work seems to be making things worse.  Testcase extracted and passed along to Vlad
  • Raphael's improvements for symbolic address rewriting are still failing deepsjeng; no luck yet finding a shorter path to debugging the failure

 

  • Vlad has pushed a patch which should fix the problems we've seen with the register allocator pushing invariants back into loops (particularly fp+offset addresses)
  • Planning to A/B a test around that to see if we can measure the dynamic instruction impact

 

  • Jeff has reached out to Vlad@Red Hat on issues that touch on the register allocator
    • Conceptually Vlad agrees there's a problem.  Solutions are being discussed
  • Manolis@VRULL has posted a V5 of his post-allocation address rewriting patch
    • Not seeing any correctness regressions being reported
    • Just a few minor problems to work through – expect v6 will be ready for integration.

 

  • Analysis of LICM still "failing" for certain fp+offset addresses
    • The address is runtime invariant, which is good.  Computation is hoisted out of loops
    • But because it's a runtime invariant, its priority for a register is lowered by IRA
    • Which can result in the value being spilled and rematerialized at use site (inside loop)
    • Need to get Vlad@RedHat involved as he knows this code better than anyone

 

  • Register pressure sensitive scheduling enabled by default (primarily to help x264), but also reduces some of the unprofitable address rematerialization seen in IRA
  • First patch from Raphael under evaluation
    • Many small (< 0.1%) improvements.  A few notable regressions as well.  Hoping that as we fix the regressions, the overall trend will improve as well
    • Deepsjeng failed, so no data yet for the key target – Raphael is actively debugging correctness issues
  • Jivan's work to improve LICM on hold until he returns from PTO

 

  • Improvements for address generation in functions with large frames landed
  • Still evaluating workaround for unprofitable address rematerialization
    • Workaround will be used irrespective of this evaluation as it significantly helps performance of x264 by avoiding spill code

 

  • Potential workaround for unprofitable address rematerializations in register allocator.  Not catching all the cases we want yet
    • Interestingly enough this work is intersecting with some issues the Rivos team is tackling in constant synthesis
  • Prototype patch for optimizing away unnecessary arithmetic after frame pointer elimination
    • Looks like it should reduce the dynamic instruction counts in deepsjeng by roughly 1%.  Smaller improvements elsewhere
  • Waiting on V4 post-allocation pass to rewrite address computations from Manolis.

 

  • Jivan's early RTL address rewriting targeting leela has been merged
    • Identified unprofitable address rematerialization in register allocator
    • Identified missed optimizations during frame pointer elimination
  • Deficiency in Manolis's post-allocation register propagation identified & fixed
  • Still iterating on Manolis's new post-allocation pass to rewrite address computations

 

  • Vineet@Rivos has done some analysis.  Conclusion is we're no longer propagating away sp→gpr copies which makes the f-m-o patch less effective
  • Jivan's work helps leela, independent of f-m-o, giving it a path forward
  • Raphael & Jeff working on infrastructure to allow for rewriting symbolic addresses to allow folding the lo-sum into the memory reference more often
  • Jivan has analyzed potential benefits from adjusting how we handle large offsets during register allocation.  Doesn't seem to be beneficial

  • Significant overlap between codes improved by Jivan's work and Manolis's work, so may not see the expected synergy
    • Still expect both patches to go forward though, for various good reasons
  • Minimal progress on symbolic address rewriting.  Not getting control to rewrite the address when we want 

 

  • Vineet from Rivos has isolated and analyzed the x264 issue with Manolis's patch.  Jeff L. will likely own fixing.
  • Testing Jivan's work in combination with Manolis's work not producing expected results.  Waiting on Jivan to return from PTO for deeper analysis.
  • Added note about symbolic address rewriting affecting deepsjeng under investigation by Raphael (Ventana)

 

  • V3 of Manolis's patch posted for review today.  May fix the m68k issue that was blocking as well as the x264 compilation failure
  • Jivan has posted his early RTL address rewriting patch which looks pretty good
  • Note Jivan exploring LRA/reload time address rewriting as well

 

  • Fixes for stack pointer propagation in generic post register allocation pass integrated.  Still waiting on new drop to fix issue exposed on m68k port
  • Added Jivan's work on address rewriting to facilitate LICM.  It's complementary to Manolis's work

 

  • Improved code is causing regressions when enabled across all targets.  Contractor is working to address those issues
  • Rivos engineer exploring if this code can be repurposed to fix a problem they're tracking
  • Report stakeholders/partners in consistent manner

– Dates on or before June 15 are approximate

  • Work posted for upstream review.  Adjustments/Updates in progress

 

  • Project reported as priority for 2H23