CT_01_012 - Improve shrink-wrapping (LLVM)

About

Analysis of several workloads in the SPEC CPU 2017 integer suite (500.perlbench, 502.gcc, 520.omnetpp, 531.deepsjeng, 557.xz) has shown that "separate shrink wrapping" can reduce the dynamic instruction counts for those benchmarks by 0.5% to perhaps 3%.


Shrink-wrapping analyzes a function to find better places for the register saves in the function prologue and the register restores in the function epilogue.  This is particularly helpful if a function has an early out or fast path that uses few, if any, callee-saved registers.  By optimizing the placement of the prologue/epilogue, the saves and restores can often be avoided entirely on those paths.  LLVM has a relatively weak form of shrink wrapping that finds a better spot for the prologue or epilogue as a whole.  GCC has a stronger algorithm that can shrink wrap individual register saves/restores, which provides further improvements over the naive placement algorithm used by LLVM.
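As an illustration of the kind of function that benefits, here is a minimal sketch.  The names `mix` and `checksum` are hypothetical and are not taken from any of the benchmarks above; the sketch assumes `mix` is not inlined.  The early return needs no callee-saved registers, while the loop keeps several values live across a call.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper.  Assume it is an opaque (non-inlined) call, so any
// value that is live across it must sit in a callee-saved register or on
// the stack.
std::uint32_t mix(std::uint32_t x) { return x * 2654435761u + 1u; }

std::uint32_t checksum(const std::uint32_t *data, std::size_t n) {
    // Fast path: no calls and no callee-saved registers are needed here.
    // With shrink wrapping, this path can return without running the
    // register saves/restores at all.
    if (data == nullptr || n == 0)
        return 0;

    // Slow path: acc, data, n and i are all live across the call to mix(),
    // so a compiler will typically place them in callee-saved registers.
    // The saves only need to happen once execution reaches this point.
    std::uint32_t acc = 0;
    for (std::size_t i = 0; i < n; ++i)
        acc ^= mix(data[i] + acc);
    return acc;
}
```

Without shrink wrapping, the saves and restores are typically placed at function entry and exit, so the fast path pays for them as well.  Whether a given compiler actually moves them depends on the target and optimization level.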


Experiments have been done by turning off the "separate shrink wrapping" capability in GCC and observing how the dynamic instruction counts change, to get a sense of the potential gains.  Misha is currently exploring various placement algorithms (LCM (lazy code motion), sinking/hoisting, dominance-based, etc.).
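To give a feel for what a dominance-based placement could look like, here is a toy sketch.  It is not the algorithm under development and not LLVM or GCC code.  The CFG encoding and the names `dominators` and `savePoint` are assumptions made for illustration, and the sketch assumes at most 32 blocks, all reachable from entry block 0.  For one callee-saved register, it picks the deepest block that dominates every block using that register.

```cpp
#include <cstdint>
#include <vector>

using Mask = std::uint32_t;  // bit b set means "block b is in the set"

// Iterative dominator computation.  preds[b] lists the predecessors of
// block b; block 0 is the entry block.
std::vector<Mask> dominators(const std::vector<std::vector<int>> &preds) {
    const int n = static_cast<int>(preds.size());
    const Mask all = (n >= 32) ? ~Mask{0} : ((Mask{1} << n) - 1);
    std::vector<Mask> dom(n, all);
    dom[0] = Mask{1};
    for (bool changed = true; changed;) {
        changed = false;
        for (int b = 1; b < n; ++b) {
            Mask meet = all;
            for (int p : preds[b])
                meet &= dom[p];
            const Mask next = meet | (Mask{1} << b);
            if (next != dom[b]) {
                dom[b] = next;
                changed = true;
            }
        }
    }
    return dom;
}

// Candidate save point for one register: the deepest block that dominates
// every block in `uses` (which must be non-empty).
int savePoint(const std::vector<Mask> &dom, Mask uses) {
    const int n = static_cast<int>(dom.size());
    Mask common = ~Mask{0};
    for (int b = 0; b < n; ++b)
        if (uses & (Mask{1} << b))
            common &= dom[b];
    // The common dominators form a chain; the deepest one is dominated by
    // all of the others, so its dominator set contains `common`.
    for (int b = 0; b < n; ++b)
        if ((common & (Mask{1} << b)) && (dom[b] & common) == common)
            return b;
    return 0;  // Fall back to the entry block.
}
```

For a CFG with edges 0->1, 0->2, 2->3, 2->4, 3->5 and 4->5 (block 1 being an early return), a register used only in blocks 3 and 4 gets block 2 as its save point, so the early-return path through block 1 never executes the save.  A real implementation also has to place the matching restores (the symmetric problem using post-dominance) and avoid placing saves inside loops; this sketch ignores both.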

Stakeholders/Partners

RISE:

Ventana: Mikhail Gudim – lead developer

Ventana: Jeff Law – oversight

External:

Dependencies


Status

Development

COMPLETE


Development Timeline

NA

Upstreaming

NOT STARTED


Upstream Version





Contacts

Mikhail Gudim (Ventana)

Jeff Law (Ventana)






Updates

 

  • Everything is working now.  Misha will open an MR to start the external review process.

 

  • Still fighting CFI (call frame information) notes, which are needed for debuggers, exception handling and pthread cancellation
  • The last remaining issue appears to be that CFI notes need variable offsets for vector register spills

 

  • Moved to 2H2024

 

  • No major blockers from initial upstream comments
  • EH (exception handling) issue identified.  Basically we need to mark the "spills" generated for separate shrink wrapping so that CFI notes get properly generated
  • xz performance regression identified and being fixed

 

  • Initial results look promising.  We see significant decreases in load/store traffic in multiple parts of SPEC
  • However, the change interacts poorly with EH, so it is not quite ready for wider evaluation

 

  • Current thinking is to try separate shrink wrapping without changing the placement algorithm, in the hope that this will get us most of the benefit at minimal cost

 

  • Project noted as a priority for 1H2024