CT_01_012 - Improve shrink-wrapping (LLVM)
About
Analysis of several workloads in the SPEC2017 integer suite (500.perlbench, 502.gcc, 520.omnetpp, 531.deepsjeng, 557.xz) has shown that "separate shrink wrapping" can reduce the dynamic instruction counts for those benchmarks by 0.5% to perhaps 3%.
Shrink-wrapping analyzes a function to find better places for the register saves in the prologue and the register restores in the epilogue. This is particularly helpful when a function has an early-out or fast path that uses few, if any, callee-saved registers: with better placement, the saves and restores can often be avoided entirely on that path. LLVM has a relatively weak form of shrink-wrapping, which can only find a better spot for the prologue or epilogue as a whole. GCC has a stronger algorithm that can shrink-wrap individual register saves/restores, which improves further on the simpler placement algorithm used by LLVM.
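As an illustration of the kind of function that benefits, here is a minimal sketch in C. The names `process` and `expensive` are hypothetical and do not come from the benchmarks above, and `__attribute__((noinline))` is a GCC/Clang extension used only to keep the call visible. Whether a given compiler actually moves the saves depends on the target and optimization level.

```c
#include <stddef.h>

/* Hypothetical helper. Marked noinline so the slow path really contains a
 * call, which forces values that live across it into callee-saved
 * registers (or onto the stack). */
__attribute__((noinline)) static long expensive(long x) {
    return x * 3 + 1;
}

/* Hypothetical example function with an early-out. */
long process(const long *data, size_t n) {
    /* Fast path: no calls and nothing live across a call, so no
     * callee-saved register has to be saved or restored here. */
    if (data == NULL || n == 0)
        return 0;

    /* Slow path: acc, i, data and n are live across the call, so this is
     * the only region that needs the saves and restores. */
    long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += expensive(data[i]);
    return acc;
}
```

With entry/exit placement, every call to `process` pays for the saves and restores, including calls that return immediately. Shrink-wrapping aims to push them down so that only the slow path pays.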
To get a sense of the potential gains, experiments have been done that turn off the "separate shrink wrapping" capability in GCC and observe how the dynamic instruction counts change. Misha is currently exploring various placement algorithms (LCM, i.e. lazy code motion, sinking/hoisting, dominance-based, etc.).
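The difference between the placement strategies can be made concrete with a toy cost model. The sketch below is an idealized illustration, not the algorithm used by GCC or LLVM and not the methodology of the experiments. It assumes a function's execution can be summarized as a set of paths, each with an execution count and a bitmask of the callee-saved registers it touches. All names (`struct path`, `cost_entry_exit`, `cost_whole`, `cost_separate`) are hypothetical.

```c
#include <stddef.h>

/* One hypothetical execution path through a function. */
struct path {
    unsigned long count;  /* how often the path ran (made-up profile data) */
    unsigned used_regs;   /* bitmask of callee-saved registers it touches */
};

/* Number of set bits, i.e. number of registers in a mask. */
static unsigned count_bits(unsigned mask) {
    unsigned n = 0;
    for (; mask != 0; mask &= mask - 1)
        n++;
    return n;
}

/* All callee-saved registers used anywhere in the function. */
static unsigned union_regs(const struct path *p, size_t n) {
    unsigned all = 0;
    for (size_t i = 0; i < n; i++)
        all |= p[i].used_regs;
    return all;
}

/* Entry/exit placement: every invocation saves every register. */
unsigned long cost_entry_exit(const struct path *p, size_t n) {
    unsigned long total = 0, regs = count_bits(union_regs(p, n));
    for (size_t i = 0; i < n; i++)
        total += p[i].count * regs;
    return total;
}

/* Whole-prologue shrink-wrapping (idealized): paths that touch no
 * callee-saved register skip the prologue, all others pay for all of it. */
unsigned long cost_whole(const struct path *p, size_t n) {
    unsigned long total = 0, regs = count_bits(union_regs(p, n));
    for (size_t i = 0; i < n; i++)
        if (p[i].used_regs != 0)
            total += p[i].count * regs;
    return total;
}

/* Separate shrink-wrapping (idealized): each path pays only for the
 * registers it actually uses. */
unsigned long cost_separate(const struct path *p, size_t n) {
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i].count * count_bits(p[i].used_regs);
    return total;
}
```

For a made-up profile of 900 runs that use no callee-saved registers, 90 runs that use one, and 10 runs that use three, the model yields 3000, 300 and 120 save/restore pairs respectively. These numbers are illustrative only and are not measurements from SPEC.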
Stakeholders/Partners
RISE:
Ventana: Michael Gudim – lead developer
Ventana: Jeff Law – oversight
External:
Dependencies
Status
Updates
- Everything is working now. Misha will open an MR to start the external review process
- Still fighting CFI (call-frame-info) notes which are needed for debuggers, exception handling and pthread cancellation
- Looks like the last issue is the need for CFI notes to have variable offsets for vector register spills
- Moved to 2H2024
- No major blockers from initial upstream comments
- EH issue identified. Basically we need to mark the "spills" generated for separate shrink wrapping so that CFI notes get properly generated
- xz performance regression identified and being fixed
- Initial results look promising. We see significant decreases in load/store traffic in multiple parts of SPEC
- However, interacting poorly with EH, so not quite ready for wider evaluation
- Current thinking is to try separate shrink wrapping without changing the placement algorithm, in the hope that this gets us most of the benefit at minimal cost
- Project noted as priority for 1H2024