CT_00_010 - Improve Long branch/jump support (GCC)
About
Once a function exceeds 1MiB in total size, a jump within it may not be able to reach its intended target. Functions can grow that large for a variety of reasons:
- LTO and aggressive inlining generally improve code performance, but tend to produce larger functions.
- Some uarchs prefer code alignment at loop boundaries and similar points. Honoring those alignment requests also tends to make functions larger.
- Aggressive hot/cold partitioning moves code expected to be cold after the main body of the function, increasing the chance of a long branch to another point in the same function.
Supporting long branches/jumps is a requirement for toolchain completeness. Ventana has upstreamed a patch from Andrew Waterman that addresses this issue from a correctness standpoint; this page tracks further improvements under consideration.
In particular, using $ra as the scratch register in a long jump sequence can corrupt the return-address-stack predictors found in higher-performance hardware. While this is not expected to happen often in practice, when it does happen the performance hit will be significant: once the stack gets out of sync, we can get a cascade of mispredicts. We would like to see this addressed with a register scavenging scheme, essentially looking before the branch, or at the branch's target/fallthrough, to see whether a scratch register is already available. If so, that scratch register should be preferred over $ra. We can still fall back to $ra if no such register can be found.
In addition, it would be advantageous to make $ra available again as a scratch register, at least in some circumstances. For example, in a non-leaf function that is known to be small (<1MiB) or that contains no conditional branches, we could use $ra as an additional scratch register. A review of IRA/LRA doesn't show a particularly good way to add $ra to the usable set of registers, but that can probably be handled by adding another target hook.
Stakeholders/Partners
RISE:
Ventana: Jeff Law
SiFive: Andrew Waterman
External:
Dependencies
Status
Updates
- Closing down this particular page. New page for projects in this space added for 2025.
- Note an additional improvement we could make by using $ra as a temporary again.
- Note interaction with code alignment requests and aggressive hot/cold partitioning.
- Move to 2H2024; adjust status, as this page only tracks the improvements still to be made.
- Basic work from Andrew W. has been upstreamed
- Needs a minor follow-up on the long unconditional branch sequence, which Jeff will tackle this week.
- Not considered complete, as a register scavenging scheme could improve performance once functions are large enough to trigger these sequences.
- Project reported as a priority for 1H2024.