CT_00_010 - Improve Long branch/jump support (GCC)

About

Once a function exceeds 1M in total size, a jump may no longer be able to reach its intended target.  Functions can grow this large for a variety of reasons:

  1. LTO and aggressive inlining generally improve code performance, but tend to produce larger functions.
  2. Some uarchs want code alignment for loop boundaries and similar points.  Honoring those alignment needs also tends to produce larger functions.
  3. Aggressive hot/cold partitioning moves code that is expected to be cold after the main part of the function, increasing the chance of a long branch to another point in the function.
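
As a rough illustration of when a long sequence becomes necessary, the sketch below classifies a byte offset by which RISC-V encoding could reach it. The immediate widths come from the base ISA encoding (13-bit signed offsets for conditional branches, 21-bit for jal); the function names and the returned labels are made up for this example and do not correspond to anything in GCC.

```python
# Reach of the RISC-V base ISA encodings (byte offsets are multiples of 2).
BRANCH_IMM_BITS = 13   # conditional branches: roughly +/-4 KiB
JAL_IMM_BITS = 21      # jal: roughly +/-1 MiB


def fits(offset: int, bits: int) -> bool:
    """Return True if `offset` is an even value encodable as a signed `bits`-bit immediate."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 2
    return offset % 2 == 0 and lo <= offset <= hi


def classify_jump(offset: int) -> str:
    """Unconditional jump: a single jal, or a longer sequence that needs a scratch register."""
    return "jal" if fits(offset, JAL_IMM_BITS) else "long-jump"


def classify_branch(offset: int) -> str:
    """Conditional branch: direct, or an inverted branch around a jump.

    Simplification: the few bytes by which the jump's own offset differs
    from the branch's offset are ignored.
    """
    if fits(offset, BRANCH_IMM_BITS):
        return "branch"
    if fits(offset, JAL_IMM_BITS):
        return "inverted-branch+jal"
    return "inverted-branch+long-jump"
```

Only the two "long-jump" cases need a scratch register to hold the target address, which is where the choice of register discussed on this page matters.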



Supporting long branch/jump is a requirement in terms of toolchain completeness.  Ventana has upstreamed a patch from Andrew Waterman that addresses this issue from a correctness standpoint.  This page is meant to track further improvements under consideration.


In particular, the use of $ra as a scratch in a long jump sequence can clobber the return-address-stack predictors available in higher performance hardware.  While this is not expected to happen often in practice, when it does happen, the performance hit will be significant – once the stack gets out of sync we can get a cascade of mispredicts.  We would like to see this addressed using a register scavenging scheme – essentially looking prior to the branch or at the target/fallthrough of a branch to see if there is a scratch register already available.  If so, that scratch register should be preferred over $ra.  We can still fall back to $ra if no such register can be found.
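
The preference order described above can be sketched as a small model: given the set of registers that are live across the jump, pick a dead temporary if one exists and otherwise fall back to ra. This is only a toy over register names, not the GCC implementation; `pick_scratch` and `CANDIDATE_TEMPS` are hypothetical names. The candidate list is an assumption of this sketch: it uses the caller-saved temporaries from the RISC-V calling convention and leaves out t0 (x5), because the ISA's return-address-stack hints treat x5 as an alternate link register, so using it would not avoid the predictor problem.

```python
# Assumed candidate temporaries (t1..t6); t0/x5 is deliberately excluded
# because it is also treated as a link register by the RAS hints.
CANDIDATE_TEMPS = ["t1", "t2", "t3", "t4", "t5", "t6"]


def pick_scratch(live_regs, candidates=CANDIDATE_TEMPS):
    """Return a scratch register for a long jump sequence.

    `live_regs` is the set of registers whose values are still needed
    across the jump. The first candidate that is not live is preferred;
    "ra" is the fallback when every candidate is live.
    """
    for reg in candidates:
        if reg not in live_regs:
            return reg
    return "ra"
```

For example, `pick_scratch({"t1", "t2"})` returns `"t3"`, while passing a set containing every candidate returns `"ra"`. How liveness is actually determined (looking before the branch, or at the target and fallthrough) is the real work and is not modeled here.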


In addition, it would be advantageous to make $ra available again as a scratch, at least in some circumstances.  For example, in a non-leaf function where the size of the function is known to be small (<1M) or there are no conditional branches we could use $ra as an additional scratch register.  A review of IRA/LRA doesn't show a particularly good way to add $ra to the usable set of registers, but that can probably be handled by adding another target hook.
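
The example condition above can be written as a simple predicate. This is a sketch of the stated example only, with hypothetical names (`ra_usable_as_scratch`, `ONE_MEG`); the real decision would live in a target hook whose shape has not been defined. Treating leaf functions as ineligible and treating an unknown size as "not small" are conservative assumptions of this sketch.

```python
ONE_MEG = 1 << 20  # the 1M size threshold discussed above


def ra_usable_as_scratch(is_leaf, size_upper_bound, has_conditional_branches):
    """Return True if ra could be offered as an extra scratch register.

    `size_upper_bound` is an upper bound on the function size in bytes,
    or None when no bound is known.
    """
    if is_leaf:
        # Assumption: only the non-leaf case from the text is modeled.
        return False
    known_small = size_upper_bound is not None and size_upper_bound < ONE_MEG
    return known_small or not has_conditional_branches
```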



Stakeholders/Partners

RISE:

Ventana: Jeff Law

SiFive: Andrew Waterman


External:



Dependencies


Status

Development

NO PROGRESS


Development Timeline

NA

Upstreaming

NO PROGRESS


Upstream Version





Contacts

Jeff Law (Ventana)


Dependencies

None



Updates

 

  • Closing down this particular page.  New page for projects in this space added for 2025.

 

  • Note an additional improvement that we could make by using $ra as a temporary again.

 

  • Note interaction with code alignment requests and aggressive hot/cold partitioning.
  • Move to 2H2024; adjust status, as this page only tracks the improvements to be made.

 

  • Basic work from Andrew W. has been upstreamed.
  • Needs a minor follow-up on the long unconditional branch sequence, which Jeff will tackle this week.
  • Not considered complete as a register scavenging scheme could improve performance once we have large enough functions to trigger these sequences.

 

  • Project reported as a priority for 1H2024.