CT_00_030 -- Improve bset/bclr/binv/bext with variable bit offset
GCC makes poor use of the Zbs extension, particularly for 32-bit objects on rv64 targets. The core issue is that if a Zbs instruction modifies bit 31, the compiler will likely need to emit a sign extension to satisfy architecture/ABI requirements.
However, there are several cases where the compiler can know it is safe to avoid the extension.
xalancbmk's bitset implementation has a redundant bit clear before setting the same bit. This can be fixed in a generic way with an additional logical simplification pattern.
bext can be used to extract a single bit, storing the result into an SImode object, even on rv64, since bits 1..63 will be zeroed by the (&1) operation in the bext specification.
~(1 << N) & C can be safely used for a 32-bit object on rv64 when C has 33 or more leading zeros.
(1 << N) | C and (1 << N) ^ C can be safely used when the logical XOR/IOR is done in DImode since we don't have to worry about sign-extending a DImode object
An explicit extension of SImode (1 << N) to DImode can be handled with a simple bset with x0 as a source operand
Occasionally GCC will use a "zero_extract" as a destination for some bitfield insertions, which can be handled with bset/bclr.
When the shift count is masked such that we know bit 31 is not changed, we can generate Zbs instructions more aggressively. There are two forms:
Bit position is masked via AND.
Bit position is masked via NAND.
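Several of the cases above can be sketched as C source shapes. This is an illustration only: the function names and the constant `LOW31_MASK` are hypothetical, and whether a given GCC build actually selects bext/bclr/bset/binv for each function depends on the compiler version and options (for example a `-march` string that includes `zbs`), which is not verified here. The shift counts are masked explicitly so the code is well-defined, portable C on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constant (an assumption, not taken from GCC): viewed as a
   64-bit value it has 33 leading zeros, so bit 31 is always clear. */
#define LOW31_MASK UINT64_C(0x7fffffff)
static_assert((LOW31_MASK >> 31) == 0,
              "constant needs at least 33 leading zeros");

/* Single-bit extract into a 32-bit result. The "& 1" zeroes every bit above
   bit 0, so no sign extension is needed afterwards (bext candidate). */
static uint32_t extract_bit(uint64_t x, unsigned n)
{
    return (uint32_t)((x >> (n & 63)) & 1);
}

/* ~(1 << N) & C where C has at least 33 leading zeros: bit 31 of the result
   is always zero, so the 32-bit result is already properly extended
   (bclr candidate). */
static uint32_t clear_bit_in_const(unsigned n)
{
    return (uint32_t)(~(UINT32_C(1) << (n & 31)) & LOW31_MASK);
}

/* (1 << N) | C and (1 << N) ^ C computed in 64 bits (DImode): there is no
   narrower object to sign-extend (bset / binv candidates). */
static uint64_t set_bit64(uint64_t c, unsigned n)
{
    return c | (UINT64_C(1) << (n & 63));
}

static uint64_t flip_bit64(uint64_t c, unsigned n)
{
    return c ^ (UINT64_C(1) << (n & 63));
}

/* 32-bit (1 << N) zero-extended to 64 bits: equivalent to setting bit N in
   zero (bset with x0 as the source operand). */
static uint64_t one_shifted_zext(unsigned n)
{
    return (uint64_t)(UINT32_C(1) << (n & 31));
}

/* Bit position masked via AND: with "n & 15" the shift can never reach
   bit 31, so the sign bit of the 32-bit object is left unchanged. */
static uint32_t set_bit_masked(uint32_t x, unsigned n)
{
    return x | (UINT32_C(1) << (n & 15));
}
```

Compiling these functions for an rv64 target with Zbs enabled and inspecting the generated assembly is one way to check which of the shapes a particular compiler handles.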
Stakeholders/Partners
RISE:
Ventana: Jeff Law – general oversight / guidance & implementation
Ventana: Raphael Zinsly – implementations
External:
Dependencies
Status
| Development | COMPLETE |
|---|---|
| Development Timeline | 1H2024 |
| Upstreaming | COMPLETE |
| Upstream Version | gcc-15 (target) (Spring 2025) |
| Contacts | Jeff Law (Ventana) |
| Dependencies | None |
Updates
Jul 6, 2024
Last patch in series committed (exploiting masks of count).
Jun 19, 2024
Generalization of IOR patterns to include XOR submitted.
Wrapped up new version of patch to exploit masking of bit position.
Marking as development complete.
Jun 17, 2024
Handling of (1 << N) | C and (1 << N) ^ C for DImode objects has been integrated
Explicit zero extension of (1 << N) in SImode using bset has been submitted & integrated
Handling of zero_extract destinations for single bit insertions has been submitted & integrated
Jun 10, 2024
Raphael's code for using bext to extract a single bit, storing the result in an SImode object for rv64 has been integrated
Raphael's code to handle ~(1 << N) & C where C has at least 33 leading zeros has been integrated
Jeff's code to handle (1 << N) | C and (1 << N) ^ C for DImode objects has been submitted
Jun 9, 2024
Raphael's code for using bext to extract a single bit, storing the result in an SImode object for rv64 has been submitted
May 25, 2024
(X | Y) & ~Y → X & ~Y simplification added to logical simplifications, eliminating xalancbmk's redundancy in its bitset code
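The identity behind this simplification can be checked with a small C sketch. The function names are hypothetical and the code is not taken from the GCC patch. Because the operators are bitwise, each result bit depends only on the matching bits of X and Y, so an exhaustive check over 8-bit operands covers every per-bit combination.

```c
#include <stdint.h>

/* Redundant form: set the bits of y in x, then clear those same bits. */
static uint32_t redundant_form(uint32_t x, uint32_t y)
{
    return (x | y) & ~y;
}

/* Simplified form: the OR contributes nothing once y's bits are cleared. */
static uint32_t simplified_form(uint32_t x, uint32_t y)
{
    return x & ~y;
}

/* Returns 1 if both forms agree for every pair of 8-bit operands. */
static int forms_agree_8bit(void)
{
    for (uint32_t x = 0; x < 256; x++) {
        for (uint32_t y = 0; y < 256; y++) {
            if (redundant_form(x, y) != simplified_form(x, y))
                return 0;
        }
    }
    return 1;
}
```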
May 9, 2024
Roughly a dozen issues have been identified, with patches that are ready or nearly ready for upstreaming. The first patch is going through the upstream process right now.
Apr 25, 2024
Ventana has discovered (and fixed internally) roughly a dozen cases where GCC was failing to utilize the Zbs extension as well as it could/should.
Performance testing of those changes should start shortly
Plan is to start upstreaming them as soon as gcc-15 is open for development.
Apr 2, 2024
Noted as a 1H2024 work item.