GCC seems to be particularly poor at utilizing the Zbs extension, especially for 32-bit objects on rv64. The core issue is that if a Zbs instruction modifies bit 31, the compiler will likely need to emit a sign extension to satisfy various architecture/ABI requirements.
However, there are several cases where the compiler can know it is safe to avoid the extension.
- xalancbmk's bitset implementation has a redundant bit clear before setting the same bit. This can be fixed in a generic way with an additional logical simplification pattern.
- bext can be used to extract a single bit, storing the result into an SImode object, even for rv64, since bits 1..63 will be zeroed by the (&1) operation in the bext specification.
- ~(1 << N) & C can be safely used for a 32-bit object on rv64 when C has 33 or more leading zeros.
- (1 << N) | C and (1 << N) ^ C can be safely used when the logical XOR/IOR is done in DImode, since we don't have to worry about sign-extending a DImode object.
- An explicit extension of SImode (1 << N) to DImode can be handled with a simple bset with x0 as a source operand.
- Occasionally GCC will use a "zero_extract" as a destination for some bitfield insertions, which can be handled with bset/bclr.
- When the shift count is masked such that we know bit 31 is not changed, we can more aggressively generate Zbs instructions. Two forms:
  - Bit position is masked via AND.
  - Bit position is masked via NAND.
Stakeholders/Partners
RISE:
Ventana: Jeff Law – general oversight / guidance & implementation
Ventana: Raphael Zinsly – implementations
External:
Dependencies
Status
Updates
- Generalization of IOR patterns to include XOR submitted.
- Wrapped up new version of patch to exploit masking of bit position.
- Marking as development complete.
- (1 << N) | C and (1 << N) ^ C for DImode objects have been integrated
- Explicit zero extension of (1 << N) in SImode using bset has been submitted & integrated
- Handling of zero_extract destinations for single bit insertions has been submitted & integrated
- Raphael's code for using bext to extract a single bit, storing the result in an SImode object for rv64 has been integrated
- Raphael's code to handle ~(1 << N) & C where C has at least 33 leading zeros has been integrated
- Jeff's code to handle (1 << N) | C and (1 << N) ^ C for DImode objects has been submitted
- Raphael's code for using bext to extract a single bit, storing the result in an SImode object for rv64 has been submitted
- (X | Y) & ~Y → X & ~Y simplification added to logical simplifications, eliminating xalancbmk's redundancy in its bitset code
- There are probably about a dozen issues identified, with patches that are ready or nearly ready for upstreaming. The first patch is going through the upstream process right now.
- Ventana has discovered (and fixed internally) roughly a dozen cases where GCC was failing to utilize the Zbs extension as well as it could/should
- Performance testing of those changes should start shortly
- Plan is to start upstreaming them as soon as gcc-15 is open for development.
- Noted as a 1H2024 work item.