CT_00_037 -- Zicond with if-conversion improvements (GCC)
About
The Zicond extension provides a conditional zero primitive upon which subsets of conditional move and conditional arithmetic/logical operations can be implemented. Transforming control flow into conditional operations can improve code performance by eliminating branch misprediction costs and by reducing the load on the branch predictors. The earlier in the optimizer pipeline these transformations are performed, the more likely they are to expose secondary optimization opportunities, since the transformations result in larger basic blocks (the basic block being the fundamental unit of code most compiler optimizations work on).
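As a rough illustration of how a conditional zero primitive can serve as a building block, the sketch below models the two Zicond instructions (czero.eqz and czero.nez) as plain C functions and composes them into a general conditional move and a conditional add. The function names cmov and cond_add are hypothetical and exist only for this example; this is a software model of the instruction semantics, not the code GCC emits.

```c
#include <stdint.h>

/* Model of czero.eqz: yields 0 when cond == 0, otherwise val. */
static inline uint64_t czero_eqz(uint64_t val, uint64_t cond) {
    return cond == 0 ? 0 : val;
}

/* Model of czero.nez: yields 0 when cond != 0, otherwise val. */
static inline uint64_t czero_nez(uint64_t val, uint64_t cond) {
    return cond != 0 ? 0 : val;
}

/* General conditional move, cond ? a : b.
   Needs two conditional zeros plus an OR: exactly one of the two
   operands survives, the other is forced to zero. */
uint64_t cmov(uint64_t cond, uint64_t a, uint64_t b) {
    return czero_eqz(a, cond) | czero_nez(b, cond);
}

/* Conditional add, cond ? x + y : x.
   Needs only one conditional zero: zero the addend when the
   condition is false, then add unconditionally. */
uint64_t cond_add(uint64_t cond, uint64_t x, uint64_t y) {
    return x + czero_eqz(y, cond);
}
```

The conditional add works with a single conditional zero because 0 is the identity for addition; the same trick applies to other operators that leave a value unchanged when the other operand is zero (OR, XOR, subtraction, shifts).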
This item tracks additional opportunities to optimize code using the Zicond extension:
- Improvement of if-conversion pass in GCC to handle SUBREG and zero/sign extended objects.
- If-convert the conditional in the move_one_fast loop of deepsjeng
- Two approaches
- Improve min/max discovery in gimple which should simplify the conditional code to optimizable form in the RTL if-converter code
- Improve the RTL if-converter code to better handle multiple if-convertible instructions
- Add backend pattern to recognize an if-then-else as a min/max
- Robin has submitted some code for this, but it needs to be adjusted for reviewer feedback
- Two approaches
- Cost model adjustments
- As touched on in upstream bug 112462, when we have a condition other than (reg) eq/ne (const_int 0) we need to bump the cost of using zicond as the condition will need canonicalization.
- Similarly we may need to bump the cost depending on the true/false arms
- May want to do some refactoring so that we can share code across costing & expansion.
- Revisit LOGICAL_OP_NON_SHORT_CIRCUIT
- Seems like this code was carried over from the MIPS port without really understanding the implications from a code generation standpoint
- Reverting to default behavior shows > 2% improvement on the BPI, with minimal code size increase
- Code generation adjustments
- Andrew Pinski has made adjustments to optimize cases where a series of boolean values feed a "phi" (which collects values from distinct control flow paths)
- Essentially results in fewer conditional branches, but didn't play well with Zicond, which has been fixed.
- When one arm of a conditional move can be trivially derived from the other, say by adding a small constant, we can emit a single zicond + adjustment rather than a fully generalized conditional move via 2 zicond instructions. Conceptually this is similar to how we handle something like x = cond ? C1 : C2, we just need to detect it earlier. See these examples on godbolt.
Matching this style would be one approach and probably generally profitable for the first case:
(set (reg:DI 135 [ <retval> ])
    (plus:DI (if_then_else:DI (reg:DI 145)
            (const_int 0 [0])
            (reg:DI 143))
        (reg:DI 147)))
Obviously we could replace the PLUS with a variety of operators. Another approach would likely be to match the following (which falls into the sub-word cases):
(set (reg:DI 147)
    (if_then_else:DI (reg:DI 145)
        (sign_extend:DI (plus:SI (subreg:SI (reg:DI 138) 0)
                (const_int 5 [0x5])))
        (const_int 0 [0])))
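To make the "one arm derived from the other" case concrete, here is a minimal C sketch that compares the fully general two-instruction selection with the single conditional zero plus adjustment. It uses a software model of czero.eqz/czero.nez; the function names select_two_czero and select_one_czero are hypothetical, and reading the bare condition register as "nonzero" is an assumption made for the example.

```c
#include <stdint.h>

/* Software models of the Zicond instructions. */
static inline uint64_t czero_eqz(uint64_t val, uint64_t cond) {
    return cond == 0 ? 0 : val;   /* zero when cond == 0 */
}
static inline uint64_t czero_nez(uint64_t val, uint64_t cond) {
    return cond != 0 ? 0 : val;   /* zero when cond != 0 */
}

/* Source-level shape: result = cond ? a : a + b. */

/* Fully general conditional move: compute both arms, then select
   with two conditional zeros and an OR. */
uint64_t select_two_czero(uint64_t cond, uint64_t a, uint64_t b) {
    uint64_t sum = a + b;
    return czero_eqz(a, cond) | czero_nez(sum, cond);
}

/* Derived-arm form: the false arm is the true arm plus b, so it is
   enough to conditionally zero b and add unconditionally. This has
   the shape (plus (if_then_else cond 0 b) a). */
uint64_t select_one_czero(uint64_t cond, uint64_t a, uint64_t b) {
    return a + czero_nez(b, cond);
}
```

Both functions return the same value for every input; the second simply drops one conditional zero and the OR.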
Analysis has shown that the most common missed if-conversion cases for RISC-V involve mode-changing operators such as SUBREG, ZERO_EXTEND and SIGN_EXTEND, which are commonly used when operating on 32-bit objects for rv64. ESWIN and Ventana have differing implementations in this space that need to be resolved. The core concern with the ESWIN implementation is that it directly modifies the objects in the IL, which in turn makes it difficult (potentially impossible) to correctly handle certain cases (shifts). In contrast, the Ventana implementation emits new IL for the converted sequence and deletes the old parts of the IL.
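The sub-word situation can be sketched in C as follows. The example models 64-bit registers holding 32-bit values and a 32-bit add whose result is sign-extended (as the rv64 addw instruction does), then shows a conditional 32-bit add written with a branch and with a conditional zero. The names cond_addw_branchy and cond_addw_branchfree are hypothetical, and the sketch assumes x already holds a properly sign-extended 32-bit value.

```c
#include <stdint.h>

/* Model of czero.eqz: yields 0 when cond == 0, otherwise val. */
static inline int64_t czero_eqz(int64_t val, int64_t cond) {
    return cond == 0 ? 0 : val;
}

/* Model of a 32-bit add on rv64: add the low 32 bits and sign-extend
   the result to 64 bits. The unsigned-to-signed conversion relies on
   the usual two's complement wraparound (as GCC defines it). */
static inline int64_t addw(int64_t a, int64_t b) {
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    return (int64_t)(int32_t)sum;
}

/* Branchy form of: if (c) x += y;  with 32-bit x and y. */
int64_t cond_addw_branchy(int64_t c, int64_t x, int64_t y) {
    if (c != 0)
        return addw(x, y);
    return x;
}

/* If-converted form: zero the addend when c == 0, then perform the
   32-bit add unconditionally. The extension wraps the whole add, so
   the converter has to look through it to find this shape.
   Assumes x is already a sign-extended 32-bit value. */
int64_t cond_addw_branchfree(int64_t c, int64_t x, int64_t y) {
    return addw(x, czero_eqz(y, c));
}
```

The point of the sketch is that the arithmetic being made conditional sits underneath a sign extension, which is the kind of wrapping the if-converter needs to see through.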
Stakeholders/Partners
RISE:
Ventana: Raphael Zinsly, Jeff Law, Robin Dapp
ESWIN: Fei Gao
External:
Dependencies
Status
Updates
- The custom implementation of LOGICAL_OP_NON_SHORT_CIRCUIT has been removed
- Shows a 2%+ improvement on the BPI
- Currently evaluating on another design
- A code generation change was made for cases where boolean values show up in PHI nodes
- Generally results in fewer branches and instead relies on ops like sCC to manipulate the values
- Regressed on designs with conditional move instructions such as zicond, xventanacondops, theadcmov, SFB, etc
- Adjustments made to RISC-V backend to restore performance
- Robin submitted code to pick up the min/max case in deepsjeng
- Upstream wants to see some adjustments
- Did show a nice performance uplift on design (I forgot the final number, but it was notable)
- Add additional examples for cases where zicond code could be improved.
- Remaining items from 1H2024 rolled into new task
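For readers unfamiliar with the min/max case mentioned in the updates above, the following C sketch shows the general shape: an if-then-else that is really a maximum, alongside a branch-free version built from a comparison and two conditional zeros. This is a generic illustration, not the actual deepsjeng code, and the function names are hypothetical.

```c
#include <stdint.h>

/* Software models of the Zicond instructions. */
static inline int64_t czero_eqz(int64_t val, int64_t cond) {
    return cond == 0 ? 0 : val;   /* zero when cond == 0 */
}
static inline int64_t czero_nez(int64_t val, int64_t cond) {
    return cond != 0 ? 0 : val;   /* zero when cond != 0 */
}

/* If-then-else written with control flow; semantically a signed max. */
int64_t max_branchy(int64_t a, int64_t b) {
    if (a < b)
        a = b;
    return a;
}

/* Branch-free equivalent: materialize the comparison as 0/1 (as an
   slt instruction would), keep a when it is false and b when it is
   true. Exactly one operand of the OR is nonzero-capable. */
int64_t max_czero(int64_t a, int64_t b) {
    int64_t lt = (a < b);
    return czero_nez(a, lt) | czero_eqz(b, lt);
}
```

On targets that also provide a dedicated max instruction (for example via the Zbb extension), recognizing the if-then-else as a max allows a single instruction instead of the compare-and-select sequence.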