When the Zbkb extension is enabled, an arbitrary 64-bit constant can be loaded in at most 5 instructions: a lui+addi pair for each of the upper and lower 32-bit halves, and a pack to merge them. High-performance uarchs will execute this in 2 cycles, and even a simplistic uarch would probably take at most 5 cycles. Contrast that with what we do now, where we push complex constants into the constant pool. That's probably 3 instructions and 5 cycles on most uarchs. Naturally we would like to see constant synthesis improved when Zbkb is enabled.
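As a rough illustration of the lui+addi+pack sequence, the following C sketch models the three instructions on RV64 and checks that any 64-bit value can be rebuilt from its two halves. The helper names (rv_lui, rv_addi, rv_pack, load_half, synth_with_pack) are made up for this example and are not GCC internals. It says nothing about how the compiler would actually choose or schedule the sequence.

```c
#include <stdint.h>

/* Model of "lui rd, imm20" on RV64: imm20 << 12, sign-extended from 32 bits. */
static uint64_t rv_lui(uint32_t imm20)
{
    return (uint64_t)(int64_t)(int32_t)(imm20 << 12);
}

/* Model of "addi rd, rs1, imm12": add a sign-extended 12-bit immediate. */
static uint64_t rv_addi(uint64_t rs1, int32_t imm12)
{
    return rs1 + (uint64_t)(int64_t)imm12;
}

/* Model of Zbkb "pack rd, rs1, rs2" on RV64: the low 32 bits of rs1 form
   the low half of rd, the low 32 bits of rs2 form the high half. */
static uint64_t rv_pack(uint64_t rs1, uint64_t rs2)
{
    return (rs1 & 0xffffffffu) | (rs2 << 32);
}

/* Build a register whose low 32 bits equal `half` using lui+addi.
   The upper 32 bits may hold sign-extension garbage; pack ignores them. */
static uint64_t load_half(uint32_t half)
{
    int32_t lo12 = (int32_t)(half & 0xfff);
    if (lo12 >= 0x800)
        lo12 -= 0x1000;              /* addi sign-extends its immediate */
    uint32_t hi20 = ((half - (uint32_t)lo12) >> 12) & 0xfffff;
    return rv_addi(rv_lui(hi20), lo12);
}

/* Five modeled instructions: lui+addi (low), lui+addi (high), pack. */
uint64_t synth_with_pack(uint64_t value)
{
    uint64_t lo = load_half((uint32_t)value);
    uint64_t hi = load_half((uint32_t)(value >> 32));
    return rv_pack(lo, hi);
}
```

The two load_half computations are independent of each other, which is why a wide machine can overlap them before the final pack.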
There are also cases that could be improved for designs without Zbkb but which do have Zbs. Consider a constant with just 4 bits set: say, two non-consecutive bits in the high 32 bits of a 64-bit word, plus two bits in the low 12 bits. Such constants tend to end up in the constant pool, but they could be synthesized with an addi plus two bsets. That is the same size as a constant pool reference and almost certainly faster.
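A minimal sketch of the Zbs case, again modeling the instructions in C rather than showing compiler code. It assumes the low part is materialized first with an addi from x0 (li) and that bit 11 is clear, so the 12-bit immediate is not sign-extended into the upper bits. The names rv_li12, rv_bseti and synth_li_bseti2 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "addi rd, x0, imm12" (li): a sign-extended 12-bit immediate. */
static uint64_t rv_li12(int32_t imm12)
{
    assert(imm12 >= -2048 && imm12 <= 2047);
    return (uint64_t)(int64_t)imm12;
}

/* Model of Zbs "bseti rd, rs1, shamt": set a single bit of rs1. */
static uint64_t rv_bseti(uint64_t rs1, unsigned shamt)
{
    assert(shamt < 64);
    return rs1 | (UINT64_C(1) << shamt);
}

/* Three modeled instructions: li for the low bits, then one bseti per
   high bit. Assumes low12 is non-negative (bit 11 clear). */
uint64_t synth_li_bseti2(int32_t low12, unsigned bit_a, unsigned bit_b)
{
    assert(low12 >= 0 && low12 <= 2047);
    uint64_t r = rv_li12(low12);
    r = rv_bseti(r, bit_a);
    r = rv_bseti(r, bit_b);
    return r;
}
```

For example, synth_li_bseti2(0x5, 40, 45) builds a constant with bits 0, 2, 40 and 45 set, i.e. 0x0000210000000005.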
Stakeholders/Partners
RISE:
Ventana: Jeff Law – general oversight / guidance.
External:
Dependencies
Status
Updates
- Noted as a 1H2024 work item.