...
Expanding these sequences inline avoids the overhead of a function call and exposes more of the call's underlying semantics to the optimizers, potentially enabling further optimization. It also creates larger scheduling blocks and leaves fewer optimization barriers. VRULL has provided scalar implementations of the key routines, and Ventana has contributed vector versions.
There are additional cases that can be handled with vector code. In particular, when the source and destination of a memory copy may overlap and the entire amount copied fits in a vector register, the runtime test for forward vs backward copying can be avoided, because the whole source is loaded before any byte of the destination is written. Support for these cases has been posted by Sergei at Rivos, but it missed the gcc-14 development deadline. It's unclear whether we will make an exception for this work or defer it to gcc-15.
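To illustrate why the direction test becomes unnecessary, here is a minimal portable C sketch of the idea. It is not the posted patch and does not use real vector instructions: `VREG_BYTES` and `small_memmove` are hypothetical names, and a local byte array stands in for a single vector register.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the size of one vector register, in bytes. */
#define VREG_BYTES 16

/*
 * Overlap-safe copy for n <= VREG_BYTES.
 * The whole source is read into the temporary (modelling one vector load)
 * before anything is written (modelling one vector store), so the result
 * is correct whether dst lies before or after src. No forward/backward
 * direction test is needed. Callers must guarantee n <= VREG_BYTES.
 */
static void small_memmove(void *dst, const void *src, size_t n)
{
    unsigned char vreg[VREG_BYTES];  /* models the vector register */

    memcpy(vreg, src, n);            /* "vector load" of the full source */
    memcpy(dst, vreg, n);            /* "vector store" to the destination */
}
```

For example, with a buffer holding `abcdefghij`, calling `small_memmove(buf + 2, buf, 6)` leaves `ababcdefij` in the buffer, the same result `memmove` would give. Larger copies would still need the usual direction check or a loop.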
Stakeholders/Partners
RISE:
Ventana: Jeff Law & Robin
Rivos: Palmer & Sergei
External:
VRULL: Christoph Mullner (under contract to Ventana)
Embecosm: Joern Rennecke
Dependencies
Status
Updates
- Robin Dapp from Ventana has submitted & integrated vector versions of str[n]cmp, strlen
- Christoph has submitted and integrated scalar versions of str[n]cmp, strlen, memcpy
- Sergei Lewis has submitted vector versions of memset, memmove and memcmp.
- Joern has submitted Embecosm's work to inline vectorized memcpy.
...