...
Investigation of the 538.imagick benchmark from SPEC CPU 2017 shows that it uses the ceil/floor routines heavily. While the Zfa extension can be used to optimize these calls into simple FP conversion instructions, we believe an alternate implementation that uses only the F/D conversion instructions will significantly improve performance on designs that do not implement Zfa. The improvement appears to be roughly 9-10% in dynamic instruction count.
LLVM already has this optimization in place.
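A minimal C sketch of the idea, for illustration only. Any double with magnitude of at least 2^52 is already an integer, and so are infinities; NaNs must pass through unchanged. Every other value can be converted to a 64-bit integer with a directed rounding mode (round-down for floor, round-up for ceil, as the RV64 `fcvt.l.d` instruction can do with its `rdn`/`rup` modes) and then converted back. `copysign` restores the sign of results that round to zero, e.g. ceil(-0.5) = -0.0. The names `cvt_round_down`, `cvt_round_up`, `floor_via_cvt` and `ceil_via_cvt` are hypothetical. The two helpers emulate the hardware's directed-rounding conversion in portable C using a truncating cast plus a one-step correction. This is not necessarily the exact sequence GCC or LLVM emits.

```c
#include <math.h>
#include <stdint.h>

/* 2^52: every double with magnitude >= this is already an integer. */
#define TWO52 4503599627370496.0

/* Hypothetical stand-in for fcvt.l.d with rounding mode rdn.
   Caller guarantees |x| < 2^52, so the cast cannot overflow. */
static int64_t cvt_round_down(double x) {
    int64_t t = (int64_t)x;      /* C cast rounds toward zero */
    if ((double)t > x)           /* negative non-integer: step down */
        t -= 1;
    return t;
}

/* Hypothetical stand-in for fcvt.l.d with rounding mode rup. */
static int64_t cvt_round_up(double x) {
    int64_t t = (int64_t)x;      /* C cast rounds toward zero */
    if ((double)t < x)           /* positive non-integer: step up */
        t += 1;
    return t;
}

double floor_via_cvt(double x) {
    /* The comparison is false for NaN, infinities and |x| >= 2^52,
       all of which are returned unchanged. */
    if (!(x > -TWO52 && x < TWO52))
        return x;
    double r = (double)cvt_round_down(x);  /* models fcvt.d.l back to FP */
    return copysign(r, x);                 /* floor(-0.0) must be -0.0 */
}

double ceil_via_cvt(double x) {
    if (!(x > -TWO52 && x < TWO52))
        return x;
    double r = (double)cvt_round_up(x);
    return copysign(r, x);                 /* ceil(-0.5) must be -0.0 */
}
```

The range check comes first because an out-of-range float-to-integer conversion cannot produce a usable result. Values at or beyond 2^52 have no fractional part anyway, so returning them unchanged is exact.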
...