...

Investigation of the 538.imagick benchmark from SPEC CPU 2017 shows that it uses the ceil/floor routines heavily. While the Zfa extension can be used to optimize these calls into simple FP conversion instructions, it is believed that an alternative implementation based on just the F/D conversions is feasible and would significantly improve performance on designs that do not implement Zfa. For 538.imagick, it appears to give roughly a 9-10% improvement in dynamic instruction count and a 17% improvement in cycles.

LLVM already has this optimization in place.
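
As an illustration of the conversion-based idea, here is a minimal C sketch. It is not the GCC patch or the LLVM code; the function names and the threshold macro are hypothetical. It assumes the usual structure: values whose magnitude is at least 2^52 (and NaN or infinity) are already integral or must pass through unchanged, and everything else can be rounded with an FP-to-integer conversion followed by an integer-to-FP conversion. A C cast always truncates toward zero, so the sketch adds an explicit adjustment step; the RISC-V conversion instructions accept a static rounding mode, so a compiler-generated sequence would presumably not need that step.

```c
#include <math.h>
#include <stdint.h>

/* 2^52: every double with magnitude >= this value is already an integer.
   Hypothetical name, used only for this sketch. */
#define INTEGRAL_THRESHOLD 4503599627370496.0

/* floor() built from FP<->integer conversions (illustrative only). */
double floor_via_convert(double x)
{
    /* The comparison is false for NaN, so NaN, infinities and large
       values are returned unchanged. */
    if (!(fabs(x) < INTEGRAL_THRESHOLD))
        return x;

    double t = (double)(int64_t)x;  /* FP -> int -> FP, truncating toward zero */
    if (t > x)                      /* emulate round-down for negative fractions */
        t -= 1.0;
    return copysign(t, x);          /* restore the sign so -0.0 stays -0.0 */
}

/* ceil() built the same way (illustrative only). */
double ceil_via_convert(double x)
{
    if (!(fabs(x) < INTEGRAL_THRESHOLD))
        return x;

    double t = (double)(int64_t)x;  /* FP -> int -> FP, truncating toward zero */
    if (t < x)                      /* emulate round-up for positive fractions */
        t += 1.0;
    return copysign(t, x);          /* e.g. ceil(-0.5) must be -0.0 */
}
```

The magnitude guard is what keeps the integer conversion in range: once it passes, the value fits comfortably in a 64-bit integer, so the cast cannot overflow.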

...

Page Properties

Development

Status


COMPLETE

Development Timeline

1H2024

Upstreaming

Status


COMPLETE

Upstream Version

gcc-15 (target)

Spring 2025

Contacts

Jeff Law (Ventana)

Dependencies

None

Updates

09 May 2024

Jivan's patch has been upstreamed.

20 Mar 2024

Patch posted upstream. There seems to be general consensus to go forward, pending final review when gcc-15 is open for development.

17 Mar 2024

Added note about actual performance improvement seen (17%).
Jivan has been asked to post his patch to the gcc-patches list for review.

13 Mar 2024

An implementation borrowing heavily from LLVM is under evaluation. It provides more efficient versions of ceil, round, nearbyint, rint, etc.
In addition to using existing conversions to implement those functions, the implementation also includes sign extension removal for cases where the ultimate result is a GPR.
Expecting this to save 300-400 billion instructions for the imagick benchmark, enough that we expect to see a measurable (perhaps double-digit) improvement in the benchmark's overall performance.

...
