Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 10 Next »

About

Cyclic Redundancy Check (CRC) is a standardized way to implement a degree of data validation and is used in various over the wire protocols, data checksums, etc.  Naive implementations are a bitwise loop over an input message and highly inefficient.  Detection of a CRC loop and transformation into either a table lookup or a combination of clmul + shift operations can speed up such code by 10X.  This would include the CRC loop in the Coremark benchmark – detecting and optimizing that idiom results in about a 10-20% improvement in Coremark scores depending on the micro-architecture.


The detection of a CRC loop is a complex optimization with elements of vectorization, feedback loops, pattern matching, etc.  It is further complicated by function inlining.  At this time we have patches for upstream GCC that can detect and generate code (either table lookup or clmul) for a variety of CRC implementations.  An aggressive schedule would be integration in time for gcc-14, but more likely it will fall into gcc-15.


Stakeholders/Partners

RISE:

Ventana: 1 FTE Contractor, ~1yr.  Mariam Harutyunyan from Center of Advanced Software Technologies), lead developer

Ventana: Jeff Law – oversight


External:

Significant design work and theoretical background provided by Philipp Tomsich and Henry Brausen of VRULL


Dependencies


Status

Development

COMPLETE


Development TimelineNA
Upstreaming

NOT STARTED


Upstream Version





Contacts

Jeff Law (Ventana)


Dependencies

None



Updates

  • It had been speculated that this work could generate a "crc" instruction on suitable ISAs if a loop with the right field polynomial were encountered.  This has been implemented as a proof of concept on x86
  • It has also been speculated that this work could be used to detect clmul idioms outside of a CRC loop.  This appears to work by relaxing various aspects of the CRC loop verifier.  It is not an area of focus though.
  • Minor bugs have been fixed in the implementation as it is prepared for upstreaming.  Goal is still integration for gcc-15 in the late spring.

 

  • Implementation done for x86_64 using its clmul.  Primary purpose was to show that using clmul, while requiring target specific development, is not a major effort.  This is expected to help with the upstreaming effort.
  • Code drop from 1/26 integrated into the upstream GCC tester.
  • Expectation is to submit code for upstream review in Feb with a goal of integrating for gcc-15 in the late spring.

 

  • Major cleanups/fixes have been made in the gimple→RTL code generation phase.  At this point CRC detection and code generation using table lookups is expected to work on any architecture GCC supports.  Additionally generation of clmul instructions is supported on RISC-V when the appropriate extension is enabled.
  • At this point the development cycle is considered complete and we will be working through the usptreaming process with a target of inclusion in gcc-15 (spring 2025)

 

  • Significant cleanup of backend support.  Should (in theory) be target independent at this point.  We'll do some testing on other targets to verify this shortly.
  • Cleanup of the main recognizer and LFSR validation step in progress
  • Evaluating the ability of Ranger to eliminate the need to introduce another symbolic engine.

 

  • As noted before, not a great initial response to carving out the backend work and trying to upstream that for gcc-14
  • Regardless, we're working through various review comments to make it more palatable for upstream gcc
    • Generalize table based CRC generation so that it works on multiple targets
    • Fix implementation details that would cause failures when running gcc on different host platforms targeting rv64
    • Various coding convention, documentation and formatting issues necessary for upstreaming

  • Not a great initial response to builtin_crc to carve out initial backend work
  • Still evaluating if there's a path forward without having to submit full end-to-end optimization 

 

  • Various updates to handling inlined CRC functions
  • Planning to break out backend/code generator work + builtin as distinct unit (can upstream immediately)
    • Will work with Richard S and TBD from Intel to verify builtin is sufficiently general for their architectures
  • Will also coordinate with Richard S and Richard B from Linaro and SuSE respectively on upstreaming detection code in gimple loop optimizers
  • Mariam has discovered few CRC implementations in the kernel as well.  Working to see if they can be optimized

 

  • Dependency on CRC code being a distinct self-contained function removed, this generalization was the last major implementation roadblock for the initial implementation
  • Using ~25 CRC implementations from Fedora as general testcases.  Converts about 50% of them right now.
  • Expecting to start breaking out independent chunks of work in the coming weeks to start upstreaming (CRC builtin/IFN support with backend expansion for example)

 

  • Project reported as a priority for 1H2024


  • No labels