[PATCH v3 0/2] arm64: Speed up CRC-32 using PMULL instructions

Thu Oct 17 09:30:19 PDT 2024

On Thu, 17 Oct 2024 at 11:41, Ard Biesheuvel <ardb+git at google.com> wrote:
>
> From: Ard Biesheuvel <ardb at kernel.org>
>
> The CRC-32 code is library code, and is not part of the crypto
> subsystem. This means that callers may not generally be aware of the
> kind of implementation that backs it, and so we've refrained from using
> FP/SIMD code in the past, as it disables preemption, and this may incur
> scheduling latencies that the caller did not anticipate.
>
> This was solved a while ago, and on arm64, kernel mode FP/SIMD no longer
> disables preemption.
>
> This means we can happily use PMULL instructions in the CRC-32 library
> code, which permits an optimization to be implemented that results in a
> speedup of 2 - 2.8x for inputs >1k in size (on Apple M2)
>
> Patch #1 implements some prepwork to handle the scalar CRC-32
> alternatives patching in C code.
>
> Changes since v2:
> - drop alternatives.h #include (#1)
> - drop unneeded branch (#2)
> - fix comment max -> min (#2)
> - add Eric's Rb
>
> Changes since v1:
> - rename crc32-pmull.S to crc32-4way.S and avoid pmull in the function
>   names to avoid confusion about the nature of the implementation;
> - polish the asm a bit, and add some comments
> - don't return via the scalar code if len dropped to 0 after calling the
>   4-way code.
>
> Cc: Eric Biggers <ebiggers at kernel.org>
> Cc: Kees Cook <kees at kernel.org>
>
> Ard Biesheuvel (2):
>   arm64/lib: Handle CRC-32 alternative in C code
>   arm64/crc32: Implement 4-way interleave using PMULL
>

I'll need to respin this - the crc32_be code doesn't actually work correctly.