[PATCH 3/6] crypto: arm64/crct10dif - Remove remaining 64x64 PMULL fallback code
Eric Biggers
ebiggers at kernel.org
Tue Oct 29 21:15:57 PDT 2024
On Mon, Oct 28, 2024 at 08:02:11PM +0100, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb at kernel.org>
>
> The only remaining user of the fallback implementation of 64x64
> polynomial multiplication using 8x8 PMULL instructions is the final
> reduction from a 16 byte vector to a 16-bit CRC.
>
> The fallback code is complicated and messy, and this reduction has very
> little impact on the overall performance, so instead, let's calculate
> the final CRC by passing the 16 byte vector to the generic CRC-T10DIF
> implementation.
>
> Signed-off-by: Ard Biesheuvel <ardb at kernel.org>
> ---
> arch/arm64/crypto/crct10dif-ce-core.S | 237 +++++---------------
> arch/arm64/crypto/crct10dif-ce-glue.c | 15 +-
> 2 files changed, 64 insertions(+), 188 deletions(-)
For CRCs of short messages, doing a fast reduction from 128 bits can be quite
important. But I agree that when only an 8x8 => 16 carryless multiplication is
available, it can't really be optimized, and just falling back to the generic
implementation is the right approach in that case.
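
To illustrate what "falling back to the generic implementation" computes, here
is a minimal bit-at-a-time sketch of CRC-T10DIF in portable C. It is not the
kernel's generic code (which is table-driven); the function name
crc_t10dif_bitwise is hypothetical and used only for this example. The
parameters are the standard ones for this CRC: polynomial 0x8bb7, MSB-first,
no bit reflection, no final XOR.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper for illustration only: bitwise CRC-T10DIF.
 * G(x) = 0x18bb7; the implicit x^16 term is dropped in the constant below.
 */
static uint16_t crc_t10dif_bitwise(uint16_t crc, const unsigned char *buf,
				   size_t len)
{
	while (len--) {
		/* Bring the next message byte into the top of the register. */
		crc ^= (uint16_t)((unsigned int)*buf++ << 8);

		/* Shift out 8 bits, subtracting G(x) whenever x^16 appears. */
		for (int i = 0; i < 8; i++) {
			if (crc & 0x8000)
				crc = (uint16_t)((crc << 1) ^ 0x8bb7);
			else
				crc = (uint16_t)(crc << 1);
		}
	}
	return crc;
}
```

With an initial value of 0, running this over the 16 byte folded vector gives
the remainder of that vector modulo G(x), which is the reduction the patch
hands off to the generic code.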
> diff --git a/arch/arm64/crypto/crct10dif-ce-core.S b/arch/arm64/crypto/crct10dif-ce-core.S
> index 8d99ccf61f16..1db5d1d1e2b7 100644
[...]
> ad .req v14
> -
> - k00_16 .req v15
> - k32_48 .req v16
> + bd .req v15
>
> t3 .req v17
> t4 .req v18
> @@ -91,117 +89,7 @@
> t8 .req v22
> t9 .req v23
ad, bd, and t9 are no longer used.
> + // Use Barrett reduction to compute the final CRC value.
> + pmull2 v1.1q, v1.2d, fold_consts.2d // high 32 bits * floor(x^48 / G(x))
v0.2d was accidentally replaced with v1.2d above, which is causing a self-test
failure in crct10dif-arm64-ce.
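
For readers unfamiliar with the technique named in the quoted comment, the
following is a simplified sketch of Barrett reduction over GF(2), reducing a
polynomial of degree < 48 modulo G(x). It is an assumption-laden illustration,
not the code from the patch: the assembly works on vector registers with
pmull/pmull2 and a different operand layout, whereas this uses a software
carryless multiply. All function names here are hypothetical.

```c
#include <stdint.h>

#define T10DIF_POLY 0x18bb7ULL	/* G(x), including the x^16 term */

/* Carryless multiply; the caller keeps deg(a) + deg(b) <= 63. */
static uint64_t clmul64(uint64_t a, uint64_t b)
{
	uint64_t r = 0;

	for (; b; b >>= 1, a <<= 1)
		if (b & 1)
			r ^= a;
	return r;
}

/* Compute floor(x^48 / G(x)) by polynomial long division. */
static uint64_t barrett_mu(void)
{
	uint64_t rem = 1ULL << 48, q = 0;

	for (int i = 48; i >= 16; i--) {
		if (rem & (1ULL << i)) {
			q |= 1ULL << (i - 16);
			rem ^= T10DIF_POLY << (i - 16);
		}
	}
	return q;
}

/* Barrett reduction: t mod G(x), for deg(t) < 48. */
static uint16_t barrett_reduce48(uint64_t t)
{
	/* Quotient estimate: high 32 bits of t times floor(x^48 / G(x)). */
	uint64_t q = clmul64(t >> 16, barrett_mu()) >> 32;

	/* t + q * G(x) leaves only the remainder in the low 16 bits. */
	return (uint16_t)(t ^ clmul64(q, T10DIF_POLY));
}

/* Reference implementation: plain long division, for cross-checking. */
static uint16_t slow_reduce48(uint64_t t)
{
	for (int i = 47; i >= 16; i--)
		if (t & (1ULL << i))
			t ^= T10DIF_POLY << (i - 16);
	return (uint16_t)t;
}
```

Unlike integer Barrett reduction, the polynomial version needs no correction
step: for inputs in this range the quotient estimate is exact, so a single
multiply by G(x) and an XOR yield the remainder.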
Otherwise this patch looks good; thanks!
- Eric