[PATCH 09/12] lib/crypto: blake2s: Always enable arch-optimized BLAKE2s code
Ard Biesheuvel
ardb at kernel.org
Fri Aug 29 09:05:42 PDT 2025
On Fri, 29 Aug 2025 at 17:30, Eric Biggers <ebiggers at kernel.org> wrote:
>
> On Fri, Aug 29, 2025 at 03:08:56PM +0200, Honza Fikar wrote:
> > On Fri, Aug 29, 2025 at 2:54 PM Eric Biggers <ebiggers at kernel.org> wrote:
> >
> > > Currently, BLAKE2s support is always enabled ('obj-y'), since random.c
> > > uses it. Therefore, the arch-optimized BLAKE2s code, which exists for
> > > ARM and x86_64, should be always enabled too.
> >
> > Maybe a stupid question: what about ARM64? The current NEON
> > implementation in kernel arch/arm/crypto/blake2s-core.S seems to be just
> > for ARM.
> >
That code is scalar, not NEON, and is carefully tuned to make use of
the ARM barrel shifter, which does not exist on arm64.

> > While the upstream BLAKE2s with NEON is both for ARM and Aarch64 (ARM64):
> >
> > https://github.com/BLAKE2/BLAKE2/blob/master/neon
>
> There's no ARM64 optimized BLAKE2s code in the Linux kernel yet. If
> it's useful, someone would need to contribute it.
>
NEON is cumbersome in the kernel, so this only makes sense if it is
substantially more performant, and I'm skeptical that this is the
case, as you pointed out yourself in

  commit 5172d322d34c30fb926b29aeb5a064e1fd8a5e13
  Author: Eric Biggers <ebiggers at google.com>
  Date:   Wed Dec 23 00:09:59 2020 -0800

      crypto: arm/blake2s - add ARM scalar optimized BLAKE2s

      Add an ARM scalar optimized implementation of BLAKE2s.

      NEON isn't very useful for BLAKE2s because the BLAKE2s block size
      is too small for NEON to help. Each NEON instruction would depend
      on the previous one, resulting in poor performance.

      Even if NEON code might be slightly faster on some cores, the fact
      that it is sensitive to micro-architectural details makes it less
      attractive.
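
To illustrate the dependency chain described in that commit message, here is a
minimal C sketch of the BLAKE2s G mixing function as specified in RFC 7693.
This is not the kernel's code; the names ror32() and blake2s_g() are chosen
for this example only. Each statement consumes the result of the one before
it, and every XOR result is immediately rotated, which is presumably the
pattern a 32-bit ARM rotated-register operand can absorb into the following
instruction.

```c
#include <stdint.h>

/* Rotate a 32-bit word right by n bits (0 < n < 32). */
static inline uint32_t ror32(uint32_t x, unsigned int n)
{
	return (x >> n) | (x << (32 - n));
}

/*
 * BLAKE2s G function (RFC 7693, section 3.1), written for illustration.
 * v is the 16-word working vector; a, b, c, d select the words to mix;
 * x and y are the two message words for this step.
 *
 * Note the serial chain: every line reads a value written by the
 * previous line, so there is little independent work within one G call.
 */
static void blake2s_g(uint32_t v[16], int a, int b, int c, int d,
		      uint32_t x, uint32_t y)
{
	v[a] = v[a] + v[b] + x;
	v[d] = ror32(v[d] ^ v[a], 16);
	v[c] = v[c] + v[d];
	v[b] = ror32(v[b] ^ v[c], 12);
	v[a] = v[a] + v[b] + y;
	v[d] = ror32(v[d] ^ v[a], 8);
	v[c] = v[c] + v[d];
	v[b] = ror32(v[b] ^ v[c], 7);
}
```

A full round calls G eight times, four on the columns and four on the
diagonals of the 4x4 state. The four calls in each group are independent of
each other, which is where any SIMD parallelism would have to come from.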