[PATCH 0/5] arm64: Move kernel mode FPSIMD buffer to the stack
Ard Biesheuvel
ardb at kernel.org
Fri Sep 19 15:41:51 PDT 2025
On Fri, 19 Sept 2025 at 21:32, Eric Biggers <ebiggers at kernel.org> wrote:
>
> On Thu, Sep 18, 2025 at 08:35:40AM +0200, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb at kernel.org>
> >
> > Move the buffer for preserving/restoring the kernel mode FPSIMD state on a
> > context switch out of struct thread_struct, and onto the stack, so that
> > the memory cost is not imposed needlessly on all tasks in the system.
> >
> > Patches #1 - #3 contain some prepwork so that patch #4 can tighten the
> > rules around permitted usage patterns of kernel_neon_begin() and
> > kernel_neon_end(). This permits #5 to provide a stack buffer to
> > kernel_neon_begin() transparently, in a manner that ensures that it will
> > remain available until after the associated call to kernel_neon_end()
> > returns.
> >
> > Cc: Marc Zyngier <maz at kernel.org>
> > Cc: Will Deacon <will at kernel.org>
> > Cc: Mark Rutland <mark.rutland at arm.com>
> > Cc: Kees Cook <keescook at chromium.org>
> > Cc: Catalin Marinas <catalin.marinas at arm.com>
> > Cc: Mark Brown <broonie at kernel.org>
> >
> > Ard Biesheuvel (5):
> > crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit
> > crypto/arm64: sm4-ce-ccm - Avoid pointless yield of the NEON unit
> > crypto/arm64: sm4-ce-gcm - Avoid pointless yield of the NEON unit
> > arm64/fpsimd: Require kernel NEON begin/end calls from the same scope
> > arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack
> >
> > arch/arm64/crypto/aes-ce-ccm-glue.c | 5 +--
> > arch/arm64/crypto/sm4-ce-ccm-glue.c | 10 ++----
> > arch/arm64/crypto/sm4-ce-gcm-glue.c | 10 ++----
> > arch/arm64/include/asm/neon.h | 7 ++--
> > arch/arm64/include/asm/processor.h | 2 +-
> > arch/arm64/kernel/fpsimd.c | 34 +++++++++++++-------
> > 6 files changed, 34 insertions(+), 34 deletions(-)
>
> This looks like the right decision: saving 528 bytes per task is
> significant. 528 bytes is a lot to allocate on the stack too, but
> functions that use the NEON registers are either leaf functions or very
> close to being leaf functions, so it should be okay.
>
Indeed.
> The implementation is a bit unusual, though:
>
> #define kernel_neon_begin() do { __kernel_neon_begin(&(struct user_fpsimd_state){})
> #define kernel_neon_end() __kernel_neon_end(); } while (0)
>
> It works, but normally macros don't start or end code blocks behind the
> scenes like this.
That is kind of the point, as it restricts the use of them to an idiom
that guarantees that the stack variable lives long enough.
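
To illustrate the lifetime argument, here is a minimal userspace sketch of the macro pair quoted above. It is not the code from the series: mock_neon_begin(), mock_neon_end(), saved_state and demo_scoped_usage() are hypothetical stand-ins, the struct is only a same-sized mock of the kernel's user_fpsimd_state, and {0} is used instead of {} for portability. The compound literal has automatic storage tied to the block opened by kernel_neon_begin(), so it stays valid until kernel_neon_end() closes that block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock with the same 528-byte size as the kernel struct (assumption: layout simplified). */
struct user_fpsimd_state {
	unsigned char vregs[32][16];	/* 32 vector registers, 128 bits each */
	uint32_t fpsr;
	uint32_t fpcr;
	uint32_t reserved[2];
};

/* Hypothetical stand-in for the per-task pointer to the save buffer. */
static struct user_fpsimd_state *saved_state;

static void mock_neon_begin(struct user_fpsimd_state *buf)
{
	saved_state = buf;		/* the buffer must outlive the NEON section */
}

static void mock_neon_end(void)
{
	assert(saved_state != NULL);	/* end without a matching begin */
	saved_state = NULL;
}

/* begin opens a block that owns the buffer; end closes that block. */
#define kernel_neon_begin() \
	do { mock_neon_begin(&(struct user_fpsimd_state){0})
#define kernel_neon_end() \
	mock_neon_end(); } while (0)

/* Returns 1 if the buffer was live inside the section and released after it. */
int demo_scoped_usage(void)
{
	int live_inside;

	kernel_neon_begin();
	live_inside = (saved_state != NULL);
	kernel_neon_end();

	return live_inside && saved_state == NULL;
}
```

With this shape, a kernel_neon_begin() that has no kernel_neon_end() in the same function body leaves the braces unbalanced and fails to compile, which is the restriction the idiom relies on.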
> Perhaps it should be more like s390's
> kernel_fpu_begin(), where the caller provides the buffer that the
> registers are stored in?
>
If we're happy to change the API on both arm64 and ARM, then we could
make it more explicit. It's a lot more work, though.
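
For comparison, a rough sketch of what the more explicit, caller-provided-buffer variant could look like. The names neon_begin_explicit(), neon_end_explicit(), active_buf and demo_explicit_usage() are invented for illustration and are not an existing arm64 or s390 interface; the struct is again only a same-sized userspace mock.

```c
#include <stddef.h>
#include <stdint.h>

/* Mock with the same 528-byte size as the kernel struct (assumption: layout simplified). */
struct user_fpsimd_state {
	unsigned char vregs[32][16];
	uint32_t fpsr;
	uint32_t fpcr;
	uint32_t reserved[2];
};

/* Hypothetical stand-in for the per-task pointer to the save buffer. */
static struct user_fpsimd_state *active_buf;

/* Hypothetical API: the caller owns the buffer and passes it in explicitly. */
static void neon_begin_explicit(struct user_fpsimd_state *buf)
{
	active_buf = buf;
}

static void neon_end_explicit(struct user_fpsimd_state *buf)
{
	if (active_buf == buf)		/* only release the buffer we were given */
		active_buf = NULL;
}

/* Returns 1 if the caller's buffer was in use inside the section only. */
int demo_explicit_usage(void)
{
	struct user_fpsimd_state buf = {0};	/* visible stack cost at the call site */
	int in_use;

	neon_begin_explicit(&buf);
	in_use = (active_buf == &buf);
	neon_end_explicit(&buf);

	return in_use && active_buf == NULL;
}
```

The trade-off is that nothing here stops a caller from passing a buffer that goes out of scope before the end call, and every existing call site would have to be touched.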