[PATCH v2 0/4] arm64: Run kernel mode NEON with preemption enabled

Ard Biesheuvel ardb at kernel.org
Fri Nov 24 07:53:29 PST 2023


On Fri, 24 Nov 2023 at 01:11, Eric Biggers <ebiggers at kernel.org> wrote:
>
> Hi Ard,
>
> On Thu, Nov 23, 2023 at 06:44:34PM +0100, Ard Biesheuvel wrote:
> >
> > Currently, kernel mode NEON (SIMD) support is implemented in a way that
> > requires preemption to be disabled while the SIMD registers are live.
> > The reason for this is that those registers are not in the set that is
> > preserved/restored on exception entry/exit and context switch, as this
> > would impact performance generally, even for workloads where kernel mode
> > SIMD is not the bottleneck.
> >
> > However, doing substantial work with preemption disabled is not great,
> > as it affects scheduling latency, which is especially problematic for
> > real-time use cases. So ideally, we should keep preemption enabled when
> > we can, and find another way to ensure that this does not corrupt the
> > NEON register state of in-kernel SIMD users.
> >
> > This series implements a suggestion by Mark Rutland, and introduces a
> > thread_info flag TIF_USING_KMODE_NEON, which indicates to the thread
> > switch machinery that the task in question has live kernel mode SIMD
> > state which needs to be preserved and restored. The space needed for
> > this is allocated in thread_struct. (*)
> >
> > Given that currently, we run kernel mode NEON with softirqs disabled (to
> > avoid the need for preserving kernel mode NEON context belonging to task
> > context while the SIMD unit is being used by code running in softirq
> > context), just removing the preempt_disable/enable calls is not
> > sufficient, and we also need to leave softirqs enabled. This means that
> > we may need to preserve kernel mode NEON state not only on a context
> > switch, but also when code running in softirq context takes ownership of
> > the SIMD unit, but this is straightforward once we add the scratch
> > space to thread_struct.
> >
> > (*) We might decide to allocate this space (~512 bytes) dynamically, if
> > the thread_struct memory footprint causes issues. However, we should
> > also explore doing the same for the user space FPSIMD state, as kernel
> > threads never return to user space and have no need for this allocation.
> >
> > v2:
> > - tweak some commit logs for clarity
> > - integrate with the existing lazy restore logic
> > - add Mark's R-b to patch #1
> >
> > Cc: Marc Zyngier <maz at kernel.org>
> > Cc: Will Deacon <will at kernel.org>
> > Cc: Mark Rutland <mark.rutland at arm.com>
> > Cc: Kees Cook <keescook at chromium.org>
> > Cc: Catalin Marinas <catalin.marinas at arm.com>
> > Cc: Mark Brown <broonie at kernel.org>
> > Cc: Eric Biggers <ebiggers at google.com>
> >
> > Ard Biesheuvel (4):
> >   arm64: fpsimd: Drop unneeded 'busy' flag
> >   arm64: fpsimd: Preserve/restore kernel mode NEON at context switch
> >   arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD
> >   arm64: crypto: Remove conditional yield logic
> >
> >  arch/arm64/crypto/aes-glue.c         |  21 +--
> >  arch/arm64/crypto/aes-modes.S        |   2 -
> >  arch/arm64/crypto/sha1-ce-core.S     |   2 -
> >  arch/arm64/crypto/sha1-ce-glue.c     |  19 +--
> >  arch/arm64/crypto/sha2-ce-core.S     |   2 -
> >  arch/arm64/crypto/sha2-ce-glue.c     |  19 +--
> >  arch/arm64/crypto/sha3-ce-core.S     |   4 +-
> >  arch/arm64/crypto/sha3-ce-glue.c     |  14 +-
> >  arch/arm64/crypto/sha512-ce-core.S   |   2 -
> >  arch/arm64/crypto/sha512-ce-glue.c   |  16 +-
> >  arch/arm64/include/asm/assembler.h   |  29 ----
> >  arch/arm64/include/asm/processor.h   |   3 +
> >  arch/arm64/include/asm/simd.h        |  11 +-
> >  arch/arm64/include/asm/thread_info.h |   1 +
> >  arch/arm64/kernel/asm-offsets.c      |   4 -
> >  arch/arm64/kernel/fpsimd.c           | 162 +++++++++++++-------
> >  16 files changed, 138 insertions(+), 173 deletions(-)
>
> Thanks for doing this!  This seems like a very nice improvement, as long as the
> extra 528 bytes in thread_struct is not too much of an issue.
>

Indeed. And we can always wire up a kmem_cache if it is, and (as I
mentioned in the cover letter) we might actually do the same for the
user mode state in that case, given that kernel threads have no use
for it.
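As an aside, the lazy allocation idea could be sketched like the following
userspace analogue (calloc() standing in for a dedicated kmem_cache; all
struct and function names here are made up for illustration, not the
kernel's actual ones). Kernel threads that never touch kernel mode NEON
would simply never pay for the buffer:

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative analogue: instead of embedding the ~512-byte kernel mode
 * FP/SIMD save area in every thread_struct, allocate it lazily the first
 * time a task actually uses kernel mode NEON. */

#define KMODE_FP_STATE_SIZE 512

struct toy_task {
	void *kmode_fp_state;	/* NULL until first kernel mode NEON use */
};

static int fp_state_allocs;	/* counts allocations, for illustration */

/* Stand-in for kmem_cache_alloc() on a dedicated cache. */
static void *fp_state_alloc(void)
{
	fp_state_allocs++;
	return calloc(1, KMODE_FP_STATE_SIZE);
}

static int task_enter_kmode_neon(struct toy_task *t)
{
	if (!t->kmode_fp_state) {
		t->kmode_fp_state = fp_state_alloc();
		if (!t->kmode_fp_state)
			return -1;	/* -ENOMEM in the real thing */
	}
	return 0;
}

static void task_exit(struct toy_task *t)
{
	free(t->kmode_fp_state);
	t->kmode_fp_state = NULL;
}
```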

> The arch/arm64/crypto/ changes seem to miss removing some of the places that do
> the "yield every 4096 bytes" thing:
>
...
>
> Those can all be cleaned up too, right?
>

Yes. I'll add a patch for that. I missed those because the reason I
looked into this was the desire to get rid of the cond_yield macro, but
all of that chunk-and-yield logic becomes unnecessary as a result too.
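To make the cleanup concrete, here is a toy C analogue (not the actual
crypto glue; kernel_neon_begin/end are stubbed out, and the checksum is a
stand-in for the real transform) contrasting the chunk-and-yield pattern
being removed with the single section that suffices once kernel mode NEON
runs with preemption enabled:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stubs for the kernel primitives, just to count critical sections. */
static int neon_sections;
static void kernel_neon_begin(void) { neon_sections++; }
static void kernel_neon_end(void)   { }

/* Stand-in for a NEON-accelerated transform. */
static uint32_t checksum(const uint8_t *p, size_t len, uint32_t acc)
{
	while (len--)
		acc = acc * 31 + *p++;
	return acc;
}

/* Old pattern: with preemption disabled inside the NEON section, glue
 * code had to split the input and yield every 4096 bytes. */
static uint32_t process_chunked(const uint8_t *p, size_t len)
{
	uint32_t acc = 0;

	while (len) {
		size_t n = len > 4096 ? 4096 : len;

		kernel_neon_begin();
		acc = checksum(p, n, acc);
		kernel_neon_end();	/* yield point between chunks */
		p += n;
		len -= n;
	}
	return acc;
}

/* New pattern: with kernel mode NEON preemptible, one section suffices. */
static uint32_t process_whole(const uint8_t *p, size_t len)
{
	uint32_t acc;

	kernel_neon_begin();
	acc = checksum(p, len, 0);
	kernel_neon_end();
	return acc;
}
```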


> Please make sure to also update the comments above the assembly functions as
> needed.  For example, in the following, the return type changed to void:
>
>         /*
>          * int __sha1_ce_transform(struct sha1_ce_state *sst, u8 const *src,
>          *                         int blocks)
>          */
>         SYM_FUNC_START(__sha1_ce_transform)
>

OK

> Can you also Cc linux-crypto on future versions of this patch series?
>

Will do.
