[PATCH 12/18] arm64: fpsimd: Move fpsimd save/restore inline
Vladimir Murzin
vladimir.murzin at arm.com
Wed May 27 07:49:18 PDT 2026
Hi Mark,
On 5/21/26 14:25, Mark Rutland wrote:
> Currently the FPSIMD register save/restore sequences are written in
> out-of-line assembly routines. While this works, it's somewhat painful:
>
> * As KVM needs to be able to use the sequences in hyp code, separate
> assembly files are used for the regular kernel and KVM code. While the
> common logic is shared in assembly macros, this still requires some
> duplication, and has led to some trivial divergence.
>
> * For historical reasons, the assembly macros take some register
> arguments as numerical indices (e.g. "fpsimd_save x0, 8" uses x0 and
> x8), which is simply confusing.
>
> * For historical reasons, the SVE save/restore code and FPSIMD
> save/restore code have distinct sequences for FPSR and FPCR. Ideally
> this logic would be shared.
>
> * The assembly sequences can't be instrumented, and so it's harder than
> necessary to catch memory safety issues.
>
> To handle the above, move the FPSIMD register save/restore sequences to
> inline assembly, and share the FPSR+FPCR save/restore with SVE.
>
> Neither GCC nor LLVM instrument memory arguments to inline assembly, so
> explicit instrumentation is added in the same manner as other assembly
> routines. This instrumentation is implicitly disabled by Kbuild for nVHE
> hyp code.
>
> Note that I've used the SVE sequence for restoring FPCR, which uses an
> unconditional write to FPCR. The plain FPSIMD assembly sequence used a
> conditional write to FPCR since 2014 in commit:
>
> 5959e25729a5 ("arm64: fpsimd: avoid restoring fpcr if the contents haven't change")
>
> ... but this was not followed for the SVE assembly implemented in 2017
> in commit:
>
> 1fc5dce78ad1 ("arm64/sve: Low-level SVE architectural state manipulation functions")
>
> ... so I've assumed that this doesn't actually matter in practice, and
> I've erred in favour of the simpler sequence.
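For anyone following along, the difference between the two FPCR restore strategies described above can be modelled with a small sketch. This is only an illustration in plain C: `demo_ctrl_reg`, `demo_reg_write()` and the two `demo_restore_*()` helpers are hypothetical names, and the "register" is an ordinary struct that counts writes, not a real system register access.

```c
#include <stdint.h>

/* Hypothetical model of a control register; counts how often it is written. */
struct demo_ctrl_reg {
	uint32_t value;
	unsigned int writes;
};

/* Stand-in for a system register write (assumed to be the costly step). */
static inline void demo_reg_write(struct demo_ctrl_reg *reg, uint32_t val)
{
	reg->value = val;
	reg->writes++;
}

/* Simpler sequence: always write the saved value back. */
static inline void demo_restore_unconditional(struct demo_ctrl_reg *reg,
					      uint32_t saved)
{
	demo_reg_write(reg, saved);
}

/* Conditional sequence: read first, write only if the contents differ. */
static inline void demo_restore_conditional(struct demo_ctrl_reg *reg,
					    uint32_t saved)
{
	if (reg->value != saved)
		demo_reg_write(reg, saved);
}
```

Both variants leave the register holding the saved value; the conditional one only avoids the write when nothing changed, at the cost of an extra read and branch.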
>
> Signed-off-by: Mark Rutland <mark.rutland at arm.com>
> Cc: Catalin Marinas <catalin.marinas at arm.com>
> Cc: Fuad Tabba <tabba at google.com>
> Cc: James Morse <james.morse at arm.com>
> Cc: Marc Zyngier <maz at kernel.org>
> Cc: Mark Brown <broonie at kernel.org>
> Cc: Oliver Upton <oupton at kernel.org>
> Cc: Will Deacon <will at kernel.org>
> ---
> arch/arm64/include/asm/fpsimd.h | 68 ++++++++++++++++++++++++-
> arch/arm64/include/asm/fpsimdmacros.h | 59 ---------------------
> arch/arm64/include/asm/kvm_hyp.h | 2 -
> arch/arm64/kernel/entry-fpsimd.S | 20 --------
> arch/arm64/kvm/hyp/fpsimd.S | 10 ----
> arch/arm64/kvm/hyp/include/hyp/switch.h | 4 +-
> arch/arm64/kvm/hyp/nvhe/hyp-main.c | 4 +-
> 7 files changed, 70 insertions(+), 97 deletions(-)
>
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 6fd5cdf5e5f17..19b373ad0ebf7 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -22,6 +22,8 @@
> #include <linux/stddef.h>
> #include <linux/types.h>
>
> +#define __FPSIMD_PREAMBLE ".arch_extension fp\n" \
> + ".arch_extension simd\n"
> #define __SVE_PREAMBLE ".arch_extension sve\n"
> #define __SME_PREAMBLE ".arch_extension sme\n"
>
> @@ -86,8 +88,70 @@ static inline void fpsimd_load_common(const struct user_fpsimd_state *state)
> write_sysreg_s(state->fpcr, SYS_FPCR);
> }
>
> -extern void fpsimd_save_state(struct user_fpsimd_state *state);
> -extern void fpsimd_load_state(struct user_fpsimd_state *state);
> +static inline void fpsimd_save_vregs(struct user_fpsimd_state *state)
> +{
> + instrument_write(state->vregs, sizeof(state->vregs));
> + asm volatile(
> + __FPSIMD_PREAMBLE
> + " stp q0, q1, [%[vregs], #16 * 0]\n"
> + " stp q2, q3, [%[vregs], #16 * 2]\n"
> + " stp q4, q5, [%[vregs], #16 * 4]\n"
> + " stp q6, q7, [%[vregs], #16 * 6]\n"
> + " stp q8, q9, [%[vregs], #16 * 8]\n"
> + " stp q10, q11, [%[vregs], #16 * 10]\n"
> + " stp q12, q13, [%[vregs], #16 * 12]\n"
> + " stp q14, q15, [%[vregs], #16 * 14]\n"
> + " stp q16, q17, [%[vregs], #16 * 16]\n"
> + " stp q18, q19, [%[vregs], #16 * 18]\n"
> + " stp q20, q21, [%[vregs], #16 * 20]\n"
> + " stp q22, q23, [%[vregs], #16 * 22]\n"
> + " stp q24, q25, [%[vregs], #16 * 24]\n"
> + " stp q26, q27, [%[vregs], #16 * 26]\n"
> + " stp q28, q29, [%[vregs], #16 * 28]\n"
> + " stp q30, q31, [%[vregs], #16 * 30]\n"
> + : "=Q" (state->vregs)
> + : [vregs] "r" (state->vregs)
Is a "memory" clobber missing here?
> + );
> +}
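For context on the clobber question, here is a minimal sketch of the pattern the function above uses: an unused "=Q" output operand describes the object being stored to, while a separate "r" operand supplies the base address used in the template. As far as I understand, such a memory operand tells the compiler that this one object is written, whereas a "memory" clobber would tell it that any memory may be read or written. `demo_state` and `demo_save_pair()` are hypothetical names for illustration, and the non-arm64 branch is a plain C fallback so the sketch builds elsewhere.

```c
#include <stdint.h>

/* Hypothetical stand-in for the vregs array; not the kernel structure. */
struct demo_state {
	uint64_t regs[2];
};

static inline void demo_save_pair(struct demo_state *state,
				  uint64_t a, uint64_t b)
{
#if defined(__aarch64__)
	asm volatile(
	"	stp	%[a], %[b], [%[regs]]\n"
	/* The memory output covers the whole array written by the stp. */
	: "=Q" (state->regs)
	: [a] "r" (a), [b] "r" (b), [regs] "r" (state->regs)
	);
#else
	/* Fallback for other architectures: the same stores in plain C. */
	state->regs[0] = a;
	state->regs[1] = b;
#endif
}
```

If the asm wrote anything outside the object named by the memory operand, that operand alone would not describe it, and a "memory" clobber (or a further operand) would be needed.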
> +
> +static inline void fpsimd_load_vregs(const struct user_fpsimd_state *state)
> +{
> + instrument_read(state->vregs, sizeof(state->vregs));
> + asm volatile(
> + __FPSIMD_PREAMBLE
> + " ldp q0, q1, [%[vregs], #16 * 0]\n"
> + " ldp q2, q3, [%[vregs], #16 * 2]\n"
> + " ldp q4, q5, [%[vregs], #16 * 4]\n"
> + " ldp q6, q7, [%[vregs], #16 * 6]\n"
> + " ldp q8, q9, [%[vregs], #16 * 8]\n"
> + " ldp q10, q11, [%[vregs], #16 * 10]\n"
> + " ldp q12, q13, [%[vregs], #16 * 12]\n"
> + " ldp q14, q15, [%[vregs], #16 * 14]\n"
> + " ldp q16, q17, [%[vregs], #16 * 16]\n"
> + " ldp q18, q19, [%[vregs], #16 * 18]\n"
> + " ldp q20, q21, [%[vregs], #16 * 20]\n"
> + " ldp q22, q23, [%[vregs], #16 * 22]\n"
> + " ldp q24, q25, [%[vregs], #16 * 24]\n"
> + " ldp q26, q27, [%[vregs], #16 * 26]\n"
> + " ldp q28, q29, [%[vregs], #16 * 28]\n"
> + " ldp q30, q31, [%[vregs], #16 * 30]\n"
> + :
> + : "Q" (state->vregs),
> + [vregs] "r" (state->vregs)
Is a "memory" clobber missing here as well?
> + );
> +}
> +
[snip]
Cheers
Vladimir