[PATCH v3 5/8] arm64/sve: Implement a helper to flush SVE registers
Dave Martin
Dave.Martin at arm.com
Wed Jul 15 12:52:05 EDT 2020
On Mon, Jun 29, 2020 at 02:35:53PM +0100, Mark Brown wrote:
> From: Julien Grall <julien.grall at arm.com>
>
> Introduce a new helper that will zero all SVE registers but the first
> 128-bits of each vector. This will be used by subsequent patches to
> avoid costly store/manipulate/reload sequences in places like do_sve_acc().
>
> Signed-off-by: Julien Grall <julien.grall at arm.com>
> Reviewed-by: Dave Martin <Dave.Martin at arm.com>
> Signed-off-by: Mark Brown <broonie at kernel.org>
> ---
> arch/arm64/include/asm/fpsimd.h | 1 +
> arch/arm64/include/asm/fpsimdmacros.h | 19 +++++++++++++++++++
> arch/arm64/kernel/entry-fpsimd.S | 8 ++++++++
> 3 files changed, 28 insertions(+)
>
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 59f10dd13f12..958f642e930d 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -69,6 +69,7 @@ static inline void *sve_pffr(struct thread_struct *thread)
> extern void sve_save_state(void *state, u32 *pfpsr);
> extern void sve_load_state(void const *state, u32 const *pfpsr,
> unsigned long vq_minus_1);
> +extern void sve_flush_live(void);
> extern unsigned int sve_get_vl(void);
>
> struct arm64_cpu_capabilities;
> diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
> index feef5b371fba..af43367534c7 100644
> --- a/arch/arm64/include/asm/fpsimdmacros.h
> +++ b/arch/arm64/include/asm/fpsimdmacros.h
> @@ -164,6 +164,13 @@
> | ((\np) << 5)
> .endm
>
> +/* PFALSE P\np.B */
> +.macro _sve_pfalse np
> + _sve_check_preg \np
> + .inst 0x2518e400 \
> + | (\np)
> +.endm
> +
> .macro __for from:req, to:req
> .if (\from) == (\to)
> _for__body %\from
> @@ -198,6 +205,18 @@
> 921:
> .endm
>
> +/* Preserve the first 128-bits of Znz and zero the rest. */
> +.macro _sve_flush_z nz
> + _sve_check_zreg \nz
> + mov v\nz\().16b, v\nz\().16b
> +.endm
> +
> +.macro sve_flush
> + _for n, 0, 31, _sve_flush_z \n
> + _for n, 0, 15, _sve_pfalse \n
> + _sve_wrffr 0
Side note, but when hardware becomes available for benchmarking, it
could be worth investigating how sequences like this perform.
Because WRFFR is self-synchronising, it is potentially expensive,
especially if SVE operations could still be in flight.
This isn't directly relevant to this patch, but could be worth a look
later on.
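To make the intended effect of sve_flush concrete, here is a small C model of the register state it should leave behind. This is a sketch under stated assumptions, not kernel code: struct sve_state_model, sve_flush_model() and sve_pfalse_insn() are hypothetical names invented for illustration. The model relies on the architectural rule that an AdvSIMD write to Vn zeroes the bits of Zn above 128, which is what the MOV Vn.16B, Vn.16B in _sve_flush_z depends on. It also shows why `_sve_wrffr 0` clears FFR: WRFFR P0.B copies P0, which the PFALSE loop has already made all-false, so the ordering of the two steps matters.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical software model of the SVE register file; these names are
 * illustrative only and are not kernel types or APIs.
 */
#define SVE_NUM_ZREGS    32
#define SVE_NUM_PREGS    16
#define SVE_VL_MAX_BYTES 256	/* architectural maximum: 2048-bit vectors */
#define NEON_BYTES       16	/* the 128-bit Vn view of each Zn */

struct sve_state_model {
	unsigned int vl_bytes;				/* current VL: 16..256, multiple of 16 */
	uint8_t z[SVE_NUM_ZREGS][SVE_VL_MAX_BYTES];	/* Z0-Z31 */
	uint8_t p[SVE_NUM_PREGS][SVE_VL_MAX_BYTES / 8];	/* P0-P15: one bit per Z byte */
	uint8_t ffr[SVE_VL_MAX_BYTES / 8];		/* first-fault register */
};

/* Instruction word emitted by _sve_pfalse for PFALSE P<np>.B. */
static uint32_t sve_pfalse_insn(unsigned int np)
{
	assert(np <= 15);	/* mirrors the range check done by _sve_check_preg */
	return 0x2518e400u | np;
}

/* Effect the sve_flush macro is intended to have, applied to the model. */
static void sve_flush_model(struct sve_state_model *s)
{
	unsigned int pbytes = s->vl_bytes / 8;
	int n;

	/* MOV Vn.16B, Vn.16B: keeps bits [127:0] of Zn, zeroes the rest. */
	for (n = 0; n < SVE_NUM_ZREGS; n++)
		memset(&s->z[n][NEON_BYTES], 0, s->vl_bytes - NEON_BYTES);

	/* PFALSE Pn.B for P0-P15: every predicate becomes all-false. */
	for (n = 0; n < SVE_NUM_PREGS; n++)
		memset(s->p[n], 0, pbytes);

	/* WRFFR P0.B copies the just-cleared P0 into FFR, so it must come
	 * after the PFALSE loop to leave FFR zeroed. */
	memcpy(s->ffr, s->p[0], pbytes);
}
```

For example, with vl_bytes set to 64 (a 512-bit implementation) and every array pre-filled with a non-zero pattern, calling sve_flush_model() leaves bytes 0-15 of each z[n] intact, bytes 16-63 zero, and p[] and ffr[] cleared up to vl_bytes / 8 bytes.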
[...]
Cheers
---Dave