[PATCH v3 5/8] arm64/sve: Implement a helper to flush SVE registers

Wed Jul 15 12:52:05 EDT 2020

On Mon, Jun 29, 2020 at 02:35:53PM +0100, Mark Brown wrote:
> From: Julien Grall <julien.grall at arm.com>
> 
> Introduce a new helper that will zero all SVE registers but the first
> 128 bits of each vector. This will be used by subsequent patches to
> avoid costly store/manipulate/reload sequences in places like do_sve_acc().
> 
> Signed-off-by: Julien Grall <julien.grall at arm.com>
> Reviewed-by: Dave Martin <Dave.Martin at arm.com>
> Signed-off-by: Mark Brown <broonie at kernel.org>
> ---
>  arch/arm64/include/asm/fpsimd.h       |  1 +
>  arch/arm64/include/asm/fpsimdmacros.h | 19 +++++++++++++++++++
>  arch/arm64/kernel/entry-fpsimd.S      |  8 ++++++++
>  3 files changed, 28 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 59f10dd13f12..958f642e930d 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -69,6 +69,7 @@ static inline void *sve_pffr(struct thread_struct *thread)
>  extern void sve_save_state(void *state, u32 *pfpsr);
>  extern void sve_load_state(void const *state, u32 const *pfpsr,
>  			   unsigned long vq_minus_1);
> +extern void sve_flush_live(void);
>  extern unsigned int sve_get_vl(void);
>  
>  struct arm64_cpu_capabilities;
> diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
> index feef5b371fba..af43367534c7 100644
> --- a/arch/arm64/include/asm/fpsimdmacros.h
> +++ b/arch/arm64/include/asm/fpsimdmacros.h
> @@ -164,6 +164,13 @@
>  		| ((\np) << 5)
>  .endm
>  
> +/* PFALSE P\np.B */
> +.macro _sve_pfalse np
> +	_sve_check_preg \np
> +	.inst	0x2518e400			\
> +		| (\np)
> +.endm
> +
>  .macro __for from:req, to:req
>  	.if (\from) == (\to)
>  		_for__body %\from
> @@ -198,6 +205,18 @@
>  921:
>  .endm
>  
> +/* Preserve the first 128-bits of Znz and zero the rest. */
> +.macro _sve_flush_z nz
> +	_sve_check_zreg \nz
> +	mov	v\nz\().16b, v\nz\().16b
> +.endm
> +
> +.macro sve_flush
> + _for n, 0, 31, _sve_flush_z	\n
> + _for n, 0, 15, _sve_pfalse	\n
> +		_sve_wrffr	0

Side note, but as and when hardware is available for benchmarking, it
could be worth investigating how sequences like this perform.

Because WRFFR is self-synchronising, it is a potentially expensive
operation, especially if there could be in-flight SVE operations.

This isn't directly relevant to this patch, but could be worth a look
later on.
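
For anyone following along, the architectural effect of the sve_flush
sequence can be sketched in plain C. This is only an illustrative model,
not kernel code: struct sve_model, its field sizes, and sve_model_flush()
are hypothetical names made up for this note, and the model says nothing
about the performance question raised above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of SVE state; sizes use the architectural maximum VL. */
#define SVE_MODEL_ZREG_BYTES	256	/* 2048-bit vector */
#define SVE_MODEL_PREG_BYTES	32	/* one predicate bit per vector byte */
#define SVE_MODEL_NUM_ZREGS	32
#define SVE_MODEL_NUM_PREGS	16

struct sve_model {
	uint8_t z[SVE_MODEL_NUM_ZREGS][SVE_MODEL_ZREG_BYTES];
	uint8_t p[SVE_MODEL_NUM_PREGS][SVE_MODEL_PREG_BYTES];
	uint8_t ffr[SVE_MODEL_PREG_BYTES];
};

/* Model of the flush: keep the low 128 bits of each Zn, zero everything else. */
static void sve_model_flush(struct sve_model *s)
{
	int n;

	/* "mov vN.16b, vN.16b": a write to Vn zeroes the upper bits of Zn. */
	for (n = 0; n < SVE_MODEL_NUM_ZREGS; n++)
		memset(&s->z[n][16], 0, SVE_MODEL_ZREG_BYTES - 16);

	/* "pfalse pN.b": every predicate register becomes all-false. */
	for (n = 0; n < SVE_MODEL_NUM_PREGS; n++)
		memset(s->p[n], 0, SVE_MODEL_PREG_BYTES);

	/* "wrffr p0.b": FFR is loaded from P0, which is all-false by now. */
	memcpy(s->ffr, s->p[0], SVE_MODEL_PREG_BYTES);
}
```

Note that the FFR write depends on P0 having been cleared first, which is
why it comes last in the sequence.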

[...]

Cheers
---Dave