[PATCH v7 2/2] arm64/sve: Rework SVE trap access to minimise memory access

Dave Martin Dave.Martin at arm.com
Wed Feb 10 06:09:56 EST 2021


On Mon, Feb 01, 2021 at 12:29:01PM +0000, Mark Brown wrote:
> When we take an SVE access trap, only the subset of the SVE Z0-Z31
> registers shared with the FPSIMD V0-V31 registers is valid; the rest
> of the bits in the SVE registers must be cleared before returning to
> userspace.  Currently we do this by saving the current FPSIMD register
> state to the task struct and then using that to initialize the copy of
> the SVE registers in the task struct so they can be loaded from there
> into the registers.  This requires a lot more memory access than we
> need.
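A rough userspace model of the conversion the old path performs in memory, to show where the traffic comes from: each 128-bit V register is copied into the low bits of its Z register and every remaining byte up to the vector length is zeroed, so 32 x VL bytes are written to the task struct and later read back by ret_to_user. All names here (model_fpsimd_to_sve, MODEL_*) are hypothetical, and the flat array layout is an assumption; it does not follow the kernel's real SVE state layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_NREGS   32   /* Z0-Z31 / V0-V31 */
#define MODEL_MAX_VL  256  /* bytes: the architectural maximum of 2048 bits */
#define MODEL_V_BYTES 16   /* a V register is 128 bits */

/* Hypothetical stand-ins for the saved FPSIMD and SVE register images. */
struct model_fpsimd { uint8_t v[MODEL_NREGS][MODEL_V_BYTES]; };
struct model_sve    { uint8_t z[MODEL_NREGS][MODEL_MAX_VL]; };

/*
 * Copy each V register into the low 128 bits of the matching Z register
 * and zero the rest of that Z register up to the vector length `vl`
 * (in bytes). Returns the number of bytes written to memory.
 */
static size_t model_fpsimd_to_sve(const struct model_fpsimd *fp,
                                  struct model_sve *sve, unsigned int vl)
{
    size_t written = 0;

    assert(vl >= MODEL_V_BYTES && vl <= MODEL_MAX_VL &&
           vl % MODEL_V_BYTES == 0);

    for (int n = 0; n < MODEL_NREGS; n++) {
        memcpy(sve->z[n], fp->v[n], MODEL_V_BYTES);
        memset(sve->z[n] + MODEL_V_BYTES, 0, vl - MODEL_V_BYTES);
        written += vl;
    }
    return written;
}
```

With a 512-bit vector length (vl = 64) this writes 2048 bytes just to produce a zero-extended copy of 512 bytes of FPSIMD state, before the registers are even reloaded.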
> 
> The newly added TIF_SVE_FULL_REGS can be used to reduce this overhead:
> instead of doing the conversion immediately we can set only TIF_SVE_EXEC
> and not TIF_SVE_FULL_REGS.  This means that until we return to userspace
> we only need to store the FPSIMD registers.  If (as should be the common
> case) the hardware still holds the task state and it does not need to be
> reloaded from the task struct, we can initialize the SVE state entirely
> in registers.  If we do need to reload the registers from the task
> struct, only the FPSIMD subset needs to be loaded from memory.
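One way to picture the decision described above is as a small function from the thread flags to what the return-to-user path has to do. This is a sketch of the commit message's reasoning only, not of fpsimd_restore_current_state(); the M_* flags and RET_* actions are hypothetical stand-ins for TIF_SVE_EXEC, TIF_SVE_FULL_REGS and TIF_FOREIGN_FPSTATE, and the invariant that FULL_REGS implies SVE_EXEC is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits modelling the kernel's thread flags. */
enum {
    M_SVE_EXEC        = 1u << 0, /* models TIF_SVE_EXEC */
    M_SVE_FULL_REGS   = 1u << 1, /* models TIF_SVE_FULL_REGS */
    M_FOREIGN_FPSTATE = 1u << 2, /* models TIF_FOREIGN_FPSTATE */
};

enum model_ret_action {
    RET_NOTHING,            /* live registers are already correct */
    RET_LOAD_FPSIMD,        /* plain FPSIMD task, reload V0-V31 */
    RET_ZERO_HIGH_IN_REGS,  /* live: clear non-FPSIMD bits in registers */
    RET_LOAD_FPSIMD_SUBSET, /* reload only V0-V31, upper bits come up zero */
    RET_LOAD_FULL_SVE,      /* reload the complete saved SVE state */
};

static enum model_ret_action model_ret_to_user(unsigned int flags)
{
    bool live = !(flags & M_FOREIGN_FPSTATE);

    /* Model assumption: full SVE regs only make sense with SVE enabled. */
    assert(!(flags & M_SVE_FULL_REGS) || (flags & M_SVE_EXEC));

    if (!(flags & M_SVE_EXEC))
        return live ? RET_NOTHING : RET_LOAD_FPSIMD;
    if (flags & M_SVE_FULL_REGS)
        return live ? RET_NOTHING : RET_LOAD_FULL_SVE;
    /* SVE_EXEC without FULL_REGS: only the FPSIMD subset is valid. */
    return live ? RET_ZERO_HIGH_IN_REGS : RET_LOAD_FPSIMD_SUBSET;
}
```

Right after the trap only M_SVE_EXEC is set, so the expected common case maps to RET_ZERO_HIGH_IN_REGS and no SVE-sized memory traffic at all.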
> 
> If the FPSIMD state is loaded then we need to set the vector length,
> because the vector length is otherwise only set when loading from
> memory, and the return path expects it to be set whenever TIF_SVE_EXEC
> is set.  We also need to rebind the task to the CPU so the newly
> allocated SVE state is used when the task is saved.
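For reference, the argument passed to sve_set_vq() in the diff is the vector length expressed in 128-bit quadwords, minus one, which is the encoding the ZCR_ELx.LEN field uses. A small sketch of that arithmetic, with hypothetical model_* names standing in for sve_vq_from_vl() and SVE_VQ_BYTES (16):

```c
#include <assert.h>

#define MODEL_VQ_BYTES 16u /* one quadword = 128 bits, like SVE_VQ_BYTES */

/* Vector length in bytes -> number of 128-bit quadwords. */
static unsigned int model_vq_from_vl(unsigned int vl)
{
    assert(vl >= MODEL_VQ_BYTES && vl % MODEL_VQ_BYTES == 0);
    return vl / MODEL_VQ_BYTES;
}

/* The "vq_minus_1" value handed to sve_set_vq(), i.e. the LEN encoding. */
static unsigned long model_vq_minus_1(unsigned int vl)
{
    return model_vq_from_vl(vl) - 1;
}
```

So a 128-bit VL (16 bytes) is LEN = 0 and the architectural maximum of 2048 bits (256 bytes) is LEN = 15.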
> 
> This is based on earlier work by Julien Grall implementing a similar idea.
> 
> Signed-off-by: Mark Brown <broonie at kernel.org>
> ---
>  arch/arm64/include/asm/fpsimd.h  |  2 ++
>  arch/arm64/kernel/entry-fpsimd.S |  5 +++++
>  arch/arm64/kernel/fpsimd.c       | 35 +++++++++++++++++++++-----------
>  3 files changed, 30 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index bec5f14b622a..e60aa4ebb351 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -74,6 +74,8 @@ extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state,
>  				       unsigned long vq_minus_1);
>  extern unsigned int sve_get_vl(void);
>  
> +extern void sve_set_vq(unsigned long vq_minus_1);
> +
>  struct arm64_cpu_capabilities;
>  extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
>  
> diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
> index 2ca395c25448..3ecec60d3295 100644
> --- a/arch/arm64/kernel/entry-fpsimd.S
> +++ b/arch/arm64/kernel/entry-fpsimd.S
> @@ -48,6 +48,11 @@ SYM_FUNC_START(sve_get_vl)
>  	ret
>  SYM_FUNC_END(sve_get_vl)
>  
> +SYM_FUNC_START(sve_set_vq)
> +	sve_load_vq x0, x1, x2
> +	ret
> +SYM_FUNC_END(sve_set_vq)
> +
>  /*
>   * Load SVE state from FPSIMD state.
>   *
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 58c749ef04c4..05caf207e2ce 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -994,10 +994,10 @@ void fpsimd_release_task(struct task_struct *dead_task)
>  /*
>   * Trapped SVE access
>   *
> - * Storage is allocated for the full SVE state, the current FPSIMD
> - * register contents are migrated across, and TIF_SVE_EXEC is set so that
> - * the SVE access trap will be disabled the next time this task
> - * reaches ret_to_user.
> + * Storage is allocated for the full SVE state so that the code
> + * running subsequently has somewhere to save the SVE registers to. We
> + * then rely on ret_to_user to actually convert the FPSIMD registers
> + * to SVE state by flushing as required.
>   *
>   * TIF_SVE_EXEC should be clear on entry: otherwise,
>   * fpsimd_restore_current_state() would have disabled the SVE access
> @@ -1016,15 +1016,26 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs)
>  
>  	get_cpu_fpsimd_context();
>  
> -	fpsimd_save();
> -
> -	/* Force ret_to_user to reload the registers: */
> -	fpsimd_flush_task_state(current);
> -
> -	fpsimd_to_sve(current);
> +	/*
> +	 * We shouldn't trap if we can execute SVE instructions and
> +	 * there should be no SVE state if that is the case.
> +	 */
>  	if (test_and_set_thread_flag(TIF_SVE_EXEC))
> -		WARN_ON(1); /* SVE access shouldn't have trapped */
> -	set_thread_flag(TIF_SVE_FULL_REGS);
> +		WARN_ON(1);
> +	if (test_and_clear_thread_flag(TIF_SVE_FULL_REGS))
> +		WARN_ON(1);
> +
> +	/*
> +	 * When the FPSIMD state is loaded:
> +	 *      - The return path (see fpsimd_restore_current_state) requires
> +	 *        the vector length to be loaded beforehand.
> +	 *      - We need to rebind the task to the CPU so the newly allocated
> +	 *        SVE state is used when the task is saved.
> +	 */
> +	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
> +		sve_set_vq(sve_vq_from_vl(current->thread.sve_vl) - 1);

Hmmm, I can see why we need this here, but it feels slightly odd.
Still, I don't have a better idea.

Logically, this is all part of a single state change, where we
transition from live FPSIMD-only state in the registers to live SVE
state with a pending flush.  Although we could wrap that up in a helper,
we only do this particular transition here so I guess factoring it out
may not be worth it.

> +		fpsimd_bind_task_to_cpu();
> +	}
>  
>  	put_cpu_fpsimd_context();

From here, can things go wrong if we get preempted and scheduled out?

I think fpsimd_save() would just set TIF_SVE_FULL_REGS and save out the
full register data, which may contain stale data in the non-FPSIMD bits
because we haven't flushed them yet.

Assuming I've not confused myself here, the same thing could probably
happen with Ard's changes if a softirq uses kernel_neon_begin(), causing
fpsimd_save() to get called.
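A toy model of the window described above, assuming (as the concern does) that the register bits above the FPSIMD subset still hold leftover data until ret_to_user flushes them. All names are hypothetical: model_do_sve_acc() mimics the patched trap handler leaving TIF_SVE_FULL_REGS clear, and model_fpsimd_save() mimics a save that, with TIF_SVE_EXEC set, writes out whole Z registers and marks them fully valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_NZ      32
#define MODEL_VL      64 /* bytes; hypothetical 512-bit vector length */
#define MODEL_V_BYTES 16 /* FPSIMD subset of each Z register */

struct model_task {
    bool sve_exec;       /* models TIF_SVE_EXEC */
    bool sve_full_regs;  /* models TIF_SVE_FULL_REGS */
    unsigned char saved_z[MODEL_NZ][MODEL_VL];
};

/* CPU register file; upper bytes may hold leftovers from earlier use. */
static unsigned char cpu_z[MODEL_NZ][MODEL_VL];

/* Trap handler after the patch: enable SVE, but defer the flush. */
static void model_do_sve_acc(struct model_task *t)
{
    assert(!t->sve_exec); /* mirrors the WARN_ON in do_sve_acc() */
    t->sve_exec = true;
    t->sve_full_regs = false;
}

/* Save path: with SVE enabled, save whole Z registers, mark them valid. */
static void model_fpsimd_save(struct model_task *t)
{
    if (t->sve_exec) {
        memcpy(t->saved_z, cpu_z, sizeof(cpu_z));
        t->sve_full_regs = true;
    }
}

/* The flush ret_to_user would do: zero everything above V0-V31. */
static void model_flush_live(void)
{
    for (int n = 0; n < MODEL_NZ; n++)
        memset(cpu_z[n] + MODEL_V_BYTES, 0, MODEL_VL - MODEL_V_BYTES);
}

/* True if any saved byte above the FPSIMD subset is nonzero. */
static bool model_saved_high_bits_dirty(const struct model_task *t)
{
    for (int n = 0; n < MODEL_NZ; n++)
        for (int i = MODEL_V_BYTES; i < MODEL_VL; i++)
            if (t->saved_z[n][i] != 0)
                return true;
    return false;
}
```

If the save runs between the trap and the flush, the leftover upper bits are captured and flagged as valid SVE state; if it runs after the flush, they are zero as intended.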

Cheers
---Dave



More information about the linux-arm-kernel mailing list