[PATCH v7 1/2] arm64/sve: Split TIF_SVE into separate execute and register state flags

Dave Martin Dave.Martin at arm.com
Wed Feb 10 05:56:55 EST 2021


On Mon, Feb 01, 2021 at 12:29:00PM +0000, Mark Brown wrote:
> Currently we have a single flag TIF_SVE which says that a task is
> allowed to execute SVE instructions without trapping and also that full
> SVE register state is stored for the task.  This results in us doing
> extra work storing and restoring the full SVE register state even in
> those cases where the ABI is that only the first 128 bits of the Z0-V31
> registers which are shared with the FPSIMD V0-V31 are valid.
> 
> In order to allow us to avoid these overheads split TIF_SVE up so that
> we have two separate flags, TIF_SVE_EXEC which allows execution of SVE
> instructions without trapping and TIF_SVE_FULL_REGS which indicates that
> the full SVE register state is stored.  If both are set the behaviour is
> as currently, if TIF_SVE_EXEC is set without TIF_SVE_FULL_REGS then we
> save and restore only the FPSIMD registers until we return to userspace
> with TIF_SVE_EXEC enabled at which point we convert the FPSIMD registers
> to SVE.  It is not meaningful to have TIF_SVE_FULL_REGS set without
> TIF_SVE_EXEC.
> 
> This patch is intended only to split the flags, it does not take
> avantage of the ability to set the flags independently and the new
> state with TIF_SVE_EXEC only should not be observed.
> 
> This is based on earlier work by Julien Gral implementing a slightly
> different approach.
> 
> Signed-off-by: Mark Brown <broonie at kernel.org>
> ---

[...]

> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c

[...]

> @@ -279,18 +327,37 @@ static void sve_free(struct task_struct *task)
>   * This function should be called only when the FPSIMD/SVE state in
>   * thread_struct is known to be up to date, when preparing to enter
>   * userspace.
> + *
> + * When TIF_SVE_EXEC is set but TIF_SVE_FULL_REGS is not set the SVE
> + * state will be restored from the FPSIMD state.
>   */
>  static void task_fpsimd_load(void)
>  {
> +	unsigned int vl;
> +
>  	WARN_ON(!system_supports_fpsimd());
>  	WARN_ON(!have_cpu_fpsimd_context());
>  
> -	if (system_supports_sve() && test_thread_flag(TIF_SVE))
> -		sve_load_state(sve_pffr(&current->thread),
> -			       &current->thread.uw.fpsimd_state.fpsr,
> -			       sve_vq_from_vl(current->thread.sve_vl) - 1);
> -	else
> -		fpsimd_load_state(&current->thread.uw.fpsimd_state);
> +	if (test_thread_flag(TIF_SVE_EXEC)) {
> +		vl = sve_vq_from_vl(current->thread.sve_vl) - 1;

One more nit: because of the confusion that can arises from "vl" being a
somewhat overloaded term in the architecture, I was trying to avoid
using the name "vl" for anything that isn't the vector length in bytes.

Can this instead be renamed to vq_minus_1 to match the function
arguments it's passed for?

(You could save a couple of lines by moving the declaration here and
combining it with this assignment too.)

[...]

Cheers
---Dave



More information about the linux-arm-kernel mailing list