[PATCH v7 1/2] arm64/sve: Split TIF_SVE into separate execute and register state flags
Dave Martin
Dave.Martin at arm.com
Wed Feb 10 05:56:55 EST 2021
On Mon, Feb 01, 2021 at 12:29:00PM +0000, Mark Brown wrote:
> Currently we have a single flag TIF_SVE which says that a task is
> allowed to execute SVE instructions without trapping and also that full
> SVE register state is stored for the task. This results in us doing
> extra work storing and restoring the full SVE register state even in
> those cases where the ABI is that only the first 128 bits of the Z0-V31
> registers which are shared with the FPSIMD V0-V31 are valid.
>
> In order to allow us to avoid these overheads split TIF_SVE up so that
> we have two separate flags, TIF_SVE_EXEC which allows execution of SVE
> instructions without trapping and TIF_SVE_FULL_REGS which indicates that
> the full SVE register state is stored. If both are set the behaviour is
> as currently, if TIF_SVE_EXEC is set without TIF_SVE_FULL_REGS then we
> save and restore only the FPSIMD registers until we return to userspace
> with TIF_SVE_EXEC enabled at which point we convert the FPSIMD registers
> to SVE. It is not meaningful to have TIF_SVE_FULL_REGS set without
> TIF_SVE_EXEC.
>
> This patch is intended only to split the flags, it does not take
> avantage of the ability to set the flags independently and the new
> state with TIF_SVE_EXEC only should not be observed.
>
> This is based on earlier work by Julien Gral implementing a slightly
> different approach.
>
> Signed-off-by: Mark Brown <broonie at kernel.org>
> ---
[...]
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
[...]
> @@ -279,18 +327,37 @@ static void sve_free(struct task_struct *task)
> * This function should be called only when the FPSIMD/SVE state in
> * thread_struct is known to be up to date, when preparing to enter
> * userspace.
> + *
> + * When TIF_SVE_EXEC is set but TIF_SVE_FULL_REGS is not set the SVE
> + * state will be restored from the FPSIMD state.
> */
> static void task_fpsimd_load(void)
> {
> + unsigned int vl;
> +
> WARN_ON(!system_supports_fpsimd());
> WARN_ON(!have_cpu_fpsimd_context());
>
> - if (system_supports_sve() && test_thread_flag(TIF_SVE))
> - sve_load_state(sve_pffr(¤t->thread),
> - ¤t->thread.uw.fpsimd_state.fpsr,
> - sve_vq_from_vl(current->thread.sve_vl) - 1);
> - else
> - fpsimd_load_state(¤t->thread.uw.fpsimd_state);
> + if (test_thread_flag(TIF_SVE_EXEC)) {
> + vl = sve_vq_from_vl(current->thread.sve_vl) - 1;
One more nit: because of the confusion that can arises from "vl" being a
somewhat overloaded term in the architecture, I was trying to avoid
using the name "vl" for anything that isn't the vector length in bytes.
Can this instead be renamed to vq_minus_1 to match the function
arguments it's passed for?
(You could save a couple of lines by moving the declaration here and
combining it with this assignment too.)
[...]
Cheers
---Dave
More information about the linux-arm-kernel
mailing list