[PATCH v5 1/2] arm64/sve: Don't disable SVE on syscalls return

Fri Nov 13 13:48:56 EST 2020

Hi Mark,

This code is pretty complex, so it will take some time to understand.
I'd like Dave to review them  as well since he was involved in the early
discussions with Julien.

In the meantime, I updated my TLA+ model (mechanically, I can't claim I
fully understand the patches) with these changes and hasn't found a
failure yet (though it takes days to complete). Current model here if
anyone is interested:

https://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/kernel-tla.git/tree/fpsimd.tla

Some questions below.

On Fri, Nov 06, 2020 at 07:35:52PM +0000, Mark Brown wrote:
> From: Julien Grall <julien.grall at arm.com>
> 
> Per the syscalls ABI the state of the SVE registers is unknown after a
> syscall. In practice the kernel will disable SVE and zero all the
> registers but the first 128-bits of the vector on the next SVE
> instruction. In workloads mixing SVE and syscalls this will result in at
> least one extra entry/exit to the kernel per syscall when the SVE
> registers are accessed for the first time after the syscall.
> 
> To avoid the second entry/exit a new flag TIF_SVE_NEEDS_FLUSH is
> introduced to mark a task that needs to flush the SVE context on
> return to userspace.

I was thinking that we can we avoid this new TIF flag altogether and
just zero the registers via do_el0_svc() (or sve_user_discard()) if
TIF_SVE was set on entry. We'd have to skip clearing TIF_SVE on syscall
entry though. This has the potential issue of over-saving during context
switch but we may be able to use an in_syscall() check.

> On entry to a syscall the flag TIF_SVE will still be cleared, it will
> be restored on return to userspace once the SVE state has been flushed.
> This means that if a task requires to synchronize the FP state during a
> syscall (e.g context switch, signal) only the FPSIMD registers will be
> saved. When the task is rescheduled the SVE state will be loaded from
> FPSIMD state.
> 
> We could instead handle flushing the SVE state in do_el0_svc() however
> doing this reduces the potential for further optimisations such as
> initializing the SVE registers directly from the FPSIMD state when
> taking a SVE access trap and has some potential edge cases if we
> schedule before we return to userspace after do_el0_svc().

Ah, you covered the do_el0_svc() topic here already. Is this potential
optimisation prevented because TIF_SVE has two meanings: SVE access
enabled and sve_state valid (trying to page this code in)? For example,
task_fpsimd_load() uses TIF_SVE to check whether to load from sve_state
or from fpsimd. fpsimd_bind_task_to_cpu() uses TIF_SVE to enable the
user access.

Maybe that's what TIF_SVE_NEEDS_FLUSH tries to address but I'd rather
have two clearly defined flags and better separate their meaning:

- TIF_SVE_ACC - it tells us whether the user access was enabled or it
  needs to be enabled on return to user (maybe we can skip this and just
  enable it directly in places like do_sve_acc).
- TIF_SVE_STATE_VALID - the validity of the kernel sve_state data.

I'll list what I think the requirement are for the SVE state and which
(not necessarily how the current code behaves):

1. We can return to user code with SVE disabled (say on first use).

2. On SVE access fault, we populate the registers either from a
   previously saved SVE state or just zero them. I think we currently
   copy the fpsimd state to sve_state just to restore the latter but
   this could look better. Here the sve_state is considered discarded
   since we enabled it in user, so it's stale.

3. On syscall entry we have to discard the hardware SVE state if access
   was enabled in user.

4. On context switch, we have to save the hardware state if it was not
   discarded (in_syscall() and maybe check TIF_SVE_ACC). If we saved it,
   we'd need to set TIF_SVE_STATE_VALID.

5. On return to user from syscall, we just restore the FPSIMD state (I
   think we still need to do something with the predicates but sve_state
   is invalid, so we just initialise them). As an optimisation, we want
   to leave SVE access on.

5. On return to user from other exception, we restore either from FPSIMD
   or saved sve_state (if valid).

The patch may try to address part of the above but I find that the
addition of TIF_SVE_NEEDS_FLUSH makes this state machine more
complicated than necessary (well, based on my mental model; possibly I'm
missing some other cases).

-- 
Catalin