[RFC PATCH v6 0/2] arm64/sve: Improve performance when handling SVE access traps
Mark Brown
broonie at kernel.org
Mon Jan 11 19:38:42 EST 2021
This patch series aims to improve the performance of handling SVE access
traps. Earlier versions were originally written by Julien Grall, but based
on discussions of previous versions the patches have been substantially
reworked to use a different approach. The patches are now different
enough that I have set myself as the author; hopefully that's OK with Julien.
I've marked this as RFC since it's not quite ready yet, but I'd really
like feedback on the overall approach since it's a big change in
implementation. It needs at least one more pass for polish, and while
it's holding up in my testing thus far I've not done as much testing as
I'd like yet.
Per the syscall ABI, SVE registers will be unknown after a syscall. In
practice, the kernel will disable SVE and the registers will be zeroed
(except the first 128 bits of each vector) on the next SVE instruction.
Currently we do this by saving the FPSIMD state to memory, converting to
the matching SVE state and then reloading the registers on return to
userspace. This requires a lot of memory accesses that we shouldn't
need. Improve this by reworking the SVE state tracking so that we track
whether we should trap on executing SVE instructions separately from
whether we need to save the full register state. This allows us to avoid
tracking the full SVE state until we need to return to userspace, and to
convert directly in registers in the common case where the FPSIMD state
is still in registers at that point.
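As a rough illustration of the split described above, here is a toy
userspace model, not code from the patches. The two boolean fields are
named after the TIF_SVE_EXEC and TIF_SVE_FULL_REGS flags in this series;
the struct, the helper functions and the memory round-trip counter are
all hypothetical and only mirror the behaviour as this cover letter
describes it.

```c
#include <stdbool.h>

/*
 * Toy model only; every name here is hypothetical except for the two
 * flag concepts (execute permission vs. full register state).
 */
struct toy_task {
	bool sve_exec;        /* SVE instructions run without trapping */
	bool sve_full_regs;   /* full SVE register state is being tracked */
	bool fpsimd_in_regs;  /* FPSIMD state is still live in the registers */
	unsigned int mem_round_trips; /* conversions that went via memory */
};

/* Syscall: SVE is disabled again, only FPSIMD state needs tracking. */
static void toy_syscall(struct toy_task *t)
{
	t->sve_exec = false;
	t->sve_full_regs = false;
}

/* SVE access trap: only record that SVE may now execute. */
static void toy_sve_trap(struct toy_task *t)
{
	t->sve_exec = true;
}

/* The task is scheduled out: its register contents are no longer live. */
static void toy_sched_out(struct toy_task *t)
{
	t->fpsimd_in_regs = false;
}

/* Return to userspace: the point where full SVE state becomes necessary. */
static void toy_ret_to_user(struct toy_task *t)
{
	if (!t->sve_exec || t->sve_full_regs)
		return;

	if (!t->fpsimd_in_regs) {
		/* Slow path: state has to be converted and reloaded via memory. */
		t->mem_round_trips++;
		t->fpsimd_in_regs = true;
	}
	/* Otherwise the conversion is assumed to happen in registers. */
	t->sve_full_regs = true;
}
```

In this model a trap followed directly by a return to userspace costs no
memory round trip; only a task that was scheduled out in between takes
the slow path.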
As with current mainline, we disable SVE on every syscall. This may not
be ideal for applications that mix SVE and syscall usage; strategies
such as SH's fpu_counter may perform better, but we need to assess the
performance on a wider range of systems than are currently available
before implementing anything.
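For readers unfamiliar with that style of heuristic, the following is a
minimal sketch of a counter-based policy, loosely inspired by the
fpu_counter idea and not part of this series. The struct, the function
names and the threshold value are all hypothetical, and no claim is made
that such a policy would be a win for SVE.

```c
#include <stdbool.h>

/* Hypothetical tuning value; a real one would need measurement. */
#define TOY_SVE_TRAP_THRESHOLD 5

struct toy_sve_policy {
	unsigned char trap_count; /* recent SVE access traps, saturating */
};

/* Called from the (hypothetical) SVE access trap path. */
static void toy_note_sve_trap(struct toy_sve_policy *p)
{
	if (p->trap_count < 255)
		p->trap_count++;
}

/*
 * Called at syscall entry: keep disabling SVE until the task has
 * trapped often enough to look like a heavy SVE user.
 */
static bool toy_disable_sve_on_syscall(const struct toy_sve_policy *p)
{
	return p->trap_count <= TOY_SVE_TRAP_THRESHOLD;
}

/* Reset so a task that stops using SVE returns to the lazy behaviour. */
static void toy_reset_policy(struct toy_sve_policy *p)
{
	p->trap_count = 0;
}
```

The trade-off in such a scheme is between the cost of repeated traps for
heavy SVE users and the cost of saving full SVE state for tasks that
only use SVE occasionally.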
It is also possible to optimize the case when the SVE vector length
is 128 bits (i.e. the same size as the FPSIMD vectors). This could be
explored in the future; it becomes a lot easier to do with this
implementation.
v6:
- Substantially rework the patch so that TIF_SVE is now replaced by
two flags TIF_SVE_EXEC and TIF_SVE_FULL_REGS.
- Return to disabling SVE after every syscall as for current
mainline rather than leaving it enabled unless reset via ptrace.
v5:
- Rebase onto v5.10-rc2.
- Explicitly support the case where TIF_SVE and TIF_SVE_NEEDS_FLUSH are
set simultaneously, though this is not currently expected to happen.
- Extensively revised the documentation for TIF_SVE and
TIF_SVE_NEEDS_FLUSH to hopefully make things clearer together with
the above. I hope this addresses the comments on the prior version,
but it really needs fresh eyes to tell if that's actually the case.
- Make comments in ptrace.c more precise.
- Remove some redundant checks for system_has_sve().
v4:
- Rebase onto v5.9-rc2
- Address review comments from Dave Martin, mostly documentation but
also some refactorings to ensure we don't check capabilities multiple
times and the addition of some WARN_ONs to make sure assumptions we
are making about what TIF_ flags can be set when are true.
v3:
- Rebased to current kernels.
- Addressed review comments from v2, mostly around tweaks in the
documentation.
Mark Brown (2):
arm64/sve: Split TIF_SVE into separate execute and register state flags
arm64/sve: Rework SVE trap access to minimise memory access
 arch/arm64/include/asm/fpsimd.h      |   2 +
 arch/arm64/include/asm/thread_info.h |   3 +-
 arch/arm64/kernel/entry-fpsimd.S     |   5 +
 arch/arm64/kernel/fpsimd.c           | 204 +++++++++++++++++++--------
 arch/arm64/kernel/process.c          |   7 +-
 arch/arm64/kernel/ptrace.c           |   8 +-
 arch/arm64/kernel/signal.c           |  15 +-
 arch/arm64/kernel/syscall.c          |   3 +-
 arch/arm64/kvm/fpsimd.c              |   6 +-
 9 files changed, 179 insertions(+), 74 deletions(-)
base-commit: 7c53f6b671f4aba70ff15e1b05148b10d58c2837
--
2.20.1