[PATCH v11 06/40] arm64/sme: Provide ABI documentation for SME

Mark Brown broonie at kernel.org
Fri Feb 11 10:13:58 PST 2022


On Fri, Feb 11, 2022 at 05:02:16PM +0000, Catalin Marinas wrote:
> On Thu, Feb 10, 2022 at 07:45:49PM +0000, Mark Brown wrote:

> > If we don't preserve ZA then userspace will be forced to save it when
> > enabled, which increases overall costs.  If we do preserve ZA then it's
> > no more expensive for the kernel to save it than for userspace, we avoid
> > the cost of restoring in the case where we return directly to userspace
> > without context switching, and if we do future work to save more lazily
> > then we may be able to avoid some of the saves.

> Thanks for the explanation and the PCS pointer. I guess doing the lazy
> saving scheme in the syscall handler is a lot more painful (faults etc.)
> and it's a user-only ABI/PCS, so we shouldn't tie the kernel into it.

Yes, other than the considerations around clone() it's clearly more
complicated to engage with.
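The cost argument quoted above can be sketched as a toy model that simply counts ZA save and restore operations for a thread making syscalls with ZA live.  Everything here (struct za_cost, user_saves_za(), kernel_preserves_za()) is hypothetical and for illustration only; it assumes the kernel saves at most once per syscall and restores only after a context switch, and it does not model any future lazy saving.

```c
#include <assert.h>

/* Counts of ZA save and restore operations (hypothetical toy model). */
struct za_cost {
	unsigned long saves;
	unsigned long restores;
};

/*
 * ZA not preserved by syscalls: userspace with ZA live must save it
 * before every syscall and restore it afterwards.
 */
static struct za_cost user_saves_za(unsigned long syscalls)
{
	struct za_cost c = { syscalls, syscalls };
	return c;
}

/*
 * ZA preserved by the kernel: in the worst case it saves once per
 * syscall (no more than userspace would), but it only restores when the
 * syscall led to a context switch.  Lazier saving is not modelled.
 */
static struct za_cost kernel_preserves_za(unsigned long syscalls,
					  unsigned long switches)
{
	assert(switches <= syscalls);
	struct za_cost c = { syscalls, switches };
	return c;
}
```

For example, 100 syscalls of which 5 end up context switching cost 200 operations when userspace does the work, but 105 when the kernel preserves ZA.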

> Given that Linux doesn't plan to use the ZA registers itself, in most
> cases it won't need to restore anything. But we still need to save the
> ZA registers on context switch in case the thread wakes up on a
> different CPU. How often do you reckon the user would do a syscall with
> active ZA?

I would expect it to be very rare for userspace to want to do a
syscall with ZA enabled, though obviously there's not a huge body of
real world SME code to validate that against yet.  The expected usage
pattern is that both ZA and SM are only enabled for fairly brief bursts
of intense computation and disabled when not actively used.  Things like
logging during a computation, or incrementally streaming data to/from a
running algorithm, could generate syscalls with ZA enabled, so I wouldn't
be surprised to see it happen, but for most systems it should be a very
small percentage of system calls.

> > > What does that mean? Is this as per the sve.rst doc (unspecified but
> > > zeroed in practice)?

> > Yes, we will exit streaming mode and proceed as per sve.rst and the rest
> > of the ABI.

> So in this case we consider the syscall interface as non-streaming (as
> per the PCS terminology). Should we require that the PSTATE.SM is
> cleared by the user as well? Alternatively, we could make it
> streaming-compatible and just preserve it. Are there any drawbacks?
> kernel_neon_begin() could clear SM if needed.

In fact kernel_neon_begin() already disables PSTATE.SM since we need to
account for the case where userspace was preempted rather than having
issued a syscall.  We could require that PSTATE.SM is disabled by the
user, though it's questionable what we could usefully do if they forget,
other than disabling it anyway or generating a signal.
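A rough model of the "exit streaming mode and proceed as per sve.rst" behaviour discussed above: on syscall entry SM is cleared, the FPSIMD subset (the low 128 bits of each Z register) survives, and the rest of Z, P0-P15 and FFR become unspecified, modelled here as zeroed since that is what happens in practice.  The struct and function names are hypothetical, not kernel types, and the model uses a single vector length, ignoring that the streaming and non-streaming vector lengths can differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_VL_BYTES 256	/* architectural maximum VL: 2048 bits */
#define FPSIMD_BYTES 16		/* V0-V31 are the low 128 bits of Z0-Z31 */

/* Hypothetical model of a task's user register state, not a kernel type. */
struct model_user_state {
	bool sm;		/* PSTATE.SM: streaming mode enabled */
	unsigned int vl;	/* current vector length in bytes */
	uint8_t z[32][MODEL_MAX_VL_BYTES];
	uint8_t p[16][MODEL_MAX_VL_BYTES / 8];
	uint8_t ffr[MODEL_MAX_VL_BYTES / 8];
};

/*
 * On syscall entry: leave streaming mode, keep only the FPSIMD subset
 * and zero everything else that sve.rst leaves unspecified.
 */
static void model_syscall_entry(struct model_user_state *s)
{
	assert(s->vl >= FPSIMD_BYTES && s->vl <= MODEL_MAX_VL_BYTES &&
	       s->vl % 16 == 0);

	s->sm = false;

	for (int i = 0; i < 32; i++)
		memset(&s->z[i][FPSIMD_BYTES], 0, s->vl - FPSIMD_BYTES);
	for (int i = 0; i < 16; i++)
		memset(s->p[i], 0, s->vl / 8);
	memset(s->ffr, 0, s->vl / 8);
}
```

Requiring userspace to clear SM instead would only change what happens when sm is found set (for example raising a signal); the discard step itself would be the same.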

We could preserve PSTATE.SM, though since all the other register state
for streaming mode is shared with SVE I would expect us to apply the SVE
discard rules to it, so there would be no other state to retain.  As
things stand this would either add overhead or complicate the register
save and restore a bit: if we're in streaming mode we currently assume
that we should save and restore the full SVE register contents, whereas
in a syscall we normally only need to save and restore the FPSIMD subset.
The overhead might go away anyway as a result of general work on syscall
optimisation for SVE, though that work isn't done yet and may not end up
working out that way.
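To put rough numbers on the gap between saving the FPSIMD subset and the full SVE register contents, here is a small sketch computing both sizes from the vector length in bytes.  The function names are hypothetical; FPSR/FPCR and other control state are ignored, and FFR is counted unconditionally for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* FPSIMD subset: V0-V31, 128 bits (16 bytes) each. */
static size_t fpsimd_bytes(void)
{
	return 32 * 16;
}

/*
 * Full SVE state for a vector length of vl bytes: Z0-Z31 (vl bytes each)
 * plus P0-P15 and FFR (vl / 8 bytes each).
 */
static size_t sve_bytes(size_t vl)
{
	assert(vl >= 16 && vl <= 256 && vl % 16 == 0);
	return 32 * vl + 17 * (vl / 8);
}
```

With a 512-bit vector length (vl = 64) that is 2184 bytes against 512 for the FPSIMD subset, and 8736 bytes at the 2048-bit maximum.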

Having said that, as with ZA, userspace can just exit streaming mode to
avoid any overhead that having it enabled introduces, and since the PCS
means it will normally have done so already, syscalls in streaming mode
should be extremely rare.  Unlike keeping ZA active, there doesn't seem
to be any case where it would be sensible to want this, and the PCS means
you'd have to actively try to do it.

> > Largely just because it's more complicated to implement copying the ZA
> > backing store for this and it seemed more likely that someone would be
> > surprised by a new process getting stuck carrying a potentially large
> > copy of ZA around that it was unaware of than that someone would
> > actually want that to happen.  It's not a particularly strongly held
> > opinion.

> If PSTATE.ZA is valid and the user does a fork() (well, implemented as
> clone()), normally it expects a nearly identical state in the child.
> With clone() if a new thread is created, we likely don't need the
> additional ZA state. We got away without having to think about this for
> SVE as the state is lost on syscall. Here we risk having a vaguely
> defined ABI - fork() is disabled on arm64 for example but we do have
> clone() and clone3().

> Still thinking about this but maybe we could do something like always
> copy the ZA state unless CLONE_VM is passed for example. It is
> marginally more precise.

We should definitely write this up a bit more explicitly whatever we do;
as I say, I don't really have strong opinions here.
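The clone() policy floated above (copy ZA unless CLONE_VM is passed) could look roughly like the following.  struct model_sme_thread and model_clone_za() are hypothetical and only model the decision, not the kernel's actual task structures.  The ZA backing store is modelled as svl * svl bytes for a streaming vector length of svl bytes, which reaches 64 KiB at the 2048-bit maximum; that is the "potentially large copy" a fork-like child would inherit.  CLONE_VM comes from <sched.h> where the C library exposes it, with a fallback to the Linux UAPI value.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* glibc only exposes CLONE_* with this */
#endif
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#ifndef CLONE_VM
#define CLONE_VM 0x00000100	/* value from the Linux UAPI <linux/sched.h> */
#endif

/* Hypothetical per-thread SME state for illustration, not a kernel type. */
struct model_sme_thread {
	bool za_enabled;	/* PSTATE.ZA */
	size_t svl;		/* streaming vector length in bytes */
	unsigned char *za;	/* svl * svl byte backing store, or NULL */
};

/*
 * A child created without CLONE_VM (fork-like) inherits a copy of ZA;
 * a child sharing the address space (a new thread) starts with ZA
 * disabled and no backing store.  Returns false on allocation failure.
 */
static bool model_clone_za(const struct model_sme_thread *parent,
			   struct model_sme_thread *child,
			   unsigned long clone_flags)
{
	child->svl = parent->svl;
	child->za_enabled = false;
	child->za = NULL;

	if (!parent->za_enabled || (clone_flags & CLONE_VM))
		return true;

	assert(parent->za != NULL);
	size_t size = parent->svl * parent->svl;
	child->za = malloc(size);
	if (!child->za)
		return false;
	memcpy(child->za, parent->za, size);
	child->za_enabled = true;
	return true;
}
```

Keying the decision on CLONE_VM alone keeps the rule easy to document, which matters more here than covering every combination of clone() flags.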

There's also the interaction with the lazy save state to consider -
TPIDR2 is cleared if CLONE_SETTLS is specified which would interfere
with any lazy state saving that had already happened, though hopefully
userspace is taking care of that as part of setting up the new thread so
I think it's fine.
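The TPIDR2 interaction can be sketched as below.  This assumes our reading of the AAPCS64 SME lazy-save scheme: with PSTATE.ZA set, a zero TPIDR2_EL0 means ZA is "active", and a non-zero TPIDR2_EL0 means ZA is "dormant" with a lazy save set up.  The enum and function names are hypothetical.  Clearing TPIDR2 on CLONE_SETTLS turns a dormant ZA into an apparently active one in the child, which is why the new thread's setup code needs to deal with any pending lazy save.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* glibc only exposes CLONE_* with this */
#endif
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stdint.h>
#ifndef CLONE_SETTLS
#define CLONE_SETTLS 0x00080000	/* value from the Linux UAPI <linux/sched.h> */
#endif

/* Assumed AAPCS64 SME ZA states (off / active / dormant). */
enum model_za_state { MODEL_ZA_OFF, MODEL_ZA_ACTIVE, MODEL_ZA_DORMANT };

static enum model_za_state model_za_state(bool pstate_za, uint64_t tpidr2)
{
	if (!pstate_za)
		return MODEL_ZA_OFF;
	/* Non-zero TPIDR2_EL0: a lazy save has been set up (ZA is dormant). */
	return tpidr2 ? MODEL_ZA_DORMANT : MODEL_ZA_ACTIVE;
}

/* Behaviour described above: CLONE_SETTLS clears TPIDR2 in the child. */
static uint64_t model_child_tpidr2(uint64_t parent_tpidr2,
				   unsigned long clone_flags)
{
	return (clone_flags & CLONE_SETTLS) ? 0 : parent_tpidr2;
}
```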


More information about the linux-arm-kernel mailing list