[PATCH v5 1/2] arm64/sve: Don't disable SVE on syscalls return

Dave Martin Dave.Martin at arm.com
Tue Nov 17 13:16:20 EST 2020


On Mon, Nov 16, 2020 at 07:45:51PM +0000, Mark Brown wrote:
> On Mon, Nov 16, 2020 at 05:59:39PM +0000, Dave Martin wrote:
> 
> > So far as I can see, TIF_SVE_NEEDS_FLUSH is a special state in which SVE
> > is half-enabled and the SVE regs half-loaded, so it's not an orthogonal
> > flag, but a distinct state that sits between the others.
> 
> > In this state the regs are neither fully saved nor fully loaded, and we
> > haven't committed to either enabling or disabling SVE for userspace.
> > It's an intermediate state which we can choose our path out of based on
> > policy or performance concerns.
> 
> When I say orthogonal I mean to TIF_SVE only - to me that's essentially
> unrelated while we're in the kernel.  It's true that we'll have to
> decide if we want to transition on return to userspace but for the time
> we're in the kernel it doesn't need to interact and we don't need to pay
> attention to TIF_SVE when we're returning.
> 
> TIF_FOREIGN_FPSTATE is more tied up, it's definitely can't be thought of
> as orthogonal.  I think of that more like what Catalin was suggesting
> where it's a flag about where the data is stored rather than about the
> state we want to go into, which I found makes it fairly easy to reason
> about.

I think that thinking of TIF_SVE_NEEDS_FLUSH as a distinct toggle may
not help helping here -- it feels more like an override or a special
case.

I also wonder whether increasing the amount of commenting is actually a
trap here: there's a risk of piling new confusion on old.  The actual
problems might be that there is too much commenting already, not too
little...


Taking a step back, this is how I think it was all supposed to work,
prior to this series.  This time, I'll try to describe the flags
separately rather than enumerating all the combinations (which I think
may have been a misstep):

TL;DR: look at the pictures first ;)


 --8<--

Old flag meanings:

 * TIF_FOREIGN_FPSTATE (along with fpsimd_last_state array and thread->
fpsimd.cpu) just controls whether the task's vector state is stored in
memory (true) or in the registers (false).

 * TIF_SVE just controls whether we track full SVE state for the task
(true), or the FPSIMD subset only (false).  The other properties of
TIF_SVE flow from this.

At this level of abstraction, these two concepts are entirely
independent:

          !SVE            SVE
        +---------------+---------------+
 !FFP   | Track: FPSIMD | Track: SVE    |
        | Where: regs   | Where: regs   |
        +---------------+---------------+
 FFP    | Track: FPSIMD | Track: SVE    |
        | Where: memory | Where: memory |
        +---------------+---------------+

(implementation subtleties aside, of course.)


I think where we get in a tangle is that TIF_SVE conflates mechanism
(whether the kernel tracks the full SVE state) with policy (whether
userspace cares about the full SVE state).  At present, we can't
describe the situation where we track SVE state but userspace doesn't
care, so we either have to do nothing at all and just leave SVE on all
the time, or we have to disable SVE at the first
opportunity.

To keep things "simple" there's another conflation today: switching
between SVE and FPSIMD always involves dumping to memory, mungeing and
reloading.  This means that there is a single code path to handle this,
which good for avoiding fragmentation, but it's inflexible and has a
performance cost -- and it means that flipping TIF_SVE often involves
messing with TIF_FOREIGN_FPSTATE too.


So, what we need here is a way (i.e., a new state) to indicate that
userspace doesn't care about the extra SVE state.  Obviously this not
orthogonal to TIF_SVE: if SVE state is not tracked, then there is
no state _to_ care about.

As I understand it, this is what the TIF_SVE_NEEDS_FLUSH flag
describes.  It separates policy (userspace's policy in this case --
i.e., whether it needs the full SVE state) from mechanism (whether the
kernel keeps track of the full SVE state).  Perhaps we can come up with
another name, such as TIF_SVE_MAY_DISCARD.

Whatever we call this new intermediate state, we have a something like
this.

          !SVE            ???             SVE
        +---------------+---------------+---------------+
        | Need: FPSIMD  | Need: FPSIMD  | Need: SVE     |
 !FFP   | Track: FPSIMD | Track: SVE    | Track: SVE    |
        | Where: regs   | Where: regs   | Where: regs   |
        +---------------+---------------+---------------+
        | Need: FPSIMD  | Need: FPSIMD  | Need: SVE     |
 FFP    | Track: FPSIMD | Track: SVE    | Track: SVE    |
        | Where: memory | Where: memory | Where: memory |
        +---------------+---------------+---------------+

(FFP = TIF_FOREIGN_FPSTATE, SVE = TIF_SVE)

However, the middle column is a bit special: we can't run in userspace
in this state, since once userspace touches the regs we no longer know
whether it cares about the data in them.  Also, it's a bit pointless
representing this state in memory, since the main point of having it is
to _avoid_ dumping to memory.  And we do need to do a bit of work to
sanitise the state -- unless we always do it on the syscall entry path.


So, this may be a better picture:
                                                          Effort needed
                                                          to load regs
          !SVE            ???             SVE
        +---------------+               +---------------+
        | Need: FPSIMD  |               | Need: SVE     | none
 !FFP   | Track: FPSIMD +---------------+ Track: SVE    |   ^
        | Where: regs   | N: FPSIMD     | Where: regs   |   :
        +---------------+ T: undecided  +---------------+ some
        | Need: FPSIMD  | W:regs+cleanup| Need: SVE     |   :
 FFP    | Track: FPSIMD +---------------+ Track: SVE    |   v
        | Where: memory |               | Where: memory | full reload
        +---------------+               +---------------+

On syscall, the possible flows would now be:

        +---------------+               +---------------+
        |               |               |               |
        |      [@]      +---------------+               |
        |               |            <======            |
        +---------------+               +---------------+
        |               |               |               |
        |               +---------------+               |
        |               |               |               |
        +---------------+               +---------------+

where [@] indicates staying in the same state.  See the "Making things
symmetric" (below) for discussion of this.


Other kernel entries (interrupts and exceptions) don't change the
state:

        +---------------+               +---------------+
        |               |               |               |
        |      [@]      +---------------+      [@]      |
        |               |               |               |
        +---------------+               +---------------+
        |               |               |               |
        |               +---------------+               |
        |               |               |               |
        +---------------+               +---------------+

On kernel exit:

        +---------------+               +---------------+
        |               |               |               |
        |  ^            +---------------+            ^  |
        |  :         <=======  [*]  =======>         :  |
        +--:------------+               +------------:--+
        |  :            |               |            :  |
        |  :            +---------------+            :  |
        |               |               |               |
        +---------------+               +---------------+

where [*] indicates a decision point.


And finally, on sched-out (or any other fpsimd_save() event):

        +---------------+               +---------------+
        |               |               |               |
        |  :            +---------------+            :  |
        |  :            |               |            :  |
        +--:------------+               +------------:--+
        |  :         <=======  [*]  =======>         :  |
        |  V            +---------------+            V  |
        |               |               |               |
        +---------------+               +---------------+

...at least in threory.


While TIF_FOREIGN_FPSTATE and TIF_SVE remain orthogonal to each other,
this new state is not orthogonal to either.  I don't think that's a bug,
providing that we understand its role.  Since it's a 5th state, we are
going to burn a flag on it, but this doesn't mean that it needs to (or
even should) have a meaning that's fully independent of the others.
Rather it may be useful precisely because it's not independent -- it
decribes a situation which is currently an overlap between the other
states.

I'm pretty sure this is more or less what the proposed patches do,
but I'll review again with this picture in mind.

Thoughts?

Cheers
---Dave


Note: Making things symmetric
-----------------------------

An alternative behaviour for syscalls would be:

                                          TIF_SVE
               !TIF_SVE    ,-------------------------.
	  ,--------------------------.

        +---------------+               +---------------+
        |               |               |               |
        |               +---------------+               |
        |            ======>         <======            |
        +---------------+               +---------------+
        |               |               |               |
        |               +---------------+               |
        |               |               |               |
        +---------------+               +---------------+

This shouldn't change the underlying behaviour other than way certain
if() conditions would be expressed.  This model may make it clearer
that the TIF_SVE versus !TIF_SVE decision doesn't occur until moving
out of the middle state.  This gives a natural place to turn SVE on
speculatively for example, when a syscall occurs with !TIF_SVE.
(Whether this is useful is we could ever make an accurate enough guess
is another matter...  but this still might make the code a bit easier
to conceptualise.)



More information about the linux-arm-kernel mailing list