[PATCH v5 1/2] arm64/sve: Don't disable SVE on syscalls return

Thu Nov 19 13:27:50 EST 2020

On Wed, Nov 18, 2020 at 05:52:01PM +0000, Mark Brown wrote:
> On Wed, Nov 18, 2020 at 12:51:56PM +0000, Dave Martin wrote:
> > On Tue, Nov 17, 2020 at 11:03:38PM +0000, Mark Brown wrote:
> > > On Tue, Nov 17, 2020 at 06:16:20PM +0000, Dave Martin wrote:
> 
> > > > I think where we get in a tangle is that TIF_SVE conflates mechanism
> > > > (whether the kernel tracks the full SVE state) with policy (whether
> > > > userspace cares about the full SVE state).  At present, we can't
> > > > describe the situation where we track SVE state but userspace doesn't
> > > > care, so we either have to do nothing at all and just leave SVE on all
> > > > the time, or we have to disable SVE at the first
> > > > opportunity.
> 
> > > I think this is the current conflation that is causing issues but I
> > > wouldn't state it quite like that, I don't know that *care* is quite the
> > > right word for what userspace is doing here.  There's two pieces,
> > > there's the SVE state in the registers and there's the ability to
> > > execute SVE instructions without trapping.  When userspace executes a
> > > SVE instruction we enable execution and also start tracking the register
> > > state which we currently track in the single TIF_SVE flag.  When
> > > userspace does a syscall the extra register state becomes undefined
> > > (realistically we'll probably have to continue resetting it to zero as
> 
> > With regard to "don't care", I used this wording what userspace is doing
> > is saying "I don't need what's in the SVE regs, feel free to throw it
> > away if it helps."  Do you see anything wrong with this desciption?
> > Maybe there's a subtlety here I've forgotten.  "Don't care" seemed a
> > good fit for this concept, but I'm not attached to this form or words
> > per se.
> 
> The issue I see is that it's not just the register state, it's also the
> ability to execute without trapping - the ABI tells userspace it can't
> use the register state any more, it's a bit of a jump to imply that a
> syscall means that userspace is no longer interested in SVE.  "Don't
> care" implies intent to no longer use it (at least for a while) which is
> a bit stronger.

It could mean that, but it doesn't have to.  Anyway, I think we're in
agreement here other than the precise meaning of "don't care" (which you
weren't using anyway...)

I have never thought that a syscall should imply "I'm not interested in
using SVE instructions for a while".  Even if the current implementation
is better suited to that scenario, it's not purposely so.

> > > we currently do) and we currently don't the ability to track the ability
> > > to execute the instructions separately to discarding that state so we
> > > just disable both at once.  This means that the syscall overhead is
> > > amplified for any task that mixes SVE and syscalls.
> 
> > Sure, though I think the difference between your model and mine is just
> > semantics.  If enter userspace with SVE untrapped, then we have no
> > choice but to track the full SVE regs from that point, so the trap
> > control and state tracking are still not really independent concepts.
> 
> Right, none of this stuff is *exactly* independent of each other.

Ack

> > If you prefer to think of TIF_SVE as purely a trap control and define
> > another flag for "has full SVE state" then I don't have a problem with
> > that; there is a choice of ways to slice up the multiple meanings
> > TIF_SVE currently has.
> 
> > In these patches TIF_SVE_NEEDS_FLUSH means "SVE untrapped, but don't
> > track full SVE state", which is arguably confusing when we have TIF_SVE
> > clear (suggesting trapping).
> 
> The way I was thinking about it both SVE flags mean "we have SVE state"
> and it's just a question of where.  As you say there's more than one way
> to do it.

Sure.

> > > While we can't be in userspace in the middle column I'm not sure it's
> > > entirely true to say that we can't usefully represent things here in
> > > memory if we think about the execute part of things as well.  If we
> > > schedule another task we can still track if we want to have SVE
> > > instruction execution on return to userspace even if the only data we're
> > > storing is the FPSIMD subset, avoiding a second SVE trap on the first
> > > SVE instruction after a syscall while also storing less data in memory.
> 
> > Sure, we could have an additional state here.  I'm not convinced it's
> > useful, or not expressible in another way, but if it keeps the middle
> > column more orthogonal from TIF_FOREIGN_FPSTATE and so makes the code
> > simpler to understand, then that could be a good thing.  We'd need to
> > see what it looks like, I guess.
> 
> Looking at the places where we set it I think it'd at least be a lot
> simpler to defer processing a transition until we interpret the flag at
> which point we're not really gaining anything by saying the case with
> invalidated SVE regs never gets represented.  I also very much like the
> current property where TIF_FOREIGN_FPSTATE just gets set when we do
> something with the state without needing to think what's there, it just
> flags if we need to reload the state from memory or not which is very
> helpful for reasoning about things both in terms of when we set the flag
> and how we handle it.

I agree.  TIF_FOREIGN_FPSTATE probably has the most clearly defined
meaning, so it's useful to keep it as-is rather than confuse things.

> > > > And finally, on sched-out (or any other fpsimd_save() event):
> > > 
> > > >         +---------------+               +---------------+
> > > >         |               |               |               |
> > > >         |  :            +---------------+            :  |
> > > >         |  :            |               |            :  |
> > > >         +--:------------+               +------------:--+
> > > >         |  :         <=======  [*]  =======>         :  |
> > > >         |  V            +---------------+            V  |
> > > >         |               |               |               |
> > > >         +---------------+               +---------------+
> > > 
> > > > ...at least in threory.
> 
> > > I agree up to this final schedule case since here we'd either be taking
> > > a decision to force userspace to trap on the next SVE instruction or to
> > > store and reload full SVE state in order to avoid that neither of which
> > > seems ideal.  With the current patch if TIF_SVE_NEEDS_FLUSH is set we
> > > only save the FPSIMD state and then restore it and switch to TIF_SVE
> > > when scheduling the task again - this is masked a bit in the patch since
> > > it does not update the existing code in fpsimd_save() as TIF_SVE is not
> > > set when setting TIF_SVE_NEEDS_FLUSH so there's no change in that path.
> 
> > I think I missed this point.  It doesn't sound quite right to me: how
> > will we ever turn SVE persistently off for a task?
> 
> As mentioned in the cover letter (which I inherited from Julien, it's
> been this way since v1) we don't currently, the proposal mentioned in
> the cover letter is to turn it off after some number of syscalls.

I think there are two levels of this:

1) Basically the same behaviour as today, but optimising the fast path
of a non-scheduling syscall to avoid dumping and reloading the regs,
while not making other things worse (i.e., we still want SVE to get
turned off when appropriate).

2) Introducing some logic to try to make an educated guess about whether
SVE should be on or off.

It wasn't clear that (2) would really be any better in practice than
static logic -- after all, other arches have adopted such a thing and
then subsequently dumped it -- but I don't have a strong argument
against it.

I guess I'd prefer to see this arbitrary (if straightforward) policy as
a second patch, separate from the shovelwork in (1).

> 
> > I had thought that the purpose of this patch was to improve the low-
> > overhead paths -- so with no context switch we can spin between the
> > kernel and userspace without paying the code of extra traps of dumping/
> > reloading the regs as we do now.  Once we enter a high-overhead path, we
> > might as well be brutal.
> 
> That's not something that was called out by the cover letter and it's
> not obvious to me that we want to actively seek to do that, though I can
> see the argument for it.  I think I was approaching it from the point of
> view that it seemed to simplify things to record the state at entry and
> implement whatever new state we wanted on observation without having to
> think about it in the meantime.

Right.  I think I probably got certain ideas in my head based on
previous conversations with Julien -- I considered the patches to be a
work in progress, so I was thinking of the direction it might go in
rather than the actually implemented behaviour.

> 
> > If we save the full SVE regs, and defer the decision on disabling SVE
> > until the next sched-in then that's probably reasonable: we could leave
> > SVE enabled if the regs turn out not to need reloading (as when resuming
> > the task after temporarily running some kernel thread -- we reach
> > do_notify_resume() with TIF_FOREIGN_FPSTATE clear).  If so, the bottom-
> > middle box does seem to describe the state of that task while scheduled
> > out.
> 
> Yeah.
> 
> > My own view is that we should disable SVE on sched-out (or possibly at
> > fpsimd_restore_current_state()) rather than doing anything more clever
> > just yet -- because these options remain close to the current behaviour
> > while still addressing the excessive trapping issue that prompted this
> > change.
> 
> I think restore is likely to be a better candidate than sched-out if
> we're going that way.

Ack

> > > (the first column being either no SVE flags set or only SVE_REGS set.)
> > > All entries should go to one of the !FFP cases.  Entry from SVE traps or
> > > syscalls with SVE_EXEC set should go to SVE_EXEC, other entries with
> > > SVE_EXEC already set should go to SVE_REGS+SVE_EXEC and all other
> > > entries should go to !SVE/SVE_REGS.
> > > 
> > > Currently TIF_SVE maps onto SVE_REGS+SVE_EXEC and TIF_SVE_NO_FLUSH maps
> > > onto SVE_EXEC.
> 
> > Agreed.  I don't see a big problem with this, and it removes the
> > weirdness where we turn off the flag that appears to mean ("SVE
> > enabled") when in the middle column.
> 
> > I still think it's a bit of a moot point, since we still have to resolve
> > the middle states to a different state before we can safely enter
> > userspace -- but that's not so different from the TIF_FOREIGN_FPSTATE
> > case.
> 
> Yes, my feeling was that it lined up nicely with the TIF_FOREIGN_FPSTATE
> handling since it all comes out as an overall "implement whatever we
> ended up with" step and as a result minimizes the chances that we'll end
> up doing anything expensive we didn't need to unintentionally down the
> line - it seems more likely to end up being helpful going forwards.

Seems reasonable.

> 
> > > > An alternative behaviour for syscalls would be:
> 
> > > >                                           TIF_SVE
> > > >                !TIF_SVE    ,-------------------------.
> > > > 	  ,--------------------------.
> > > > 
> > > >         +---------------+               +---------------+
> > > >         |               |               |               |
> > > >         |               +---------------+               |
> > > >         |            ======>         <======            |
> > > >         +---------------+               +---------------+
> > > >         |               |               |               |
> > > >         |               +---------------+               |
> > > >         |               |               |               |
> > > >         +---------------+               +---------------+
> 
> > > I worry that this might make it harder to follow in that you'd have to
> > > understand why we're considering enabling SVE for tasks that never tried
> > > to do anything with it.
> 
> > The decision on where to go from the middle box can be anything we like.
> > In the actual implementation, the obvious thing to do is to always go
> > back to the left-hand box if TIF_SVE is clear.
> 
> > The middle box just indicates the _potential_ for a decision.  But we
> > don't always have to use that opportunity.
> 
> ...
> 
> > Of couse, since this flexibility is probably not all that useful in
> > practice, it doesn't necessarily have to be expressed in the code.  The
> > picture is more of an abstraction.
> 
> Right, it was the idea of representing it in code more than imagining
> that one could do it that was concerning me.

Fair enough.

I just wanted to make sure we were solving the right problem :)

Cheers
---Dave