[PATCH v7 1/2] arm64/sve: Split TIF_SVE into separate execute and register state flags

Tue Feb 9 17:16:27 EST 2021

On Tue, Feb 09, 2021 at 05:59:46PM +0000, Dave Martin wrote:

> I'm wondering if TIF_SVE_FULL_REGS is actually conflating two things
> here:
>  a) whether the SVE regs can be discarded
>  b) how the FPSIMD/SVE regs are encoded in thread_struct.

The way I was thinking about this was that at any point where we can
discard the SVE register state we should actually do so - I wasn't
considering a state where we could potentially discard them but didn't
actually want to do so.

> When implementing a trivial policy for discarding the SVE regs, we
> discard the regs at the earliest opportunity, so (a) and (b) are not
> very distinct.  But if we try to be cleverer later on then this would
> break down.  If we separate the two meanings from the outset, would
> it help to steer any future policy stuff in a more maintainable
> direction.

> This might mean having two flags instead of one.

I think if we want to track potential actions here then rather than
tracking a further "potentially discardable" state we should be tracking
the things that let us decide to discard the state such as being in a
syscall, it seems likely that things that allow us to have such a state
could also have similar implications for other areas of the ABI at some
point and that we'd be able to get more usage out of that tracking than
something which only applies to SVE.

> >  static void __sve_free(struct task_struct *task)
> >  {
> > +	/* SVE context will be zeroed when allocated. */
> > +	clear_tsk_thread_flag(task, TIF_SVE_FULL_REGS);

> Can you elaborate on why this is here?

> I'm not sure this is the right place to clear flags.  This code is
> really just for freeing the buffer, along with whatever checks are
> necessary to confirm that this is safe / sane to do. (i.e.,
> TIF_SVE(_EXEC) set.).

It felt like it lined up with the assignment of sve_state to NULL
immediately below.  It could equally go a level up in sve_free() I
think, the logic in splitting the two free functions isn't super clear
from the code TBH - it seems to be due to fpsimd_release_task() and
wanting to avoid the state check on SVE_EXEC in the existing code which
I guess people thought was valuable.

> > + *  * TIF_SVE_EXEC set:
> > + *
> > + *    The task can execute SVE instructions while in userspace without
> > + *    trapping to the kernel.  Storage of Z0-Z31 (incorporating Vn in
> > + *    bits[0-127]) is determined by TIF_SVE_FULL_REGS.
> >   *
> >   *    task->thread.sve_state must point to a valid buffer at least
> >   *    sve_state_size(task) bytes in size.
> >   *
> > - *    During any syscall, the kernel may optionally clear TIF_SVE and
> > - *    discard the vector state except for the FPSIMD subset.
> > - *
> > - *  * TIF_SVE clear:
> > + *    During any syscall the ABI allows the kernel to discard the
> > + *    vector state other than the FPSIMD subset.  When this is done
> > + *    both TIF_SVE_EXEC and TIF_SVE_FULL_REGS will be cleared.

> Can this occur if TIF_SVE_FULL_REGS is initially set?  I thought
> !TIF_SVE_FULL_REGS our way of telling ourselves that we can discard the
> SVE state at all...

I'm not sure what you mean here - what exactly is the "this" you're
referring to, and what do you mean by "initially set"?  The intent of
the above is to document the fact that when we do a syscall we both
disable SVE execution (so an access trap is generated) and discard the
SVE state.

As I said in the other mail and the earlier part of this comment is
intended to say the intent of !TIF_SVE_FULL_REGS is that the state for
the SVE registers (if required) should be taken from the FPSIMD
registers, wherever they are stored at the time the SVE state is needed.

This documentation is attempting to cover the intended behaviour after
further changes to allow the two flags to be set separately, it didn't
seem sensible to try to document the behaviour when they're set in lock
step since otherwise there will be questions about what the flags are
supposed to mean separately when they're being introduced.

> 
> > + *  * TIF_SVE_FULL_REGS is not set:
> >   *
> >   *    When stored, FPSIMD registers V0-V31 are encoded in
> >   *    task->thread.uw.fpsimd_state; bits [max : 128] for each of Z0-Z31 are
> > @@ -265,12 +287,38 @@ static void sve_free(struct task_struct *task)
> >   *    view.  For hygiene purposes, the kernel zeroes them on next use,
> >   *    but userspace is discouraged from relying on this.
> >   *
> > - *    task->thread.sve_state does not need to be non-NULL, valid or any
> > - *    particular size: it must not be dereferenced.
> > + *    On entry to the kernel other than from a syscall the kernel must
> > + *    set TIF_SVE_FULL_REGS and save the full register state.

> Only if TIF_SVE_EXEC is set though?

Yes, I'll clarify this.

> I had thought that it is more natural to set TIF_SVE_FULL_REGS in
> preparation for entering userspace.  This does mean that if we are
> preempted between fpsimd_restore_current_state() and exception return to
> userspace, then we would have to save the full regs unnecessarily.  This
> can only happen during a small window though, so it should be rare very
> unless the system is already thrashing.  This is the situation prior to
> your series IIUC.

We need to set TIF_SVE_FULL_REGS on entry to the kernel so that if we
come to save the register state we know we need to save the full SVE
state (which userspace may have changed) and so that if we return to
userspace without saving we know if we need to zero the bits of SVE
state that are not shared with the FPSIMD state.  It is true that there
is a race after doing a conversion from FPSIMD in the exit path where we
might end up saving the full state again, as you say that's not changed
here and is pretty thin.  

> Alternatively, we set TIF_SVE_FULL_REGS when entering the kernel on a
> non-syscall path with TIF_SVE_EXEC set - but that adds a small overhead
> to every kernel entry and feels slightly messier.

The goal is to avoid that overhead in practice by ensuring that we never
leave the kernel without having both flags set, changing the check on
entry to the inverse condition.  I'll explicitly call that out in the
comment.

> I wonder if it would help to describe TIF_SVE_FULL_REGS orthogonally to
> TIF_SVE_EXEC.

> Although we don't intend to implement the combination !TIF_SVE_EXEC &&
> TIF_SVE_FULL_REGS, it does have a logically consistent meaning (SVE
> disabled for userspace, but Vn nonetheless stored in sve_state, laid out
> according to thread->sve_vl).

My worry with doing that was that as it's not a meaningful state it
would create confusion regarding how we could get into that situation
and what it would actually mean.

> When flipping the flags, there may be a hazard where this combination
> temporarily appears, but we could hide that in a helper that runs under
> get_cpu_fpsimd_context() -- just like we already do for various other
> things.

> (We could avoid this by adding atomic thread_flags manipulators that
> flip multiple flags, but that's overkill just for this.)

How about explicitly saying it's not meaningful and should never be
allowed to happen?  To me it's just undefined behaviour and anything
that is expedient is fine.

> > -	if (system_supports_sve() && test_thread_flag(TIF_SVE))
> > -		sve_load_state(sve_pffr(&current->thread),
> > -			       &current->thread.uw.fpsimd_state.fpsr,
> > -			       sve_vq_from_vl(current->thread.sve_vl) - 1);
> > -	else
> > -		fpsimd_load_state(&current->thread.uw.fpsimd_state);
> > +	if (test_thread_flag(TIF_SVE_EXEC)) {

> Was system_supports_sve() dropped on purpose?  And why?

There were concerns on earlier versions of the series about the
system_supports_sve() checks being redundant, mainly raised by new
checks being added in this pattern - if we have TIF_SVE and no SVE
hardware we've got a pretty bad bug so I was cutting them out.  I had
started going through and doing that but it looks like I got distracted
somewhere before actually finishing it and noting it in the revision
list.

> > +		if (test_and_set_thread_flag(TIF_SVE_FULL_REGS))
> > +			sve_load_state(sve_pffr(&current->thread),
> > +				       &current->thread.uw.fpsimd_state.fpsr,
> > +				       vl);
> > +		else
> > +			sve_load_from_fpsimd_state(&current->thread.uw.fpsimd_state,
> > +						   vl);

> As observed above, there could be an argument for setting
> TIF_SVE_FULL_REGS here, since the only reason to actually load the regs
> is in preparation for entering userspace, which can freely dirty all the
> SVE bits undetected since trapping for EL0 will be disabled due to
> TIF_SVE_EXEC being est.

Yes, we could do that - I guess it's OK to squash into this change.

> > +	fpsimd_load_state(&current->thread.uw.fpsimd_state);

> Maybe this could be more readable, as well as reducing indentation on
> the more complex branch of the if, as:

> 	if (!system_supports_sve || !test_thread_flag(TIF_SVE_EXEC)) {
> 		fpsimd_load_state(&current->thread.uw.fpsimd_state);
> 
> 		return;
> 	}

> 	/* handle TIF_SVE_EXEC case */

Sure.

> > @@ -318,6 +385,7 @@ static void fpsimd_save(void)
> >  				return;
> >  			}
> >  
> > +			set_thread_flag(TIF_SVE_FULL_REGS);

> Why do this here?  What if we're in a syscall?

There's no problem if we're in a syscall since we'll have already
cleared TIF_SVE_EXEC but yes, we shouldn't do this here - I think it's
just left over from my initial pass through where I was trying to
explictly handle both flags at once which just ended up being too
tortuous.

> > @@ -627,8 +695,9 @@ int sve_set_vector_length(struct task_struct *task,
> >  	}
> >  
> >  	fpsimd_flush_task_state(task);
> > -	if (test_and_clear_tsk_thread_flag(task, TIF_SVE))
> > +	if (test_and_clear_tsk_thread_flag(task, TIF_SVE_FULL_REGS))
> >  		sve_to_fpsimd(task);

> Won't this mean that we dereference thread->sve_state even if
> TIF_SVE_EXEC was clear?

Yes, potentially if we let the flag be set.  I'll add an extra check
here and in other similar places you mentioned.

> For the ptrace case, I think we can probably can hit this, IIUC.

> It might not apply to the prctl() case if TIF_SVE_EXEC was already
> cleared during syscall entry (?), but as observed above this still
> assumes that the SVE regs are discarded at the earliest opportinuty,
> which might not be true in future.

I would hope that if the SVE registers become actually invalid that'd
also involve clearing TIF_SVE_FULL_REGS but yeah.

> > @@ -952,8 +1022,9 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs)
> >  	fpsimd_flush_task_state(current);
> >  
> >  	fpsimd_to_sve(current);

> Hmmm, there's a latent bug upstream here: if the WARN() fires then
> sve_state is not safe to dereference.  But we already did.

> So perhaps this should have been something like:

> 	if (!WARN_ON(test_and_set_thread_flag(TIF_SVE)))
> 		fpsimd_to_sve();

> This might make sense as a separate Fixes patch to precede the series.

Yes, that's definitely a separate fix I think.

> > @@ -1181,7 +1261,7 @@ void fpsimd_update_current_state(struct user_fpsimd_state const *state)
> >  	get_cpu_fpsimd_context();
> >  
> >  	current->thread.uw.fpsimd_state = *state;
> > -	if (system_supports_sve() && test_thread_flag(TIF_SVE))
> > +	if (test_thread_flag(TIF_SVE_FULL_REGS))

> Unintentionally dropped system_supports_sve()?

As above discussion on previous revisions suggested removing these - I
actually spotted a few more that I'd missed here.

> > @@ -241,7 +242,7 @@ static int preserve_sve_context(struct sve_context __user *ctx)
> >  	BUILD_BUG_ON(sizeof(ctx->__reserved) != sizeof(reserved));
> >  	err |= __copy_to_user(&ctx->__reserved, reserved, sizeof(reserved));
> >  
> > -	if (vq) {
> > +	if (vq && test_thread_flag(TIF_SVE_FULL_REGS)) {
> 
> Hmm, in theory we should be able to mark the regs as discardable once
> they have been saved in the frame, though we never did that in the
> past.  Since we never zeroed the extra bits on signal handler entry
> anyway, we could actually skip the zeroing for this specific case.  Any
> change of this sort should go in a separate patch though.

Definitely a separate change, and probably sensible to split it from
this series.

> > @@ -587,7 +590,7 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user,
> >  	if (system_supports_sve()) {
> >  		unsigned int vq = 0;
> >  
> > -		if (add_all || test_thread_flag(TIF_SVE)) {
> > +		if (add_all || test_thread_flag(TIF_SVE_EXEC)) {
> 
> Doesn't this need to match the SVE reg dumping condition in
> preserve_sve_context()?

Posssibly, yeah.

> > --- a/arch/arm64/kernel/syscall.c
> > +++ b/arch/arm64/kernel/syscall.c
> > @@ -186,7 +186,8 @@ static inline void sve_user_discard(void)
> >  	if (!system_supports_sve())
> >  		return;

> > -	clear_thread_flag(TIF_SVE);
> > +	clear_thread_flag(TIF_SVE_EXEC);
> > +	clear_thread_flag(TIF_SVE_FULL_REGS);

> (Assuming this will be refined in the next patch.)

No - could you elaborate on what refinement you were expecting here?

> > -	if (test_thread_flag(TIF_SVE))
> > +	if (test_thread_flag(TIF_SVE_EXEC))
> >  		vcpu->arch.flags |= KVM_ARM64_HOST_SVE_IN_USE;

> Here we're setting TIF_SVE(_EXEC) to describe the guest context, while
> stashing off the host's TIF_SVE(_EXEC) flag so we can get it back later.

> Don't we need to do a similar things for TIF_SVE_FULL_REGS though?

Yes, I think this needs redoing...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20210209/c3ab4b40/attachment.sig>