[PATCH v2 11/28] arm64/sve: Core task context handling

Dave Martin Dave.Martin at arm.com
Tue Oct 3 04:11:01 PDT 2017


On Wed, Sep 20, 2017 at 02:58:56PM +0100, Catalin Marinas wrote:
> On Thu, Sep 14, 2017 at 08:55:56PM +0100, Dave P Martin wrote:
> > On Wed, Sep 13, 2017 at 07:33:25AM -0700, Catalin Marinas wrote:
> > > On Thu, Aug 31, 2017 at 06:00:43PM +0100, Dave P Martin wrote:
> > > > +/*
> > > > + * Handle SVE state across fork():
> > > > + *
> > > > + * dst and src must not end up with aliases of the same sve_state.
> > > > + * Because a task cannot fork except in a syscall, we can discard SVE
> > > > + * state for dst here, so long as we take care to retain the FPSIMD
> > > > + * subset of the state if SVE is in use.  Reallocation of the SVE state
> > > > + * will be deferred until dst tries to use SVE.
> > > > + */
> > > > +void fpsimd_dup_sve(struct task_struct *dst, struct task_struct const *src)
> > > > +{
> > > > +	if (test_and_clear_tsk_thread_flag(dst, TIF_SVE)) {
> > > > +		WARN_ON(dst->mm && !in_syscall(task_pt_regs(dst)));
> > > > +		sve_to_fpsimd(dst);
> > > > +	}
> > > > +
> > > > +	dst->thread.sve_state = NULL;
> > > > +}
> > > 
> > > I first thought the thread flags are not visible in dst yet since
> > > dup_task_struct() calls arch_dup_task_struct() before
> > > setup_thread_stack(). However, at the end of the last year we enabled
> > > CONFIG_THREAD_INFO_IN_TASK_STRUCT. But I don't particularly like relying
> > > on this.
> > 
> > Hmmm, I see your point, but there are some sequencing issues here.
> > 
> > > Anyway, IIUC we don't need sve_to_fpsimd() here. The
> > > arch_dup_task_struct() already called fpsimd_preserve_current_state()
> > 
> > I consider SVE discard as an optional side effect of task_fpsimd_save(),
> > not something that is guaranteed to happen -- the decision about whether
> > to do so may become more intelligent later on.  So, for src, we may
> > discard SVE (because syscall), but for dst we must NULL .sve_state (and
> > therefore clear TIF_SVE) simply to avoid aliasing of src->sve_state and
> > dst->sve_state.
> 
> My point was that the SVE state of src is already preserved at this
> point and copied into dst. You don't need the sve_to_fpsimd(dst) again
> which basically does the same copying of the src SVE saved state into
> the FPSIMD one in dst. This has already been done in
> arch_dup_task_struct() by the combination of
> fpsimd_preserve_current_state() and *dst = *src (and, of course,
> clearing TIF_SVE in dst).
> 
> I don't think the TIF_SVE clearing in src is just a side effect of
> task_fpsimd_save() here but rather a requirement. When returning from
> fork(), both src and dst would need to have the same state. However,
> your fpsimd_dup_sve() implementation makes it very clear that the SVE
> state is lost in dst. This is only allowed if we also lose it in src (as
> a result of a syscall). So making dst->sve_state = NULL requires that
> TIF_SVE is also cleared in both src and dst. Alternatively, you'd have
> to allocate a new state here and copy the full src SVE state across to
> dst, together with setting TIF_SVE (that's not necessary, however, since
> we get here as a result of a syscall).

The currently intended ABI is that the SVE bits are unspecified after a
syscall, so it is legitimate (though perhaps surprising) for different
things to happen in dst and src.

This complicates things a lot though, just to avoid the next SVE usage
exception in src after the fork.


It should be simpler to do what you suggest and discard the SVE state of
src unconditionally before the copy: then we really are just cloning the
thread apart from the need to set dst->thread.sve_state to NULL.

fpsimd_preserve_current_state() does not necessarily write back to
current->thread.fpsmid_state though: at the moment, it does do this as a
side effect of task_fpsimd_save() because we happen to be in a syscall
(i.e., fork).  

What we really want is unconditional discarding of the state.  This
wasn't needed anywhere else yet, so there's no explicit helper for it.
But it makes sense to add one.

What about refactoring along these lines:


fpsimd.c:
/* Unconditionally discard the SVE state */
void task_sve_discard(struct task_struct *task)
{
	if (!system_supports_sve())
		return;

	local_bh_disable();
	if (test_and_clear_tsk_thread_flag(task, TIF_SVE))
		sve_to_fpsimd(task);
	local_bh_enable();
}

process.c:
int arch_dup_task_struct(sturct task_struct *dst, struct task_struct *src)
{
	if (current->mm) {
		fpsimd_preserve_current_state();
		task_sve_discard(src);
	}

	*dst = *src;

	dst->thread.sve_state = NULL;
}


This also avoids having to touch dst's thread flags, since now we
are just cloning the task except for assigning NULL to
dst->thread.sve_state.


> > > for src, so the FPSIMD state (which we care about) is transferred during
> > > the *dst = *src assignment. So you'd only need the last statement,
> > > possibly with a different function name like fpsimd_erase_sve (and maybe
> > > make the function static inline in the header).
> > 
> > Not quite: TIF_SVE must be cleared so that a context switch or
> > kernel_neon_begin() after dst is scheduled doesn't try to save state in
> > the (NULL) dst->sve_state.
> 
> Yes, TIF_SVE must also be cleared in dst when dst->sve_state = NULL (I
> may have forgotten to mention this).

With the above factoring, the constraint "TIF_SVE implies sve_state
valid" is then never false, because TIF_SVE is already cleared before
the NULL assignment.

What do you think?

Cheers
---Dave



More information about the linux-arm-kernel mailing list