[PATCH v2 11/28] arm64/sve: Core task context handling
Catalin Marinas
catalin.marinas at arm.com
Wed Oct 4 10:29:56 PDT 2017
On Tue, Oct 03, 2017 at 12:11:01PM +0100, Dave P Martin wrote:
> On Wed, Sep 20, 2017 at 02:58:56PM +0100, Catalin Marinas wrote:
> > On Thu, Sep 14, 2017 at 08:55:56PM +0100, Dave P Martin wrote:
> > > On Wed, Sep 13, 2017 at 07:33:25AM -0700, Catalin Marinas wrote:
> > > > On Thu, Aug 31, 2017 at 06:00:43PM +0100, Dave P Martin wrote:
> > > > > +/*
> > > > > + * Handle SVE state across fork():
> > > > > + *
> > > > > + * dst and src must not end up with aliases of the same sve_state.
> > > > > + * Because a task cannot fork except in a syscall, we can discard SVE
> > > > > + * state for dst here, so long as we take care to retain the FPSIMD
> > > > > + * subset of the state if SVE is in use. Reallocation of the SVE state
> > > > > + * will be deferred until dst tries to use SVE.
> > > > > + */
> > > > > +void fpsimd_dup_sve(struct task_struct *dst, struct task_struct const *src)
> > > > > +{
> > > > > +	if (test_and_clear_tsk_thread_flag(dst, TIF_SVE)) {
> > > > > +		WARN_ON(dst->mm && !in_syscall(task_pt_regs(dst)));
> > > > > +		sve_to_fpsimd(dst);
> > > > > +	}
> > > > > +
> > > > > +	dst->thread.sve_state = NULL;
> > > > > +}
> > > >
> > > > I first thought the thread flags are not visible in dst yet since
> > > > dup_task_struct() calls arch_dup_task_struct() before
> > > > setup_thread_stack(). However, at the end of the last year we enabled
> > > > CONFIG_THREAD_INFO_IN_TASK_STRUCT. But I don't particularly like relying
> > > > on this.
> > >
> > > Hmmm, I see your point, but there are some sequencing issues here.
> > >
> > > > Anyway, IIUC we don't need sve_to_fpsimd() here. The
> > > > arch_dup_task_struct() already called fpsimd_preserve_current_state()
> > >
> > > I consider SVE discard as an optional side effect of task_fpsimd_save(),
> > > not something that is guaranteed to happen -- the decision about whether
> > > to do so may become more intelligent later on. So, for src, we may
> > > discard SVE (because syscall), but for dst we must NULL .sve_state (and
> > > therefore clear TIF_SVE) simply to avoid aliasing of src->sve_state and
> > > dst->sve_state.
> >
> > My point was that the SVE state of src is already preserved at this
> > point and copied into dst. You don't need the sve_to_fpsimd(dst) again
> > which basically does the same copying of the src SVE saved state into
> > the FPSIMD one in dst. This has already been done in
> > arch_dup_task_struct() by the combination of
> > fpsimd_preserve_current_state() and *dst = *src (and, of course,
> > clearing TIF_SVE in dst).
> >
> > I don't think the TIF_SVE clearing in src is just a side effect of
> > task_fpsimd_save() here but rather a requirement. When returning from
> > fork(), both src and dst would need to have the same state. However,
> > your fpsimd_dup_sve() implementation makes it very clear that the SVE
> > state is lost in dst. This is only allowed if we also lose it in src (as
> > a result of a syscall). So making dst->sve_state = NULL requires that
> > TIF_SVE is also cleared in both src and dst. Alternatively, you'd have
> > to allocate a new state here and copy the full src SVE state across to
> > dst, together with setting TIF_SVE (that's not necessary, however, since
> > we get here as a result of a syscall).
>
> The currently intended ABI is that the SVE bits are unspecified after a
> syscall, so it is legitimate (though perhaps surprising) for different
> things to happen in dst and src.
>
> This complicates things a lot though, just to avoid the next SVE usage
> exception in src after the fork.
>
>
> It should be simpler to do what you suggest and discard the SVE state of
> src unconditionally before the copy: then we really are just cloning the
> thread apart from the need to set dst->thread.sve_state to NULL.
>
> fpsimd_preserve_current_state() does not necessarily write back to
> current->thread.fpsimd_state though: at the moment, it does do this as a
> side effect of task_fpsimd_save() because we happen to be in a syscall
> (i.e., fork).
>
> What we really want is unconditional discarding of the state. This
> wasn't needed anywhere else yet, so there's no explicit helper for it.
> But it makes sense to add one.
>
> What about refactoring along these lines:
>
>
> fpsimd.c:
> /* Unconditionally discard the SVE state */
> void task_sve_discard(struct task_struct *task)
> {
> 	if (!system_supports_sve())
> 		return;
>
> 	local_bh_disable();
> 	if (test_and_clear_tsk_thread_flag(task, TIF_SVE))
> 		sve_to_fpsimd(task);
> 	local_bh_enable();
> }
>
> process.c:
> int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
> {
> 	if (current->mm) {
> 		fpsimd_preserve_current_state();
> 		task_sve_discard(src);
> 	}
>
> 	*dst = *src;
>
> 	dst->thread.sve_state = NULL;
>
> 	return 0;
> }
>
>
> This also avoids having to touch dst's thread flags, since now we
> are just cloning the task except for assigning NULL to
> dst->thread.sve_state.

This looks fine to me; the execution of task_sve_discard() is nearly a
no-op with the current code.

We still have some local_bh_disable/enable() calls, though I don't think
it's worth optimising them now (e.g. by adding a
fpsimd_preserve_current_state_and_discard_sve() function, with a
"sve_discard" argument to task_fpsimd_save() to force the discard).
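
For reference, the ordering being agreed on (discard SVE in src, copy the
task, then NULL dst's sve_state) can be sketched as a small userspace
model. This is only an illustration under stated assumptions: the
task_model and thread_model types and the model_*() helpers are
hypothetical stand-ins, not the kernel's definitions, and the real
sve_to_fpsimd() converts the whole register file rather than one word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (not the real layout). */
struct thread_model {
	bool tif_sve;           /* models the TIF_SVE thread flag */
	unsigned int *sve_state; /* models thread.sve_state */
	unsigned int fpsimd_v0;  /* models the FPSIMD subset of the state */
};

struct task_model {
	struct thread_model thread;
};

/* Models sve_to_fpsimd(): keep the FPSIMD-visible part of the SVE state. */
void model_sve_to_fpsimd(struct task_model *task)
{
	if (task->thread.sve_state)
		task->thread.fpsimd_v0 = *task->thread.sve_state;
}

/* Models task_sve_discard(): unconditionally drop the SVE state. */
void model_task_sve_discard(struct task_model *task)
{
	if (task->thread.tif_sve) {
		task->thread.tif_sve = false;
		model_sve_to_fpsimd(task);
	}
}

/* Models the proposed arch_dup_task_struct() ordering. */
int model_arch_dup_task_struct(struct task_model *dst, struct task_model *src)
{
	model_task_sve_discard(src);

	*dst = *src;

	/* dst must never alias src's SVE buffer. */
	dst->thread.sve_state = NULL;
	assert(dst->thread.sve_state != src->thread.sve_state ||
	       src->thread.sve_state == NULL);

	return 0;
}
```

Because TIF_SVE is already clear in src before the copy, dst inherits a
clear flag and only the sve_state pointer needs adjusting afterwards.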
--
Catalin