[PATCH v2] arm64: mte: allow async MTE to be upgraded to sync on a per-CPU basis

Mon Jun 14 10:37:48 PDT 2021

On Fri, Jun 11, 2021 at 02:50:03PM -0700, Peter Collingbourne wrote:
> On Fri, Jun 11, 2021 at 7:49 AM Catalin Marinas <catalin.marinas at arm.com> wrote:
> > On Wed, Jun 09, 2021 at 07:12:29PM -0700, Peter Collingbourne wrote:
> > > diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> > > index 9df3feeee890..545ef900e7ce 100644
> > > --- a/arch/arm64/include/asm/processor.h
> > > +++ b/arch/arm64/include/asm/processor.h
> > > @@ -159,6 +159,7 @@ struct thread_struct {
> > >  #define SCTLR_USER_MASK                                                        \
> > >       (SCTLR_ELx_ENIA | SCTLR_ELx_ENIB | SCTLR_ELx_ENDA | SCTLR_ELx_ENDB |   \
> > >        SCTLR_EL1_TCF0_MASK)
> > > +#define SCTLR_USER_DYNAMIC_TCF (1ULL << 63)
> >
> > Even if you called it "USER", it still gives the impression that it's
> > some real hardware bit. Who knows, in a few years time it may be
> > allocated to a real feature.
> 
> If bit 63 ends up being allocated to a bit that we want to allow
> userspace control over then we can always just move this to another
> bit. There are plenty to choose from that I don't think we will ever
> allow user control over, e.g. EIS.
> 
> > I also don't think this logic should be added to processor.[ch], just
> > keep it within mte.c.
> >
> > So while it's convenient to add something to this field, given that it's
> > shared with ptrauth, it's pretty fragile long term. I'd add the
> > information about the dynamic mode to a different field. We could rename
> > gcr_user_excl to mte_ctrl or something and store a few bits in there in
> > addition to GCR_EL1.Excl (with corresponding masks etc.)
> 
> I do take your point that it's somewhat awkward to commingle the SCTLR
> bits and the dynamic TCF setting here, but I'm not sure that it's
> overall better to move the bits to a renamed gcr_user_excl field. The
> consequence would be that we need two copies of the TCF setting in
> thread_struct and they will need to be kept in sync and leads to an
> implicit ordering dependency between the code dealing with the two
> fields on context switch.

I haven't checked v3 yet but I don't understand what the ordering
problem is. gcr_user_excl is also part of thread_struct and it shouldn't
change while the thread is in the middle of a context switch.

> We can make this more maintainable by adding a static_assert that
> SCTLR_USER_DYNAMIC_TCF doesn't overlap with any of the bits in
> SCTLR_USER_MASK, as I've done in v3.
> 
> Let me know what you think and if you still disagree then I can try to
> make this look more like you suggested.

I just don't like adding software bits to the sctlr field. Who knows, we
may need to add some more for MTE, maybe other features would do
something similar and it's not maintainable.

> > > +static ssize_t mte_upgrade_async_store(struct device *dev,
> > > +                                    struct device_attribute *attr,
> > > +                                    const char *buf, size_t count)
> > > +{
> > > +     ssize_t ret;
> > > +     u32 val;
> > > +     u64 tcf;
> > > +
> > > +     ret = kstrtou32(buf, 0, &val);
> > > +     if (ret < 0)
> > > +             return ret;
> > > +
> > > +     tcf = ((u64)val) << SCTLR_EL1_TCF0_SHIFT;
> > > +     if (tcf != SCTLR_EL1_TCF0_NONE && tcf != SCTLR_EL1_TCF0_SYNC &&
> > > +         tcf != SCTLR_EL1_TCF0_ASYNC)
> > > +             return -EINVAL;
> > > +
> > > +     device_lock(dev);
> > > +     per_cpu(mte_upgrade_async, dev->id) = tcf;
> > > +
> > > +     ret = stop_machine(sync_sctlr, 0, cpumask_of(dev->id));
> >
> > Do we really need a stop_machine() here? That's really heavy and we
> > don't need such synchronisation. An smp_call_function() should do or
> > just leave it until the next context switch on the corresponding CPUs.
> 
> Looks like smp_call_function_single() should work here; done in v3.

Why "single"? Don't you want this executed on all CPUs?

-- 
Catalin