[PATCH] arm64: mte: avoid TFSR related operations unless in async mode

Fri Jul 2 19:46:11 PDT 2021

On Fri, Jul 2, 2021 at 10:37 AM Catalin Marinas <catalin.marinas at arm.com> wrote:
>
> On Thu, Jul 01, 2021 at 11:11:34AM -0700, Peter Collingbourne wrote:
> > On Thu, Jul 1, 2021 at 10:37 AM Catalin Marinas <catalin.marinas at arm.com> wrote:
> > > On Wed, Jun 30, 2021 at 08:14:48PM -0700, Peter Collingbourne wrote:
> > > >       /* Asynchronous TCF occurred for TTBR0 access, set the TI flag */
> > > > @@ -151,11 +157,14 @@ alternative_else_nop_endif
> > > >       .endm
> > > >
> > > >       /* Clear the MTE asynchronous tag check faults */
> > > > -     .macro clear_mte_async_tcf
> > > > +     .macro clear_mte_async_tcf thread_sctlr
> > > >  #ifdef CONFIG_ARM64_MTE
> > > >  alternative_if ARM64_MTE
> > > > +     /* See comment in check_mte_async_tcf above. */
> > > > +     tbz     \thread_sctlr, #(SCTLR_EL1_TCF0_SHIFT + 1), 1f
> > > >       dsb     ish
> > > >       msr_s   SYS_TFSRE0_EL1, xzr
> > > > +1:
> > >
> > > Here, maybe, as we have a DSB.
> >
> > Yes, disabling clear_mte_async_tcf offered an order of magnitude
> > larger speedup than disabing check_mte_async_tcf, presumably due to
> > the DSB. I would reckon though that if we're going to make some of the
> > code conditional on TCF we might as well make all of it conditional in
> > order to get the maximum possible benefit.
>
> I'd like to avoid a TBZ on sctlr_user if it's not necessary. I reckon
> the big CPUs would prefer async mode anyway.

This is more targeted to tasks running with TCF=NONE. In this case, we
should avoid the MTE-related overheads as much as possible.

I measured the performance impact of this change with TCF=ASYNC tasks
and was barely able to measure a difference with 30 samples at 95% CI
(and then only on one cluster). So I don't think we should worry about
the TBZ here.

> > Nevertheless, isn't it the case that disabling check_mte_async_tcf for
> > non-ASYNC tasks is necessary for correctness if we want to disable
> > clear_mte_async_tcf? Imagine that we just disable clear_mte_async_tcf,
> > and then we get a tag check failing uaccess in a TCF=ASYNC task which
> > then gets preempted by a TCF=NONE task which will skip clear on kernel
> > exit. If we don't disable check on kernel entry then I believe that we
> > will get a false positive tag check fault in the TCF=NONE task the
> > next time it enters the kernel.
>
> You are right, only doing one side would cause potential issues.
>
> The uaccess routines honour the SCTLR_EL1.TCF0 setting (it's been
> corrected in the architecture pseudocode some months ago). If we zero
> TFSRE0_EL1 in mte_tread_switch(), it should cover your case. This
> shouldn't be expensive since we already have a DSB on that path. I'm not
> sure it's better than your proposal but not allowing the TFSRE0_EL1
> state to span multiple threads makes reasoning about it a bit easier.

I think it's debatable whether it makes it easier to reason about. If
you think of the "clear" operation as "priming" the CPU to check for
tag check faults later, then a code structure where the "priming"
operation clearly always happens before the check operation (because
they use the same condition, and are placed symmetrically to one
another) would seem easier to reason about IMHO.

> If the above context switch zeroing doesn't work, we could go ahead with
> your patch. But since TFSRE0_EL1 != 0 is a rare event and we expect to
> run in async mode on some CPUs, we could move the TBZ on sctlr_user in
> check_mte_async_tcf after the tbz for the actual TFSRE0_EL1. IOW, only
> check it prior to setting the TIF flag.
>
> BTW, I think currently on entry we can avoid zeroing TFSRE0_EL1 since we
> clear it on return anyway, so one less instruction (irrespective of your
> patch).

Nice observation. I'll fold that into v2.

Peter