[PATCH] arm64: mte: avoid TFSR related operations unless in async mode

Catalin Marinas catalin.marinas at arm.com
Fri Jul 2 10:37:17 PDT 2021


On Thu, Jul 01, 2021 at 11:11:34AM -0700, Peter Collingbourne wrote:
> On Thu, Jul 1, 2021 at 10:37 AM Catalin Marinas <catalin.marinas at arm.com> wrote:
> > On Wed, Jun 30, 2021 at 08:14:48PM -0700, Peter Collingbourne wrote:
> > >       /* Asynchronous TCF occurred for TTBR0 access, set the TI flag */
> > > @@ -151,11 +157,14 @@ alternative_else_nop_endif
> > >       .endm
> > >
> > >       /* Clear the MTE asynchronous tag check faults */
> > > -     .macro clear_mte_async_tcf
> > > +     .macro clear_mte_async_tcf thread_sctlr
> > >  #ifdef CONFIG_ARM64_MTE
> > >  alternative_if ARM64_MTE
> > > +     /* See comment in check_mte_async_tcf above. */
> > > +     tbz     \thread_sctlr, #(SCTLR_EL1_TCF0_SHIFT + 1), 1f
> > >       dsb     ish
> > >       msr_s   SYS_TFSRE0_EL1, xzr
> > > +1:
> >
> > Here, maybe, as we have a DSB.
> 
> Yes, disabling clear_mte_async_tcf offered an order of magnitude
> larger speedup than disabing check_mte_async_tcf, presumably due to
> the DSB. I would reckon though that if we're going to make some of the
> code conditional on TCF we might as well make all of it conditional in
> order to get the maximum possible benefit.

I'd like to avoid a TBZ on sctlr_user if it's not necessary. I reckon
the big CPUs would prefer async mode anyway.

> Nevertheless, isn't it the case that disabling check_mte_async_tcf for
> non-ASYNC tasks is necessary for correctness if we want to disable
> clear_mte_async_tcf? Imagine that we just disable clear_mte_async_tcf,
> and then we get a tag check failing uaccess in a TCF=ASYNC task which
> then gets preempted by a TCF=NONE task which will skip clear on kernel
> exit. If we don't disable check on kernel entry then I believe that we
> will get a false positive tag check fault in the TCF=NONE task the
> next time it enters the kernel.

You are right, only doing one side would cause potential issues.

The uaccess routines honour the SCTLR_EL1.TCF0 setting (it's been
corrected in the architecture pseudocode some months ago). If we zero
TFSRE0_EL1 in mte_tread_switch(), it should cover your case. This
shouldn't be expensive since we already have a DSB on that path. I'm not
sure it's better than your proposal but not allowing the TFSRE0_EL1
state to span multiple threads makes reasoning about it a bit easier.

If the above context switch zeroing doesn't work, we could go ahead with
your patch. But since TFSRE0_EL1 != 0 is a rare event and we expect to
run in async mode on some CPUs, we could move the TBZ on sctlr_user in
check_mte_async_tcf after the tbz for the actual TFSRE0_EL1. IOW, only
check it prior to setting the TIF flag.

BTW, I think currently on entry we can avoid zeroing TFSRE0_EL1 since we
clear it on return anyway, so one less instruction (irrespective of your
patch).

-- 
Catalin



More information about the linux-arm-kernel mailing list