[PATCH v2] arm64: mte: avoid clearing PSTATE.TCO on entry unless necessary

Peter Collingbourne pcc at google.com
Fri Jan 21 17:03:04 PST 2022


On Fri, Dec 17, 2021 at 10:17 AM Catalin Marinas
<catalin.marinas at arm.com> wrote:
>
> On Wed, Dec 15, 2021 at 06:44:03PM -0800, Peter Collingbourne wrote:
> > On Fri, Dec 10, 2021 at 4:06 AM Catalin Marinas <catalin.marinas at arm.com> wrote:
> > > On Wed, Nov 10, 2021 at 02:07:35PM -0800, Peter Collingbourne wrote:
> > > > diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> > > > index 2f69ae43941d..a78ec15f5bbc 100644
> > > > --- a/arch/arm64/kernel/entry.S
> > > > +++ b/arch/arm64/kernel/entry.S
> > > > @@ -189,6 +189,27 @@ alternative_cb_end
> > > >  #endif
> > > >       .endm
> > > >
> > > > +     .macro mte_clear_tco, sctlr
> > > > +     /*
> > > > +      * Re-enable tag checking (TCO set on exception entry). This is only
> > > > +      * necessary if MTE is enabled in either the kernel or the userspace
> > > > +      * task in synchronous mode. With MTE disabled in the kernel and
> > > > +      * disabled or asynchronous in userspace, tag check faults (including in
> > > > +      * uaccesses) are not reported, therefore there is no need to re-enable
> > > > +      * checking. This is beneficial on microarchitectures where re-enabling
> > > > +      * TCO is expensive.
> > > > +      */
> > > > +#ifdef CONFIG_ARM64_MTE
> > > > +alternative_cb       kasan_hw_tags_enable
> > > > +     tbz     \sctlr, #SCTLR_EL1_TCF0_SHIFT, 1f
> > > > +alternative_cb_end
> > > > +alternative_if ARM64_MTE
> > > > +     SET_PSTATE_TCO(0)
> > > > +alternative_else_nop_endif
> > > > +1:
> > > > +#endif
> > > > +     .endm
> > >
> > > The patch looks fine to me but I recall in a private conversation with
> > > Mark he proposed the idea of moving this to entry-common.c (unless it
> > > was about something completely different). The downside is that we run
> > > with the TCO set for slightly longer. There shouldn't be a major
> > > drawback currently as we don't have stack tagging anyway.
> >
> > Yes, Mark made that suggestion on the list. I tried it and found that
> > it led to a performance regression relative to baseline [1].
> [...]
> > [1] https://lore.kernel.org/all/CAMn1gO51k1Dqts=THYq28nVMSvO6ZQB5sEG1wuzEVpAXBTfFjg@mail.gmail.com/
>
> That's weird since there should be the same number of instructions
> executed on entry with your diff above. Could it be that
> mte_disable_tco_entry() is not inlined?

Looks like it was something like that. In the kernel without the
patch, enter_from_user_mode() was empty, which allowed the compiler to
remove the call from its callers (el0_svc etc.). In the kernel with
the patch it contained one instruction (the MSR) and was not inlined,
apparently because of the noinstr annotation (which includes noinline;
I guess removing a call to an empty function doesn't count as
inlining). This was with a kernel based on 5.10 (I don't currently
have a way to run a newer kernel on this hardware, unfortunately); it
looks like some refactoring patches have landed since then that allow
enter_from_user_mode() to be inlined.
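
For reference, noinstr expands to roughly the following (paraphrasing
include/linux/compiler_types.h; the exact attribute list varies across
kernel versions), and it's the noinline in there that forced the
out-of-line call once the function body was no longer empty:

/* Roughly, from include/linux/compiler_types.h: */
#define noinstr \
	noinline notrace __attribute__((__section__(".noinstr.text"))) \
	__no_kcsan __no_sanitize_address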

With a similar refactoring patch applied to the 5.10 kernel, things
look fairly reasonable with the patch that I'm sending as v3. Looking
at all combinations of {kasan off, kasan on} x {none, async, sync} x
{big, medium, little}, there are some small regressions, but they are
mostly on the little cores.
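
For context, the refactoring keeps noinstr on the el0 handlers
themselves but marks the shared entry logic __always_inline, so the
MSR is emitted inline in each handler. A sketch (abbreviated, not the
exact upstream code):

/* arch/arm64/kernel/entry-common.c, sketched: */
static __always_inline void enter_from_user_mode(struct pt_regs *regs)
{
	lockdep_hardirqs_off(CALLER_ADDR0);
	user_exit_irqoff();
	trace_hardirqs_off_finish();
	mte_disable_tco_entry(current);
}

static void noinstr el0_svc(struct pt_regs *regs)
{
	enter_from_user_mode(regs);
	do_el0_svc(regs);
	exit_to_user_mode(regs);
}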

I also tested the patch on non-MTE hardware (DragonBoard 845c) running
the mainline kernel. This revealed that the added call to
system_supports_mte() was necessary to avoid regressing things too
much on non-MTE hardware, since it avoids the load from current. With
the v3 patch the syscall latency goes from 245.1ns to 247.4ns on the
little cores and from 151.4ns to 145.2ns on the big cores.
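
Concretely, the v3 hook ends up looking roughly like this (a sketch;
check the v3 posting for the exact code and naming):

/* arch/arm64/include/asm/mte.h, sketched from v3: */
static inline void mte_disable_tco_entry(struct task_struct *task)
{
	if (!system_supports_mte())
		return;

	/*
	 * Tag checks only need re-enabling if MTE is in use in the
	 * kernel or the task uses synchronous mode. Checking
	 * system_supports_mte() first lets non-MTE hardware return
	 * before loading anything from the task struct.
	 */
	if (kasan_hw_tags_enabled() ||
	    (task->thread.sctlr_user & (1UL << SCTLR_EL1_TCF0_SHIFT)))
		asm volatile(SET_PSTATE_TCO(0));
}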

Peter


