[PATCH v3] arm64: mte: avoid clearing PSTATE.TCO on entry unless necessary
Peter Collingbourne
pcc at google.com
Mon Jan 24 16:59:03 PST 2022
On Mon, Jan 24, 2022 at 3:45 AM Catalin Marinas <catalin.marinas at arm.com> wrote:
>
> On Fri, Jan 21, 2022 at 05:02:50PM -0800, Peter Collingbourne wrote:
> > diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
> > index 075539f5f1c8..5352db4c0f45 100644
> > --- a/arch/arm64/include/asm/mte.h
> > +++ b/arch/arm64/include/asm/mte.h
> > @@ -11,7 +11,9 @@
> > #ifndef __ASSEMBLY__
> >
> > #include <linux/bitfield.h>
> > +#include <linux/kasan.h>
> > #include <linux/page-flags.h>
> > +#include <linux/sched.h>
> > #include <linux/types.h>
> >
> > #include <asm/pgtable-types.h>
> > @@ -86,6 +88,23 @@ static inline int mte_ptrace_copy_tags(struct task_struct *child,
> >
> > #endif /* CONFIG_ARM64_MTE */
> >
> > +static inline void mte_disable_tco_entry(struct task_struct *task)
> > +{
> > + /*
> > + * Re-enable tag checking (TCO set on exception entry). This is only
> > + * necessary if MTE is enabled in either the kernel or the userspace
> > + * task in synchronous mode. With MTE disabled in the kernel and
> > + * disabled or asynchronous in userspace, tag check faults (including in
> > + * uaccesses) are not reported, therefore there is no need to re-enable
> > + * checking. This is beneficial on microarchitectures where re-enabling
> > + * TCO is expensive.
> > + */
>
> I'd add a note here that the 1UL << SCTLR_EL1_TCF0_SHIFT check is
> meant to cover both the synchronous and asymmetric modes, even though
> we don't support the latter for userspace yet. We do have the
> definitions already.
Done in v4.
> > + if (kasan_hw_tags_enabled() ||
> > + (system_supports_mte() &&
> > + (task->thread.sctlr_user & (1UL << SCTLR_EL1_TCF0_SHIFT))))
> > + asm volatile(SET_PSTATE_TCO(0));
> > +}
>
> Does it make a difference in code generation if you place a:
>
> if (!system_supports_mte())
> return;
>
> at the beginning of the function (and remove the subsequent check)? It's
> probably also easier to read, though the code generation depends on the
> likely/unlikely choices for the static branches involved.
Yes, and with that change the patch ends up yielding a small speedup
on the DragonBoard: 245.1ns to 244.5ns on the small cores and 151.4ns
to 148.4ns on the large cores. The numbers on the MTE-enabled hardware
don't change much.
> > diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> > index f418ebc65f95..5345587f3384 100644
> > --- a/arch/arm64/kernel/mte.c
> > +++ b/arch/arm64/kernel/mte.c
> > @@ -252,6 +252,7 @@ void mte_thread_switch(struct task_struct *next)
> >
> > mte_update_sctlr_user(next);
> > mte_update_gcr_excl(next);
> > + mte_disable_tco_entry(next);
>
> Maybe a one-line comment here that TCO may not have been disabled on
> exception entry for the current task.
Done in v4.
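Roughly, in mte_thread_switch() (a sketch; the exact v4 comment
wording may differ):

	mte_update_sctlr_user(next);
	mte_update_gcr_excl(next);
	/*
	 * TCO may not have been disabled on exception entry for the
	 * current task.
	 */
	mte_disable_tco_entry(next);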
> Otherwise it looks good to me:
>
> Reviewed-by: Catalin Marinas <catalin.marinas at arm.com>
Thanks.
Peter