[PATCH v4] arm64: mte: allow async MTE to be upgraded to sync on a per-CPU basis

Catalin Marinas catalin.marinas at arm.com
Tue Jun 15 11:02:25 PDT 2021


On Mon, Jun 14, 2021 at 06:57:28PM -0700, Peter Collingbourne wrote:
> @@ -120,6 +120,25 @@ in the ``PR_MTE_TAG_MASK`` bit-field.
>  interface provides an include mask. An include mask of ``0`` (exclusion
>  mask ``0xffff``) results in the CPU always generating tag ``0``.
>  
> +Upgrading to stricter tag checking modes
> +----------------------------------------
> +
> +On some CPUs the performance of MTE in stricter tag checking modes
> +is similar to that of less strict tag checking modes. This makes it
> +worthwhile to enable stricter checks on those CPUs when a less strict
> +checking mode is requested, in order to gain the error detection
> +benefits of the stricter checks without the performance downsides. To
> +opt into upgrading to a stricter checking mode on those CPUs, the user
> +can set the ``PR_MTE_DYNAMIC_TCF`` flag bit in the ``flags`` argument
> +to the ``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call.
> +
> +This feature is currently only supported for upgrading from
> +asynchronous mode. To configure a CPU to upgrade from asynchronous mode
> +to synchronous mode, a privileged user may write the value ``1`` to
> +``/sys/devices/system/cpu/cpu<N>/mte_upgrade_async``, and to disable
> +upgrading they may write the value ``2``. By default the feature is
> +disabled on all CPUs.

This needs updating as well for 0 meaning disabled.
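
As an aside, for anyone trying this out, the userspace opt-in would
look roughly like this (PR_MTE_DYNAMIC_TCF being the new flag proposed
in this series):

	#include <stdio.h>
	#include <sys/prctl.h>

	/*
	 * Enable tagged addresses, request async tag checking and allow
	 * the kernel to upgrade it where cheap; the include mask 0xfffe
	 * excludes only tag 0 from generation.
	 */
	if (prctl(PR_SET_TAGGED_ADDR_CTRL,
		  PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_ASYNC |
		  PR_MTE_DYNAMIC_TCF | (0xfffe << PR_MTE_TAG_SHIFT),
		  0, 0, 0))
		perror("PR_SET_TAGGED_ADDR_CTRL");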

I wonder whether we could generalise this to something like
mte_tcf_upgrade and allow the asymmetric mode to be upgraded to sync as
well. Otherwise we'd have to add another interface later, when we
already know that a CPU which can handle sync as fast as async should
upgrade the asymmetric mode too. A more generic mte_tcf_upgrade would
simply hold the strictest mode that the CPU can handle without
significant performance degradation.
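
Roughly what I have in mind, as an untested sketch (mte_tcf_upgrade
and mte_effective_tcf are names I'm making up here):

	/*
	 * Strictest SCTLR_EL1.TCF0 encoding this CPU can run without
	 * significant overhead; 0 means "no upgrade".
	 */
	static DEFINE_PER_CPU_READ_MOSTLY(u64, mte_tcf_upgrade);

	static u64 mte_effective_tcf(struct task_struct *task)
	{
		u64 tcf = (task->thread.mte_ctrl & MTE_CTRL_TCF_MASK) >>
			  MTE_CTRL_TCF_SHIFT << SCTLR_EL1_TCF0_SHIFT;
		u64 upgrade = __this_cpu_read(mte_tcf_upgrade);

		/*
		 * Only upgrade modes weaker than the per-CPU limit. Today
		 * that's just async; an asymmetric mode would take the
		 * same path instead of growing a second interface.
		 */
		if ((task->thread.mte_ctrl & MTE_CTRL_DYNAMIC_TCF) &&
		    upgrade && tcf == SCTLR_EL1_TCF0_ASYNC)
			return upgrade;

		return tcf;
	}

Both the sysfs file and mte_update_sctlr_user() would then only ever
deal with a single per-CPU limit.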

The mte_upgrade_async name can also be confusing for the asymmetric
mode, where the writes are checked asynchronously.

> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> index 125a10e413e9..ad85e8519669 100644
> --- a/arch/arm64/kernel/mte.c
> +++ b/arch/arm64/kernel/mte.c
> @@ -4,6 +4,7 @@
>   */
>  
>  #include <linux/bitops.h>
> +#include <linux/cpu.h>
>  #include <linux/kernel.h>
>  #include <linux/mm.h>
>  #include <linux/prctl.h>
> @@ -26,6 +27,9 @@ u64 gcr_kernel_excl __ro_after_init;
>  
>  static bool report_fault_once = true;
>  
> +DEFINE_PER_CPU_READ_MOSTLY(u64, mte_upgrade_async);
> +EXPORT_PER_CPU_SYMBOL(mte_upgrade_async);

I think this should be static and not exported, unless I missed its use
elsewhere.
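
i.e. just:

	static DEFINE_PER_CPU_READ_MOSTLY(u64, mte_upgrade_async);

with the EXPORT_PER_CPU_SYMBOL() line dropped.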

> @@ -216,15 +210,33 @@ void mte_thread_init_user(void)
>  	dsb(ish);
>  	write_sysreg_s(0, SYS_TFSRE0_EL1);
>  	clear_thread_flag(TIF_MTE_ASYNC_FAULT);
> -	/* disable tag checking */
> -	set_task_sctlr_el1((current->thread.sctlr_user & ~SCTLR_EL1_TCF0_MASK) |
> -			   SCTLR_EL1_TCF0_NONE);
> -	/* reset tag generation mask */
> -	set_gcr_el1_excl(SYS_GCR_EL1_EXCL_MASK);
> +	/* disable tag checking and reset tag generation mask */
> +	current->thread.mte_ctrl =
> +		MTE_CTRL_GCR_USER_EXCL_MASK | MTE_CTRL_TCF_NONE;
> +	mte_update_sctlr_user(current);
> +	set_task_sctlr_el1(current->thread.sctlr_user);
> +}
> +
> +void mte_update_sctlr_user(struct task_struct *task)
> +{
> +	unsigned long sctlr = task->thread.sctlr_user;
> +
> +	sctlr &= ~SCTLR_EL1_TCF0_MASK;
> +	if ((task->thread.mte_ctrl & MTE_CTRL_DYNAMIC_TCF) &&
> +	    (task->thread.mte_ctrl & MTE_CTRL_TCF_MASK) == MTE_CTRL_TCF_ASYNC) {
> +		sctlr |= __this_cpu_read(mte_upgrade_async);

If we consider 0 to mean "disable upgrade", you'd just need another
check here before the write. That may also simplify some of the sysfs
code by avoiding the switch statement and the pre-initialisation of
mte_upgrade_async.
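
Something like this (untested), falling through to the requested mode
when the upgrade is disabled:

	u64 upgrade = 0;

	sctlr &= ~SCTLR_EL1_TCF0_MASK;
	if ((task->thread.mte_ctrl & MTE_CTRL_DYNAMIC_TCF) &&
	    (task->thread.mte_ctrl & MTE_CTRL_TCF_MASK) == MTE_CTRL_TCF_ASYNC)
		upgrade = __this_cpu_read(mte_upgrade_async);

	if (upgrade)
		sctlr |= upgrade;
	else
		sctlr |= (task->thread.mte_ctrl & MTE_CTRL_TCF_MASK) >>
			 MTE_CTRL_TCF_SHIFT << SCTLR_EL1_TCF0_SHIFT;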

> +	} else {
> +		sctlr |= ((task->thread.mte_ctrl & MTE_CTRL_TCF_MASK) >>
> +			  MTE_CTRL_TCF_SHIFT)
> +			 << SCTLR_EL1_TCF0_SHIFT;

Nitpick: we tend to place the operator at the end of the previous
line; you can probably place the shift constant on that line as well.
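
i.e.:

	sctlr |= ((task->thread.mte_ctrl & MTE_CTRL_TCF_MASK) >>
		  MTE_CTRL_TCF_SHIFT) << SCTLR_EL1_TCF0_SHIFT;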

> +	}
> +	task->thread.sctlr_user = sctlr;

I think on the "else" path, we shouldn't bother updating sctlr_user,
though it probably needs some tweaking of the prctl() path.
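
With a helper along the lines of the mte_effective_tcf() sketch above
(again, a made-up name), the per-CPU upgrade would never land in
sctlr_user at all:

	static u64 mte_effective_sctlr(struct task_struct *task)
	{
		/* apply the upgrade only to the value actually written */
		return (task->thread.sctlr_user & ~SCTLR_EL1_TCF0_MASK) |
		       mte_effective_tcf(task);
	}

The prctl() path would then just store the requested TCF mode in
sctlr_user and leave the upgrade to be folded in when the register is
updated.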

Otherwise the patch looks in the right direction.

-- 
Catalin