[PATCH v5] arm64: mte: allow async MTE to be upgraded to sync on a per-CPU basis

Will Deacon will at kernel.org
Thu Jun 17 14:58:30 PDT 2021


Hi Peter,

On Tue, Jun 15, 2021 at 01:38:07PM -0700, Peter Collingbourne wrote:
> On some CPUs the performance of MTE in synchronous mode is similar
> to that of asynchronous mode. This makes it worthwhile to enable
> synchronous mode on those CPUs when asynchronous mode is requested,
> in order to gain the error detection benefits of synchronous mode
> without the performance downsides. Therefore, make it possible for
> user programs to opt into upgrading to synchronous mode on those CPUs
> via a new prctl flag. The flag is orthogonal to the existing TCF modes
> in order to accommodate upgrading from other TCF modes in the future.
> 
> The feature is controlled on a per-CPU basis via sysfs.
> 
> Signed-off-by: Peter Collingbourne <pcc at google.com>
> Link: https://linux-review.googlesource.com/id/Id6f95b71fde6e701dd30b5e108126af7286147e8
> ---
> v5:
> - updated documentation
> - address some nits in mte.c
> 
> v4:
> - switch to new mte_ctrl field
> - make register_mte_upgrade_async_sysctl return an int
> - change the sysctl to take 0 or 1 instead of raw TCF values
> - "same as" -> "similar to"
> 
> v3:
> - drop the device tree support
> - add documentation
> - add static_assert to ensure no overlap with real HW bits
> - move per-CPU variable initialization to mte.c
> - use smp_call_function_single instead of stop_machine
> 
> v2:
> - make it an opt-in behavior
> - change the format of the device tree node
> - also allow controlling the feature via sysfs
> 
>  .../arm64/memory-tagging-extension.rst        |  20 +++
>  arch/arm64/include/asm/mte.h                  |   4 +
>  arch/arm64/include/asm/processor.h            |  14 +-
>  arch/arm64/kernel/asm-offsets.c               |   2 +-
>  arch/arm64/kernel/entry.S                     |   4 +-
>  arch/arm64/kernel/mte.c                       | 151 ++++++++++++++----
>  arch/arm64/kernel/process.c                   |   2 +-
>  include/uapi/linux/prctl.h                    |   2 +
>  8 files changed, 162 insertions(+), 37 deletions(-)
> 
> diff --git a/Documentation/arm64/memory-tagging-extension.rst b/Documentation/arm64/memory-tagging-extension.rst
> index b540178a93f8..5375d5efd18c 100644
> --- a/Documentation/arm64/memory-tagging-extension.rst
> +++ b/Documentation/arm64/memory-tagging-extension.rst
> @@ -120,6 +120,25 @@ in the ``PR_MTE_TAG_MASK`` bit-field.
>  interface provides an include mask. An include mask of ``0`` (exclusion
>  mask ``0xffff``) results in the CPU always generating tag ``0``.
>  
> +Upgrading to stricter tag checking modes
> +----------------------------------------
> +
> +On some CPUs the performance of MTE in stricter tag checking modes
> +is similar to that of less strict tag checking modes. This makes it
> +worthwhile to enable stricter checks on those CPUs when a less strict
> +checking mode is requested, in order to gain the error detection
> +benefits of the stricter checks without the performance downsides. To
> +opt into upgrading to a stricter checking mode on those CPUs, the user
> +can set the ``PR_MTE_DYNAMIC_TCF`` flag bit in the ``flags`` argument
> +to the ``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call.
> +
> +This feature is currently only supported for upgrading from
> +asynchronous mode. To configure a CPU to upgrade from asynchronous mode
> +to synchronous mode, a privileged user may write the value ``1`` to
> +``/sys/devices/system/cpu/cpu<N>/mte_upgrade_async``, and to disable
> +upgrading they may write the value ``0``. By default the feature is
> +disabled on all CPUs.
> +
>  Initial process state
>  ---------------------
>  
> @@ -128,6 +147,7 @@ On ``execve()``, the new process has the following configuration:
>  - ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled)
>  - Tag checking mode set to ``PR_MTE_TCF_NONE``
>  - ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded)
> +- ``PR_MTE_DYNAMIC_TCF`` set to 0 (disabled)
>  - ``PSTATE.TCO`` set to 0
>  - ``PROT_MTE`` not set on any of the initial memory maps

Something about this doesn't sit right with me, as we're mixing a per-task
interface with a per-cpu interface for selecting async/sync MTE and the
priorities are somewhat confusing.

I think a better interface would be for the sysfs entry for each CPU to
allow selection between:

	task  : Honour the prctl() (current behaviour)
	async : Force async for tasks using MTE
	sync  : Force sync for tasks using MTE
	none  : MTE disabled

i.e. the per-cpu setting is an override.

Would that give you what you need?

Will



More information about the linux-arm-kernel mailing list