[PATCH 3/4] arm64: errata: Work around early CME DVMSync acknowledgement

Thu Mar 5 06:32:11 PST 2026

On Mon, Mar 02, 2026 at 04:57:56PM +0000, Catalin Marinas wrote:
> C1-Pro acknowledges DVMSync messages before completing the SME/CME
> memory accesses. Work around this by issuing an IPI+DSB to the affected
> CPUs if they are running in EL0 with SME enabled.

Just to make sure I understand the implications: this _only_ applies
to explicit memory accesses from the SME unit and not, for example, to
page-table walks initiated by SME instructions?

> Signed-off-by: Catalin Marinas <catalin.marinas at arm.com>
> Cc: Will Deacon <will at kernel.org>
> Cc: Mark Rutland <mark.rutland at arm.com>
> Cc: Mark Brown <broonie at kernel.org>
> ---
>  arch/arm64/Kconfig                | 12 +++++
>  arch/arm64/include/asm/cpucaps.h  |  2 +
>  arch/arm64/include/asm/cputype.h  |  2 +
>  arch/arm64/include/asm/fpsimd.h   | 29 +++++++++++
>  arch/arm64/include/asm/mmu.h      |  1 +
>  arch/arm64/include/asm/tlbflush.h | 17 +++++++
>  arch/arm64/kernel/cpu_errata.c    | 19 ++++++++
>  arch/arm64/kernel/entry-common.c  |  3 ++
>  arch/arm64/kernel/fpsimd.c        | 81 +++++++++++++++++++++++++++++++
>  arch/arm64/kernel/process.c       |  7 +++
>  arch/arm64/tools/cpucaps          |  1 +
>  11 files changed, 174 insertions(+)

[...]

> @@ -575,6 +576,14 @@ static const struct midr_range erratum_spec_ssbs_list[] = {
>  };
>  #endif
>  
> +#ifdef CONFIG_ARM64_ERRATUM_SME_DVMSYNC
> +static void cpu_enable_sme_dvmsync(const struct arm64_cpu_capabilities *__unused)
> +{
> +	if (this_cpu_has_cap(ARM64_WORKAROUND_SME_DVMSYNC))
> +		sme_enable_dvmsync();
> +}
> +#endif
> +
>  #ifdef CONFIG_AMPERE_ERRATUM_AC03_CPU_38
>  static const struct midr_range erratum_ac03_cpu_38_list[] = {
>  	MIDR_ALL_VERSIONS(MIDR_AMPERE1),
> @@ -901,6 +910,16 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
>  		.matches = need_arm_si_l1_workaround_4311569,
>  	},
>  #endif
> +#ifdef CONFIG_ARM64_ERRATUM_SME_DVMSYNC
> +	{
> +		.desc = "C1-Pro SME DVMSync early acknowledgement",
> +		.capability = ARM64_WORKAROUND_SME_DVMSYNC,
> +		.cpu_enable = cpu_enable_sme_dvmsync,
> +		/* C1-Pro r0p0 - r1p2 (the latter only when REVIDR_EL1[0]==0 */
> +		ERRATA_MIDR_RANGE(MIDR_C1_PRO, 0, 0, 1, 2),
> +		MIDR_FIXED(MIDR_CPU_VAR_REV(1, 2), BIT(0)),
> +	},
> +#endif

An alternative to this workaround is just to disable SME entirely, perhaps
by passing 'arm64.nosme' on the cmdline. Maybe we should disable the
workaround in that case?

> @@ -1358,6 +1360,85 @@ void do_sve_acc(unsigned long esr, struct pt_regs *regs)
>  	put_cpu_fpsimd_context();
>  }
>  
> +#ifdef CONFIG_ARM64_ERRATUM_SME_DVMSYNC
> +
> +/*
> + * SME/CME erratum handling
> + */
> +static cpumask_var_t sme_dvmsync_cpus;
> +static cpumask_var_t sme_active_cpus;
> +
> +void sme_set_active(unsigned int cpu)
> +{
> +	if (!cpus_have_final_cap(ARM64_WORKAROUND_SME_DVMSYNC))
> +		return;
> +	if (!cpumask_test_cpu(cpu, sme_dvmsync_cpus))
> +		return;
> +
> +	if (!test_bit(ilog2(MMCF_SME_DVMSYNC), &current->mm->context.flags))
> +		set_bit(ilog2(MMCF_SME_DVMSYNC), &current->mm->context.flags);
> +
> +	cpumask_set_cpu(cpu, sme_active_cpus);
> +
> +	/*
> +	 * Ensure subsequent (SME) memory accesses are observed after the
> +	 * cpumask and the MMCF_SME_DVMSYNC flag setting.
> +	 */
> +	smp_mb();

I can't convince myself that a DMB is enough here, as the whole issue
is that the SME memory accesses can be observed _after_ the TLB
invalidation. I'd have thought we'd need a DSB to ensure that the flag
updates are visible before the exception return.
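
To illustrate the ordering I have in mind, here is a rough userspace
model (not kernel code, and untested against the real patch). The names
sme_active_cpus_model, completion_barrier() and sme_set_active_sketch()
are made up for the example, and the mask is just a 64-bit word rather
than a cpumask. The point is only that the update is followed by a
barrier that waits for completion (DSB ISH on arm64) instead of one
that only orders accesses (DMB):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's sme_active_cpus (one bit per CPU). */
static _Atomic uint64_t sme_active_cpus_model;

/*
 * Completion barrier: on arm64 this is DSB ISH, which waits for prior
 * memory accesses to complete. Elsewhere fall back to a full fence so
 * the model still builds; that fallback is only an ordering barrier.
 */
static inline void completion_barrier(void)
{
#if defined(__aarch64__)
	__asm__ volatile("dsb ish" ::: "memory");
#else
	atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* Publish "this CPU may issue SME accesses", then wait for completion. */
static inline void sme_set_active_sketch(unsigned int cpu)
{
	atomic_fetch_or_explicit(&sme_active_cpus_model,
				 UINT64_C(1) << cpu, memory_order_relaxed);
	completion_barrier();
}
```

Whether the real code needs this, or whether the exception return
already gives us enough, is exactly the question above.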

> +void sme_do_dvmsync(void)
> +{
> +	/*
> +	 * This is called from the TLB maintenance functions after the DSB ISH
> +	 * to send hardware DVMSync message. If this CPU sees the mask as
> +	 * empty, the remote CPU executing sme_set_active() would have seen
> +	 * the DVMSync and no IPI required.
> +	 */
> +	if (cpumask_empty(sme_active_cpus))
> +		return;
> +
> +	preempt_disable();
> +	smp_call_function_many(sme_active_cpus, sme_dvmsync_ipi, NULL, true);
> +	preempt_enable();
> +}

Why do we care about all CPUs using SME, rather than limiting it to the
set of CPUs using SME with the mm we've invalidated? This looks like it
will result in unnecessary cross-calls when multiple tasks are using SME
(especially as the mm flag is only cleared on fork).
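
As a sketch of the difference, here is a small userspace model (not
kernel code, untested against the patch). Everything in it is
hypothetical: struct mm_model, the per-mm sme_cpus field and the
*_model() helpers do not exist in the series, and a 64-bit word stands
in for a cpumask. It compares the IPI targets chosen from a global mask
with the targets chosen from a mask tracked per mm:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, up to 64 CPUs. */
typedef uint64_t cpumask_model_t;

struct mm_model {
	/* CPUs currently running this mm at EL0 with SME enabled. */
	cpumask_model_t sme_cpus;
};

/* Global set of CPUs using SME, regardless of mm (as in the patch). */
static cpumask_model_t sme_active_cpus_global;

static inline void sme_set_active_model(struct mm_model *mm, unsigned int cpu)
{
	sme_active_cpus_global |= UINT64_C(1) << cpu;
	mm->sme_cpus |= UINT64_C(1) << cpu;
}

/* Hypothetical counterpart, e.g. on kernel entry or context switch. */
static inline void sme_clear_active_model(struct mm_model *mm, unsigned int cpu)
{
	sme_active_cpus_global &= ~(UINT64_C(1) << cpu);
	mm->sme_cpus &= ~(UINT64_C(1) << cpu);
}

/* IPI targets when every SME user is interrupted. */
static inline cpumask_model_t ipi_targets_global(void)
{
	return sme_active_cpus_global;
}

/* IPI targets when only users of the invalidated mm are interrupted. */
static inline cpumask_model_t ipi_targets_for_mm(const struct mm_model *mm)
{
	return mm->sme_cpus;
}
```

With two mms using SME on disjoint CPUs, a TLB invalidation for one mm
would only interrupt that mm's CPUs in the second scheme, at the cost of
maintaining an extra mask per mm.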

Will