[PATCH v2] arm64: errata: Add NXP iMX8QM workaround for A53 Cache coherency issue
Will Deacon
will at kernel.org
Fri Jun 2 03:34:37 PDT 2023
Hi Ivan,
Sorry for the delay in getting to this.
On Thu, Apr 20, 2023 at 02:29:52PM +0300, Ivan T. Ivanov wrote:
> According to NXP errata document[1], the i.MX8QuadMax SoC suffers from a
> serious cache coherency issue. It was also mentioned in the initial
> support[2] for the imx8qm mek machine.
>
> I chose to use the ALTERNATIVE() framework, instead of the downstream
> solution[3], in the hope of reducing the effect of this fix on
> unaffected platforms.
>
> Unfortunately I was unable to find a way to identify the SoC using ID
> registers. The boot CPU's MIDR_EL1 is equal to 0x410fd034 and AIDR_EL1 is
> equal to 0, so I fell back to using devicetree compatible strings for this.
> And because we cannot guarantee that VMs on top of KVM will get the
> correct devicetree, KVM is disabled.
>
> Also make sure that SMMU "Broadcast TLB Maintenance" is not used, even
> though the SMMU device built into this SoC supports it.
>
> I know this fix is a suboptimal solution for affected machines, but I
> haven't been able to come up with a less intrusive fix. And I hope that
> once the TLB caches are invalidated, any immediate attempt to invalidate
> them again will be close to a NOP (flush_tlb_kernel_range()).
[...]
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 1023e896d46b..a5fe6ffb8150 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1159,6 +1159,18 @@ config SOCIONEXT_SYNQUACER_PREITS
>
> If unsure, say Y.
>
> +config NXP_IMX8QM_ERRATUM_ERR050104
> + bool "NXP iMX8QM ERR050104: broken cache/tlb invalidation"
> + default y
> + help
> + On iMX8QM, addresses above bit 35 are not broadcast correctly for
> + TLBI or IC operations, making TLBI and IC unreliable.
> +
> + Work around this erratum by using TLBI *ALL*IS and IC IALLUIS
> + operations. EL0 use of IC IVAU is trapped and upgraded to IC IALLUIS.
> +
> + If unsure, say Y.
> +
> endmenu # "ARM errata workarounds via the alternatives framework"
>
> choice
> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> index 412a3b9a3c25..e3bad2298ea5 100644
> --- a/arch/arm64/include/asm/tlbflush.h
> +++ b/arch/arm64/include/asm/tlbflush.h
> @@ -37,7 +37,11 @@
> : : )
>
> #define __TLBI_1(op, arg) asm (ARM64_ASM_PREAMBLE \
> - "tlbi " #op ", %0\n" \
> + ALTERNATIVE("tlbi " #op ", %0\n", \
> + "tlbi vmalle1is\n", \
> + ARM64_WORKAROUND_NXP_ERR050104) \
> + : : "r" (arg)); \
> + asm (ARM64_ASM_PREAMBLE \
> ALTERNATIVE("nop\n nop", \
> "dsb ish\n tlbi " #op ", %0", \
> ARM64_WORKAROUND_REPEAT_TLBI, \
I can see two problems with this:
1. It doesn't interact properly with the ARM64_WORKAROUND_REPEAT_TLBI
workaround
2. You're nuking the entire TLB in the low-level helper, so things like
flush_tlb_range() are going to over-invalidate excessively
Granted, you may not care about (1) for your SoC, but I would prefer not
to introduce artificial constraints on the workaround. I think both of
the issues above stem from plumbing this in too low in the stack.
Can you instead use a static key to redirect the higher-level control flow
to flush_tlb_mm() or flush_tlb_all() for user and kernel respectively? I'm
assuming the ASID _is_ carried on the interconnect?
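Roughly what I have in mind, as a standalone toy model (untested; every name
here is an illustrative stand-in, not the real arm64 implementation): the
workaround check lives once at the top of the range-flush helper, and the
low-level __TLBI_* macros stay untouched.

```c
/* Standalone toy model of the suggested static-key dispatch.  The
 * counters and helpers below are stand-ins for the real kernel code. */
#include <stdbool.h>

static bool imx8qm_err050104;          /* stand-in for the static key */
static int asid_flushes, page_flushes; /* model bookkeeping only */

static void flush_tlb_mm(void)         { asid_flushes++; } /* per-ASID flush */

static void __flush_tlb_range(unsigned long start, unsigned long end)
{
	/* the real helper issues one by-VA TLBI per page */
	page_flushes += (end - start) >> 12;
}

static void flush_tlb_range(unsigned long start, unsigned long end)
{
	if (imx8qm_err050104) {
		/* By-VA ops are not broadcast correctly above bit 35, but
		 * the ASID is assumed to be, so one per-ASID flush covers
		 * the range without nuking every other ASID's entries. */
		flush_tlb_mm();
		return;
	}
	__flush_tlb_range(start, end);
}
```

That keeps the over-invalidation bounded to the affected mm rather than the
whole TLB, and it composes with ARM64_WORKAROUND_REPEAT_TLBI because the
low-level macros are left alone.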
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index 307faa2b4395..b0647b64dbb8 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -8,6 +8,7 @@
> #include <linux/arm-smccc.h>
> #include <linux/types.h>
> #include <linux/cpu.h>
> +#include <linux/of.h>
> #include <asm/cpu.h>
> #include <asm/cputype.h>
> #include <asm/cpufeature.h>
> @@ -55,6 +56,14 @@ is_kryo_midr(const struct arm64_cpu_capabilities *entry, int scope)
> return model == entry->midr_range.model;
> }
>
> +static bool __maybe_unused
> +is_imx8qm_soc(const struct arm64_cpu_capabilities *entry, int scope)
> +{
> + WARN_ON(preemptible());
> +
> + return of_machine_is_compatible("fsl,imx8qm");
> +}
> +
> static bool
> has_mismatched_cache_type(const struct arm64_cpu_capabilities *entry,
> int scope)
> @@ -729,6 +738,15 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
> MIDR_FIXED(MIDR_CPU_VAR_REV(1,1), BIT(25)),
> .cpu_enable = cpu_clear_bf16_from_user_emulation,
> },
> +#endif
> +#ifdef CONFIG_NXP_IMX8QM_ERRATUM_ERR050104
> + {
> + .desc = "NXP erratum ERR050104",
> + .capability = ARM64_WORKAROUND_NXP_ERR050104,
> + .type = ARM64_CPUCAP_STRICT_BOOT_CPU_FEATURE,
> + .matches = is_imx8qm_soc,
> + .cpu_enable = cpu_enable_cache_maint_trap,
> + },
> #endif
> {
> }
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index 4a79ba100799..265b6334291b 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -556,6 +556,11 @@ static void user_cache_maint_handler(unsigned long esr, struct pt_regs *regs)
> __user_cache_maint("dc civac", address, ret);
> break;
> case ESR_ELx_SYS64_ISS_CRM_IC_IVAU: /* IC IVAU */
> + if (cpus_have_final_cap(ARM64_WORKAROUND_NXP_ERR050104)) {
> + asm volatile("ic ialluis");
Hmm, one oddity here is that you can pass a faulting address and not see
the fault. It looks like that's already IMP DEF, so it's probably ok, but
might be worth a comment.
> + ret = 0;
'ret' is initialised to 0 already?
Finally, how come you don't need to upgrade I-cache invalidation by-VA
in the kernel? It looks like you're only handling operations trapped
from EL0.
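For illustration only, a kernel-side upgrade could sit in a C wrapper rather
than the asm routine. This is an untested, kernel-style pseudocode sketch
with approximate names, not a concrete proposal:

```
/* Untested sketch: upgrade kernel by-VA I-cache invalidation to
 * IC IALLUIS when the erratum capability is present. */
static inline void flush_icache_range(unsigned long start, unsigned long end)
{
	if (cpus_have_final_cap(ARM64_WORKAROUND_NXP_ERR050104)) {
		dsb(ish);
		asm volatile("ic ialluis");
		dsb(ish);
		isb();
		return;
	}
	__flush_icache_range(start, end);
}
```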
Will
More information about the linux-arm-kernel mailing list