[PATCH v2] arm64: errata: Add NXP iMX8QM workaround for A53 Cache coherency issue
Ivan T. Ivanov
iivanov at suse.de
Thu Jun 8 06:39:29 PDT 2023
On 06-02 11:34, Will Deacon wrote:
>
> Hi Ivan,
>
> Sorry for the delay in getting to this.
>
> On Thu, Apr 20, 2023 at 02:29:52PM +0300, Ivan T. Ivanov wrote:
> > According to the NXP errata document[1], the i.MX8QuadMax SoC suffers from
> > a serious cache coherence issue. It was also mentioned in the initial
> > support[2] for the imx8qm mek machine.
> >
> > I chose to use the ALTERNATIVE() framework, instead of the downstream
> > solution[3], for this issue in the hope of reducing the effect of this fix
> > on unaffected platforms.
> >
> > Unfortunately I was unable to find a way to identify the SoC using ID
> > registers. The boot CPU MIDR_EL1 is equal to 0x410fd034 and AIDR_EL1 is
> > equal to 0, so I fell back to using the devicetree compatible string for this.
> > And because we cannot guarantee that VMs on top of KVM will get the
> > correct devicetree, KVM is disabled.
> >
> > Also make sure that SMMU "Broadcast TLB Maintenance" is not used, even
> > though the SMMU device built into this SoC supports it.
> >
> > I know this fix is a suboptimal solution for the affected machines, but I
> > haven't been able to come up with a less intrusive fix. And I hope that once
> > the TLB caches are invalidated, any immediate attempt to invalidate them again
> > will be close to a NOP operation (flush_tlb_kernel_range()).
>
> [...]
>
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index 1023e896d46b..a5fe6ffb8150 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -1159,6 +1159,18 @@ config SOCIONEXT_SYNQUACER_PREITS
> >
> > If unsure, say Y.
> >
> > +config NXP_IMX8QM_ERRATUM_ERR050104
> > + bool "NXP iMX8QM ERR050104: broken cache/tlb invalidation"
> > + default y
> > + help
> > + On iMX8QM, addresses above bit 35 are not broadcast correctly for
> > + TLBI or IC operations, making TLBI and IC unreliable.
> > +
> > + Work around this erratum by using TLBI *ALL*IS and IC IALLUIS
> > + operations. EL0 use of IC IVAU is trapped and upgraded to IC IALLUIS.
> > +
> > + If unsure, say Y.
> > +
> > endmenu # "ARM errata workarounds via the alternatives framework"
> >
> > choice
> > diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> > index 412a3b9a3c25..e3bad2298ea5 100644
> > --- a/arch/arm64/include/asm/tlbflush.h
> > +++ b/arch/arm64/include/asm/tlbflush.h
> > @@ -37,7 +37,11 @@
> > : : )
> >
> > #define __TLBI_1(op, arg) asm (ARM64_ASM_PREAMBLE \
> > - "tlbi " #op ", %0\n" \
> > + ALTERNATIVE("tlbi " #op ", %0\n", \
> > + "tlbi vmalle1is\n", \
> > + ARM64_WORKAROUND_NXP_ERR050104) \
> > + : : "r" (arg)); \
> > + asm (ARM64_ASM_PREAMBLE \
> > ALTERNATIVE("nop\n nop", \
> > "dsb ish\n tlbi " #op ", %0", \
> > ARM64_WORKAROUND_REPEAT_TLBI, \
>
> I can see two problems with this:
>
> 1. It doesn't interact properly with the ARM64_WORKAROUND_REPEAT_TLBI
> workaround
From what I see, they cannot happen at the same time on the same SoC,
so this should not be an issue, but I could be missing something.
>
> 2. You're nuking the entire TLB in the low-level helper, so things like
> flush_tlb_range() are going to over-invalidate excessively
Right, this is noted in the commit message.
>
> Granted, you may not care about (1) for your SoC, but I would prefer not
> to introduce artificial constraints on the workaround. I think both of
> the issues above stem from plumbing this in too low in the stack.
>
> Can you instead use a static key to redirect the higher-level control flow
> to flush_tlb_mm() or flush_tlb_all() for user and kernel respectively?
This was the other option considered, but it looked more intrusive.
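For illustration, something along these lines is what I understood by that
suggestion (sketch only, untested; the key name and hook points are made up):

#include <linux/jump_label.h>
#include <linux/mm_types.h>

/* Sketch only: the key would be enabled from the erratum's cpu_enable hook. */
DEFINE_STATIC_KEY_FALSE(imx8qm_err050104);

static inline void flush_tlb_page(struct vm_area_struct *vma,
				  unsigned long uaddr)
{
	if (static_branch_unlikely(&imx8qm_err050104)) {
		/* By-VA TLBI is not broadcast reliably: invalidate the whole ASID. */
		flush_tlb_mm(vma->vm_mm);
		return;
	}
	/* ... existing by-VA invalidation ... */
}

static inline void flush_tlb_kernel_range(unsigned long start,
					  unsigned long end)
{
	if (static_branch_unlikely(&imx8qm_err050104)) {
		/* Kernel mappings have no ASID to fall back on: invalidate everything. */
		flush_tlb_all();
		return;
	}
	/* ... existing by-VA invalidation ... */
}

It ends up touching every caller-visible helper in asm/tlbflush.h instead of
the single __TLBI_1 macro, which is what looked more intrusive to me.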
> I'm
> assuming the ASID _is_ carried on the interconnect?
Looking at the history of tlbflush in the vendor tree [4] I think the ASID
is not properly wired either [5]. But I am failing to see why this
should matter.
[4] https://github.com/nxp-imx/linux-imx/commits/lf-6.1.y/arch/arm64/include/asm/tlbflush.h
[5] https://github.com/nxp-imx/linux-imx/commit/db6c157fff2a39c378e41e63f78d6fba18578758
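If the question is whether flush_tlb_mm() itself can be trusted: as far as I
understand it boils down to a by-ASID broadcast invalidation, roughly
(paraphrasing the current asm/tlbflush.h):

static inline void flush_tlb_mm(struct mm_struct *mm)
{
	unsigned long asid;

	dsb(ishst);
	asid = __TLBI_VADDR(0, ASID(mm));
	__tlbi(aside1is, asid);		/* invalidate by ASID, inner shareable */
	__tlbi_user(aside1is, asid);
	dsb(ish);
}

so if the ASID is not carried correctly on the interconnect either, that path
would presumably have to be upgraded to TLBI VMALLE1IS as well.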
>
> > diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> > index 307faa2b4395..b0647b64dbb8 100644
> > --- a/arch/arm64/kernel/cpu_errata.c
> > +++ b/arch/arm64/kernel/cpu_errata.c
> > @@ -8,6 +8,7 @@
> > #include <linux/arm-smccc.h>
> > #include <linux/types.h>
> > #include <linux/cpu.h>
> > +#include <linux/of.h>
> > #include <asm/cpu.h>
> > #include <asm/cputype.h>
> > #include <asm/cpufeature.h>
> > @@ -55,6 +56,14 @@ is_kryo_midr(const struct arm64_cpu_capabilities *entry, int scope)
> > return model == entry->midr_range.model;
> > }
> >
> > +static bool __maybe_unused
> > +is_imx8qm_soc(const struct arm64_cpu_capabilities *entry, int scope)
> > +{
> > + WARN_ON(preemptible());
> > +
> > + return of_machine_is_compatible("fsl,imx8qm");
> > +}
> > +
> > static bool
> > has_mismatched_cache_type(const struct arm64_cpu_capabilities *entry,
> > int scope)
> > @@ -729,6 +738,15 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
> > MIDR_FIXED(MIDR_CPU_VAR_REV(1,1), BIT(25)),
> > .cpu_enable = cpu_clear_bf16_from_user_emulation,
> > },
> > +#endif
> > +#ifdef CONFIG_NXP_IMX8QM_ERRATUM_ERR050104
> > + {
> > + .desc = "NXP erratum ERR050104",
> > + .capability = ARM64_WORKAROUND_NXP_ERR050104,
> > + .type = ARM64_CPUCAP_STRICT_BOOT_CPU_FEATURE,
> > + .matches = is_imx8qm_soc,
> > + .cpu_enable = cpu_enable_cache_maint_trap,
> > + },
> > #endif
> > {
> > }
> > diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> > index 4a79ba100799..265b6334291b 100644
> > --- a/arch/arm64/kernel/traps.c
> > +++ b/arch/arm64/kernel/traps.c
> > @@ -556,6 +556,11 @@ static void user_cache_maint_handler(unsigned long esr, struct pt_regs *regs)
> > __user_cache_maint("dc civac", address, ret);
> > break;
> > case ESR_ELx_SYS64_ISS_CRM_IC_IVAU: /* IC IVAU */
> > + if (cpus_have_final_cap(ARM64_WORKAROUND_NXP_ERR050104)) {
> > + asm volatile("ic ialluis");
>
> Hmm, one oddity here is that you can pass a faulting address and not see
> the fault. It looks like that's already IMP DEF, so it's probably ok, but
> might be worth a comment.
I am not sure what the expected behavior should be, but I could
add a comment, sure.
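Something like this, perhaps (just the hunk above with the comment added and
the redundant ret assignment dropped):

	case ESR_ELx_SYS64_ISS_CRM_IC_IVAU:	/* IC IVAU */
		if (cpus_have_final_cap(ARM64_WORKAROUND_NXP_ERR050104)) {
			/*
			 * ERR050104 means by-VA IC operations are not
			 * broadcast correctly, so upgrade to IC IALLUIS.
			 * A side effect is that a bogus address passed by
			 * EL0 will no longer produce the (IMP DEF) fault it
			 * might otherwise have seen.
			 */
			asm volatile("ic ialluis");
			break;
		}
		__user_cache_maint("ic ivau", address, ret);
		break;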
>
> > + ret = 0;
>
> 'ret' is initialised to 0 already?
Oh, yes.
>
> Finally, how come you don't need to upgrade I-cache invalidation by-VA
> in the kernel? It looks like you're only handling operations trapped
> from EL0.
Hm, I was thinking that __tlbi() takes care of this, or do you mean
something else, like the locations in assembler.h?
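If kernel-side by-VA I-cache invalidation needs the same treatment, I guess
it would end up looking something like this (rough sketch only;
kernel_icache_inval_by_va() is a placeholder for whatever helper currently
does the "ic ivau" loop behind flush_icache_range()):

#include <asm/barrier.h>
#include <asm/cpufeature.h>

static inline void kernel_icache_inval(unsigned long start, unsigned long end)
{
	if (cpus_have_final_cap(ARM64_WORKAROUND_NXP_ERR050104)) {
		/* By-VA IC ops are unreliable: invalidate the whole I-cache. */
		asm volatile("ic ialluis" : : : "memory");
		dsb(ish);
		isb();
		return;
	}
	kernel_icache_inval_by_va(start, end);	/* placeholder for the existing by-VA path */
}

Whether the right place for that is assembler.h, cacheflush.h or cache.S I
would need to check.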
Regards,
Ivan