[PATCH v2] arm64: errata: Add NXP iMX8QM workaround for A53 Cache coherency issue
Will Deacon
will at kernel.org
Fri Jun 2 03:34:37 PDT 2023
Hi Ivan,
Sorry for the delay in getting to this.
On Thu, Apr 20, 2023 at 02:29:52PM +0300, Ivan T. Ivanov wrote:
> According to NXP errata document[1], the i.MX8QuadMax SoC suffers from a
> serious cache coherency issue. It was also mentioned in the initial
> support[2] for the imx8qm mek machine.
>
> I chose to use the ALTERNATIVE() framework, instead of the downstream
> solution[3], in the hope of reducing the effect of this fix on
> unaffected platforms.
>
> Unfortunately I was unable to find a way to identify the SoC using ID
> registers. The boot CPU's MIDR_EL1 is equal to 0x410fd034 and AIDR_EL1 is
> equal to 0, so I fell back to using devicetree compatible strings for this.
> And because we cannot guarantee that VMs on top of KVM will get the
> correct devicetree, KVM is disabled.
>
> Also make sure that SMMU "Broadcast TLB Maintenance" is not used, even
> though the SMMU device built into this SoC supports it.
>
> I know this fix is a suboptimal solution for affected machines, but I
> haven't been able to come up with a less intrusive fix. And I hope that
> once the TLB caches are invalidated, any immediate attempt to invalidate
> them again will be close to a NOP (flush_tlb_kernel_range()).
[...]
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 1023e896d46b..a5fe6ffb8150 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1159,6 +1159,18 @@ config SOCIONEXT_SYNQUACER_PREITS
>
> If unsure, say Y.
>
> +config NXP_IMX8QM_ERRATUM_ERR050104
> + bool "NXP iMX8QM ERR050104: broken cache/tlb invalidation"
> + default y
> + help
> + On iMX8QM, addresses above bit 35 are not broadcast correctly for
> + TLBI or IC operations, making TLBI and IC unreliable.
> +
> + Work around this erratum by using TLBI *ALL*IS and IC IALLUIS
> + operations. EL0 use of IC IVAU is trapped and upgraded to IC IALLUIS.
> +
> + If unsure, say Y.
> +
> endmenu # "ARM errata workarounds via the alternatives framework"
>
> choice
> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> index 412a3b9a3c25..e3bad2298ea5 100644
> --- a/arch/arm64/include/asm/tlbflush.h
> +++ b/arch/arm64/include/asm/tlbflush.h
> @@ -37,7 +37,11 @@
> : : )
>
> #define __TLBI_1(op, arg) asm (ARM64_ASM_PREAMBLE \
> - "tlbi " #op ", %0\n" \
> + ALTERNATIVE("tlbi " #op ", %0\n", \
> + "tlbi vmalle1is\n", \
> + ARM64_WORKAROUND_NXP_ERR050104) \
> + : : "r" (arg)); \
> + asm (ARM64_ASM_PREAMBLE \
> ALTERNATIVE("nop\n nop", \
> "dsb ish\n tlbi " #op ", %0", \
> ARM64_WORKAROUND_REPEAT_TLBI, \
I can see two problems with this:
1. It doesn't interact properly with the ARM64_WORKAROUND_REPEAT_TLBI
workaround
2. You're nuking the entire TLB in the low-level helper, so things like
flush_tlb_range() are going to over-invalidate excessively
Granted, you may not care about (1) for your SoC, but I would prefer not
to introduce artificial constraints on the workaround. I think both of
the issues above stem from plumbing this in too low in the stack.
Can you instead use a static key to redirect the higher-level control flow
to flush_tlb_mm() or flush_tlb_all() for user and kernel respectively? I'm
assuming the ASID _is_ carried on the interconnect?
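Roughly what I have in mind, as a standalone toy model (untested; every name
here is an illustrative stand-in, not the real arm64 implementation): the
workaround check lives once at the top of the range-flush helper, and the
low-level __TLBI_* macros stay untouched.

```c
/* Standalone toy model of the suggested static-key dispatch.  The
 * counters and helpers below are stand-ins for the real kernel code. */
#include <stdbool.h>

static bool imx8qm_err050104;          /* stand-in for the static key */
static int asid_flushes, page_flushes; /* model bookkeeping only */

static void flush_tlb_mm(void)         { asid_flushes++; } /* per-ASID flush */

static void __flush_tlb_range(unsigned long start, unsigned long end)
{
	/* the real helper issues one by-VA TLBI per page */
	page_flushes += (end - start) >> 12;
}

static void flush_tlb_range(unsigned long start, unsigned long end)
{
	if (imx8qm_err050104) {
		/* By-VA ops are not broadcast correctly above bit 35, but
		 * the ASID is assumed to be, so one per-ASID flush covers
		 * the range without nuking every other ASID's entries. */
		flush_tlb_mm();
		return;
	}
	__flush_tlb_range(start, end);
}
```

That keeps the over-invalidation bounded to the affected mm rather than the
whole TLB, and it composes with ARM64_WORKAROUND_REPEAT_TLBI because the
low-level macros are left alone.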
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index 307faa2b4395..b0647b64dbb8 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -8,6 +8,7 @@
> #include <linux/arm-smccc.h>
> #include <linux/types.h>
> #include <linux/cpu.h>
> +#include <linux/of.h>
> #include <asm/cpu.h>
> #include <asm/cputype.h>
> #include <asm/cpufeature.h>
> @@ -55,6 +56,14 @@ is_kryo_midr(const struct arm64_cpu_capabilities *entry, int scope)
> return model == entry->midr_range.model;
> }
>
> +static bool __maybe_unused
> +is_imx8qm_soc(const struct arm64_cpu_capabilities *entry, int scope)
> +{
> + WARN_ON(preemptible());
> +
> + return of_machine_is_compatible("fsl,imx8qm");
> +}
> +
> static bool
> has_mismatched_cache_type(const struct arm64_cpu_capabilities *entry,
> int scope)
> @@ -729,6 +738,15 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
> MIDR_FIXED(MIDR_CPU_VAR_REV(1,1), BIT(25)),
> .cpu_enable = cpu_clear_bf16_from_user_emulation,
> },
> +#endif
> +#ifdef CONFIG_NXP_IMX8QM_ERRATUM_ERR050104
> + {
> + .desc = "NXP erratum ERR050104",
> + .capability = ARM64_WORKAROUND_NXP_ERR050104,
> + .type = ARM64_CPUCAP_STRICT_BOOT_CPU_FEATURE,
> + .matches = is_imx8qm_soc,
> + .cpu_enable = cpu_enable_cache_maint_trap,
> + },
> #endif
> {
> }
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index 4a79ba100799..265b6334291b 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -556,6 +556,11 @@ static void user_cache_maint_handler(unsigned long esr, struct pt_regs *regs)
> __user_cache_maint("dc civac", address, ret);
> break;
> case ESR_ELx_SYS64_ISS_CRM_IC_IVAU: /* IC IVAU */
> + if (cpus_have_final_cap(ARM64_WORKAROUND_NXP_ERR050104)) {
> + asm volatile("ic ialluis");
Hmm, one oddity here is that you can pass a faulting address and not see
the fault. It looks like that's already IMP DEF, so it's probably ok, but
might be worth a comment.
> + ret = 0;
'ret' is initialised to 0 already?
Finally, how come you don't need to upgrade I-cache invalidation by-VA
in the kernel? It looks like you're only handling operations trapped
from EL0.
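For illustration only, a kernel-side upgrade could sit in a C wrapper rather
than the asm routine. This is an untested, kernel-style pseudocode sketch
with approximate names, not a concrete proposal:

```
/* Untested sketch: upgrade kernel by-VA I-cache invalidation to
 * IC IALLUIS when the erratum capability is present. */
static inline void flush_icache_range(unsigned long start, unsigned long end)
{
	if (cpus_have_final_cap(ARM64_WORKAROUND_NXP_ERR050104)) {
		dsb(ish);
		asm volatile("ic ialluis");
		dsb(ish);
		isb();
		return;
	}
	__flush_icache_range(start, end);
}
```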
Will
More information about the linux-arm-kernel mailing list