[PATCH] arm64: errata: Add NXP iMX8QM workaround for A53 Cache coherency issue
Peng Fan
peng.fan at oss.nxp.com
Sun Apr 16 20:07:44 PDT 2023
+Frank, who worked on the downstream solution.
On 4/12/2023 8:55 PM, Ivan T. Ivanov wrote:
> According to NXP errata document[1] i.MX8QuadMax SoC suffers from
> serious cache coherence issue. It was also mentioned in initial
> support[2] for imx8qm mek machine.
>
> I chose to use the ALTERNATIVE() framework, instead of the downstream
> solution[3], for this issue in the hope of reducing the effect of this
> fix on unaffected platforms.
>
> Unfortunately I was unable to find a way to identify the SoC ID using
> registers. The boot CPU MIDR_EL1 is equal to 0x410fd034, so I fall back
> to using devicetree compatible strings for this.
>
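For readers unfamiliar with how a compatible-string match works, here is a minimal userspace sketch, not the kernel implementation. A devicetree "compatible" property is a list of NUL-terminated strings packed back to back, and a match means one entry equals the wanted string exactly. The helper name compat_list_contains() is hypothetical; the patch itself relies on of_machine_is_compatible().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustrative only: scan a devicetree "compatible" property, which is a
 * sequence of NUL-terminated strings packed back to back, for an exact
 * match. compat_list_contains() is a hypothetical helper, not a kernel API.
 */
static bool compat_list_contains(const char *prop, size_t len, const char *want)
{
	size_t want_len;
	size_t off = 0;

	assert(prop != NULL && want != NULL);
	want_len = strlen(want);

	while (off < len) {
		const char *entry = prop + off;
		/* Find the end of this entry without running past the property. */
		const char *nul = memchr(entry, '\0', len - off);
		size_t n = nul ? (size_t)(nul - entry) : len - off;

		/* Exact match only: "fsl,imx8qm-mek" must not match "fsl,imx8qm". */
		if (n == want_len && memcmp(entry, want, n) == 0)
			return true;

		off += n + 1;	/* skip the entry and its terminating NUL */
	}
	return false;
}
```

With a property holding "fsl,imx8qm-mek" followed by "fsl,imx8qm" (sample data for illustration), a lookup for "fsl,imx8qm" matches the second entry only, which is why a board-specific string in front does not get in the way.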
> I know this fix is a suboptimal solution for affected machines, but I
> haven't been able to come up with a less intrusive one. I also hope that
> once the TLB caches are invalidated, any immediate attempt to invalidate
> them again will be close to a NOP operation (flush_tlb_kernel_range()).
>
> I have run a few simple benchmarks and perf tests on affected and
> unaffected machines and I was not able to see any obvious issues. iMX8QM
> "performance" was nearly doubled with the 2 A72 cores brought online.
>
> Following is excerpt from NXP IMX8_1N94W "Mask Set Errata" document
> Rev. 5, 3/2023. Just in case it gets lost somehow.
>
> ---
> "ERR050104: Arm/A53: Cache coherency issue"
>
> Description
>
> Some maintenance operations exchanged between the A53 and A72
> core clusters, involving some Translation Look-aside Buffer
> Invalidate (TLBI) and Instruction Cache (IC) instructions can
> be corrupted. The upper bits, above bit-35, of ARADDR and ACADDR
> buses within in Arm A53 sub-system have been incorrectly connected.
> Therefore ARADDR and ACADDR address bits above bit-35 should not
> be used.
>
> Workaround
>
> The following software instructions are required to be downgraded
> to TLBI VMALLE1IS: TLBI ASIDE1, TLBI ASIDE1IS, TLBI VAAE1,
> TLBI VAAE1IS, TLBI VAALE1, TLBI VAALE1IS, TLBI VAE1, TLBI VAE1IS,
> TLBI VALE1, TLBI VALE1IS
>
> The following software instructions are required to be downgraded
> to TLBI VMALLS12E1IS: TLBI IPAS2E1IS, TLBI IPAS2LE1IS
>
> The following software instructions are required to be downgraded
> to TLBI ALLE2IS: TLBI VAE2IS, TLBI VALE2IS.
>
> The following software instructions are required to be downgraded
> to TLBI ALLE3IS: TLBI VAE3IS, TLBI VALE3IS.
>
> The following software instructions are required to be downgraded
> to TLBI VMALLE1IS when the Force Broadcast (FB) bit [9] of the
> Hypervisor Configuration Register (HCR_EL2) is set:
> TLBI ASIDE1, TLBI VAAE1, TLBI VAALE1, TLBI VAE1, TLBI VALE1
>
> The following software instruction is required to be downgraded
> to IC IALLUIS: IC IVAU, Xt
>
> Specifically for the IC IVAU, Xt downgrade, setting SCTLR_EL1.UCI
> to 0 will disable EL0 access to this instruction. Any attempt to
> execute from EL0 will generate an EL1 trap, where the downgrade to
> IC ALLUIS can be implemented.
> --
>
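To make the downgrade rules in the excerpt easier to check at a glance, here is a small lookup sketch that encodes the first four TLBI lists as a table. It is only an illustration of the mapping, under the assumption that operations are named by their lowercase mnemonics; the struct and function names are hypothetical and nothing here is taken from the patch, which applies a single vmalle1is replacement through ALTERNATIVE(). The HCR_EL2.FB case and the IC IVAU downgrade are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table type: one TLBI operation and its ERR050104 downgrade. */
struct tlbi_downgrade {
	const char *op;
	const char *replacement;
};

/* Transcribed from the quoted errata text (TLBI operations only). */
static const struct tlbi_downgrade err050104_map[] = {
	{ "aside1",     "vmalle1is" },    { "aside1is",   "vmalle1is" },
	{ "vaae1",      "vmalle1is" },    { "vaae1is",    "vmalle1is" },
	{ "vaale1",     "vmalle1is" },    { "vaale1is",   "vmalle1is" },
	{ "vae1",       "vmalle1is" },    { "vae1is",     "vmalle1is" },
	{ "vale1",      "vmalle1is" },    { "vale1is",    "vmalle1is" },
	{ "ipas2e1is",  "vmalls12e1is" }, { "ipas2le1is", "vmalls12e1is" },
	{ "vae2is",     "alle2is" },      { "vale2is",    "alle2is" },
	{ "vae3is",     "alle3is" },      { "vale3is",    "alle3is" },
};

/*
 * Return the operation to issue instead of 'op' on an affected SoC, or
 * 'op' itself when the errata text does not list it.
 */
static const char *err050104_downgrade(const char *op)
{
	size_t i;

	for (i = 0; i < sizeof(err050104_map) / sizeof(err050104_map[0]); i++) {
		if (strcmp(err050104_map[i].op, op) == 0)
			return err050104_map[i].replacement;
	}
	return op;
}
```

Every listed EL1 operation collapses to the same full inner-shareable invalidate, which is why the performance cost on affected machines is a fair concern.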
> [1] https://www.nxp.com/docs/en/errata/IMX8_1N94W.pdf
> [2] 307fd14d4b14 ("arm64: dts: imx: add imx8qm mek support")
> [3] https://github.com/nxp-imx/linux-imx/blob/lf-6.1.y/arch/arm64/include/asm/tlbflush.h#L19
>
> Signed-off-by: Ivan T. Ivanov <iivanov at suse.de>
> ---
> Documentation/arm64/silicon-errata.rst | 2 ++
> arch/arm64/Kconfig | 10 ++++++++++
> arch/arm64/include/asm/cpufeature.h | 3 ++-
> arch/arm64/include/asm/tlbflush.h | 6 +++++-
> arch/arm64/kernel/cpu_errata.c | 18 ++++++++++++++++++
> arch/arm64/kernel/traps.c | 22 +++++++++++++++++++++-
> arch/arm64/tools/cpucaps | 1 +
> 7 files changed, 59 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
> index ec5f889d7681..fce231797184 100644
> --- a/Documentation/arm64/silicon-errata.rst
> +++ b/Documentation/arm64/silicon-errata.rst
> @@ -175,6 +175,8 @@ stable kernels.
> +----------------+-----------------+-----------------+-----------------------------+
> | Freescale/NXP | LS2080A/LS1043A | A-008585 | FSL_ERRATUM_A008585 |
> +----------------+-----------------+-----------------+-----------------------------+
> +| Freescale/NXP | i.MX 8QuadMax | ERR050104 | NXP_IMX8QM_ERRATUM_ERR050104|
> ++----------------+-----------------+-----------------+-----------------------------+
> +----------------+-----------------+-----------------+-----------------------------+
> | Hisilicon | Hip0{5,6,7} | #161010101 | HISILICON_ERRATUM_161010101 |
> +----------------+-----------------+-----------------+-----------------------------+
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 1023e896d46b..437cb53f8753 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1159,6 +1159,16 @@ config SOCIONEXT_SYNQUACER_PREITS
>
> If unsure, say Y.
>
> +config NXP_IMX8QM_ERRATUM_ERR050104
> + bool "NXP iMX8QM: Workaround for Arm/A53 Cache coherency issue"
> + default n
> + help
> + Some maintenance operations exchanged between the A53 and A72 core
> + clusters, involving some Translation Look-aside Buffer Invalidate
> + (TLBI) and Instruction Cache (IC) instructions can be corrupted.
> +
> + If unsure, say N.
> +
> endmenu # "ARM errata workarounds via the alternatives framework"
>
> choice
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index 6bf013fb110d..1ed648f7f29a 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -835,7 +835,8 @@ static inline bool system_supports_bti(void)
> static inline bool system_supports_tlb_range(void)
> {
> return IS_ENABLED(CONFIG_ARM64_TLB_RANGE) &&
> - cpus_have_const_cap(ARM64_HAS_TLB_RANGE);
> + cpus_have_const_cap(ARM64_HAS_TLB_RANGE) &&
> + !cpus_have_const_cap(ARM64_WORKAROUND_NXP_ERR050104);
> }
>
> int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> index 412a3b9a3c25..12055b859ce3 100644
> --- a/arch/arm64/include/asm/tlbflush.h
> +++ b/arch/arm64/include/asm/tlbflush.h
> @@ -37,7 +37,11 @@
> : : )
>
> #define __TLBI_1(op, arg) asm (ARM64_ASM_PREAMBLE \
> - "tlbi " #op ", %0\n" \
> + ALTERNATIVE("nop\n nop\n tlbi " #op ", %0", \
> + "tlbi vmalle1is\n dsb ish\n isb", \
> + ARM64_WORKAROUND_NXP_ERR050104) \
> + : : "r" (arg)); \
> + asm (ARM64_ASM_PREAMBLE \
> ALTERNATIVE("nop\n nop", \
> "dsb ish\n tlbi " #op ", %0", \
> ARM64_WORKAROUND_REPEAT_TLBI, \
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index 307faa2b4395..7b702a79bf60 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -8,6 +8,7 @@
> #include <linux/arm-smccc.h>
> #include <linux/types.h>
> #include <linux/cpu.h>
> +#include <linux/of.h>
> #include <asm/cpu.h>
> #include <asm/cputype.h>
> #include <asm/cpufeature.h>
> @@ -55,6 +56,14 @@ is_kryo_midr(const struct arm64_cpu_capabilities *entry, int scope)
> return model == entry->midr_range.model;
> }
>
> +static bool __maybe_unused
> +is_imx8qm_soc(const struct arm64_cpu_capabilities *entry, int scope)
> +{
> + WARN_ON(preemptible());
> +
> + return of_machine_is_compatible("fsl,imx8qm");
> +}
> +
> static bool
> has_mismatched_cache_type(const struct arm64_cpu_capabilities *entry,
> int scope)
> @@ -729,6 +738,15 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
> MIDR_FIXED(MIDR_CPU_VAR_REV(1,1), BIT(25)),
> .cpu_enable = cpu_clear_bf16_from_user_emulation,
> },
> +#endif
> +#ifdef CONFIG_NXP_IMX8QM_ERRATUM_ERR050104
> + {
> + .desc = "NXP A53 cache coherency issue",
> + .capability = ARM64_WORKAROUND_NXP_ERR050104,
> + .type = ARM64_CPUCAP_STRICT_BOOT_CPU_FEATURE,
> + .matches = is_imx8qm_soc,
> + .cpu_enable = cpu_enable_cache_maint_trap,
> + },
> #endif
> {
> }
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index 4a79ba100799..4858f8c86fd5 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -529,6 +529,26 @@ void do_el1_fpac(struct pt_regs *regs, unsigned long esr)
> uaccess_ttbr0_disable(); \
> }
>
> +#define __user_instruction_cache_maint(address, res) \
> +do { \
> + if (address >= TASK_SIZE_MAX) { \
> + res = -EFAULT; \
> + } else { \
> + uaccess_ttbr0_enable(); \
> + asm volatile ( \
> + "1:\n" \
> + ALTERNATIVE(" ic ivau, %1\n", \
> + " ic ialluis\n", \
> + ARM64_WORKAROUND_NXP_ERR050104) \
> + " mov %w0, #0\n" \
> + "2:\n" \
> + _ASM_EXTABLE_UACCESS_ERR(1b, 2b, %w0) \
> + : "=r" (res) \
> + : "r" (address)); \
> + uaccess_ttbr0_disable(); \
> + } \
> +} while (0)
> +
> static void user_cache_maint_handler(unsigned long esr, struct pt_regs *regs)
> {
> unsigned long tagged_address, address;
> @@ -556,7 +576,7 @@ static void user_cache_maint_handler(unsigned long esr, struct pt_regs *regs)
> __user_cache_maint("dc civac", address, ret);
> break;
> case ESR_ELx_SYS64_ISS_CRM_IC_IVAU: /* IC IVAU */
> - __user_cache_maint("ic ivau", address, ret);
> + __user_instruction_cache_maint(address, ret);
> break;
> default:
> force_signal_inject(SIGILL, ILL_ILLOPC, regs->pc, 0);
> diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
> index 37b1340e9646..e225f1cd1005 100644
> --- a/arch/arm64/tools/cpucaps
> +++ b/arch/arm64/tools/cpucaps
> @@ -90,3 +90,4 @@ WORKAROUND_NVIDIA_CARMEL_CNP
> WORKAROUND_QCOM_FALKOR_E1003
> WORKAROUND_REPEAT_TLBI
> WORKAROUND_SPECULATIVE_AT
> +WORKAROUND_NXP_ERR050104