[PATCH] arm64: Remove checks for broken Cavium HW from the PI code
Will Deacon
will at kernel.org
Thu Apr 17 07:01:48 PDT 2025
On Wed, Apr 16, 2025 at 01:35:34PM +0100, Marc Zyngier wrote:
> Calling into the MIDR checking framework from the PI code has recently
> become much harder, due to the new fancy "multi-MIDR" support that
> relies on tables being populated at boot time, but not that early that
> they are available to the PI code. There are additional issues with
> this framework, as the code really isn't position independend *at all*.
>
> This leads to some ugly breakages, as reported by Ada.
>
> It so appears that the only reason for the PI code to call into the
> MIDR checking code is to cope with The Most Broken ARM64 System Ever,
> aka Cavium ThunderX, which cannot deal with nG attributes that result
> of the combination of KASLR and KPTI as a consequence of Erratum 27456.
>
> Rather than adding extra complexity for something that is actually
> a very dead horse, let's simply drop that check. On my own machine,
> the firmware doesn't provide a KASLR seed, preventing the pathological
> case to show up.
>
> And if someone does have a broken box that passes a seed to the kernel,
> "nokaslr" on the command-line is an easy enough workaround.
>
> Fixes: c8c2647e69bed ("arm64: Make _midr_in_range_list() an exported function")
> Reported-by: Ada Couprie Diaz <ada.coupriediaz at arm.com>
> Signed-off-by: Marc Zyngier <maz at kernel.org>
> Link: https://lore.kernel.org/r/3d97e45a-23cf-419b-9b6f-140b4d88de7b@arm.com
> Cc: Catalin Marinas <catalin.marinas at arm.com>
> Cc: Will Deacon <will at kernel.org>
> Cc: Shameer Kolothum <shameerali.kolothum.thodi at huawei.com>
> Cc: Oliver Upton <oliver.upton at linux.dev>
> ---
> arch/arm64/include/asm/mmu.h | 11 -----------
> arch/arm64/kernel/cpu_errata.c | 2 +-
> arch/arm64/kernel/image-vars.h | 4 ----
> 3 files changed, 1 insertion(+), 16 deletions(-)
>
> diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
> index 30a29e88994ba..6e8aa8e726015 100644
> --- a/arch/arm64/include/asm/mmu.h
> +++ b/arch/arm64/include/asm/mmu.h
> @@ -94,17 +94,6 @@ static inline bool kaslr_requires_kpti(void)
> return false;
> }
>
> - /*
> - * Systems affected by Cavium erratum 24756 are incompatible
> - * with KPTI.
> - */
> - if (IS_ENABLED(CONFIG_CAVIUM_ERRATUM_27456)) {
> - extern const struct midr_range cavium_erratum_27456_cpus[];
> -
> - if (is_midr_in_range_list(cavium_erratum_27456_cpus))
> - return false;
> - }
> -
> return true;
> }
>
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index b55f5f7057502..6b0ad5070d3e0 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -335,7 +335,7 @@ static const struct midr_range cavium_erratum_23154_cpus[] = {
> #endif
>
> #ifdef CONFIG_CAVIUM_ERRATUM_27456
> -const struct midr_range cavium_erratum_27456_cpus[] = {
> +static const struct midr_range cavium_erratum_27456_cpus[] = {
> /* Cavium ThunderX, T88 pass 1.x - 2.1 */
> MIDR_RANGE(MIDR_THUNDERX, 0, 0, 1, 1),
> /* Cavium ThunderX, T81 pass 1.0 */
> diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
> index 5e3c4b58f2790..2004b4f41ade6 100644
> --- a/arch/arm64/kernel/image-vars.h
> +++ b/arch/arm64/kernel/image-vars.h
> @@ -47,10 +47,6 @@ PROVIDE(__pi_id_aa64smfr0_override = id_aa64smfr0_override);
> PROVIDE(__pi_id_aa64zfr0_override = id_aa64zfr0_override);
> PROVIDE(__pi_arm64_sw_feature_override = arm64_sw_feature_override);
> PROVIDE(__pi_arm64_use_ng_mappings = arm64_use_ng_mappings);
> -#ifdef CONFIG_CAVIUM_ERRATUM_27456
> -PROVIDE(__pi_cavium_erratum_27456_cpus = cavium_erratum_27456_cpus);
> -PROVIDE(__pi_is_midr_in_range_list = is_midr_in_range_list);
> -#endif
Hmm, I'm not really sure about this as it seems to be punishing the poor
user of the box who really hasn't done anything wrong. The firmware might
not be passing a KASLR seed to the host, but what about guest VMs? It's
pretty easy for the VMM to stuff a seed into the DT, so if somebody was
running VMs on TX1 with kaslr before c8c2647e69bed ("arm64: Make
_midr_in_range_list() an exported function") and things worked, I don't
think we should be breaking that so that it crashes in weird ways.
Could we just read MIDR_EL1 (read_cpuid_id()) in the PI code to detect
ThunderX that way? I'm less fussed about the multi-MIDR code working on
this machine as it's a new feature (and we could disable the new
hypercalls here if we really cared).
Alternatively, we could properly rip out the TX1 support and fail to boot
if we detect it. At least that's deterministic and we could have a useful
error message about the thing being unsupported.
Will
More information about the linux-arm-kernel
mailing list