[PATCH v3 0/2] arm64: Allow erratum 1418040 for late CPUs

Marc Zyngier maz at kernel.org
Fri Sep 11 09:30:24 EDT 2020


On 2020-09-10 14:43, Sai Prakash Ranjan wrote:
> On 2020-09-09 20:23, Doug Anderson wrote:
>> Hi,
>> 
>> On Fri, Aug 21, 2020 at 11:15 AM Catalin Marinas
>> <catalin.marinas at arm.com> wrote:
>>> 
>>> On Fri, 31 Jul 2020 18:38:22 +0100, Marc Zyngier wrote:
>>> > Erratum 1418040 currently prevents a late CPU from booting if none
>>> > of the early CPUs are affected by it. This is because the handling
>>> > is implemented as alternatives, and we have already got rid of them
>>> > by the time userspace onlines a new CPU.
>>> >
>>> > A solution to this is to move everything into C code, and rely on
>>> > static keys instead. Once this is done, the feature can be allowed
>>> > for late CPUs.
>>> >
>>> > [...]
>>> 
>>> Applied to arm64 (for-next/fixes), thanks!
>>> 
>>> [1/2] arm64: Move handling of erratum 1418040 into C code
>>>       https://git.kernel.org/arm64/c/d49f7d7376d0
>>> [2/2] arm64: Allow booting of late CPUs affected by erratum 1418040
>>>       https://git.kernel.org/arm64/c/bf87bb0881d0
>> 
>> NOTE: patch 2 seems to have come in through a stable merge onto Chrome
>> OS 5.4 and is causing a regression when resuming from suspend.  In the
>> short term we've got a revert going into our tree:
>> 
>> https://crrev.com/c/2399101
>> 
>> ...but that's obviously not a long term fix.  I haven't done any
>> debugging of this myself, though I can if there's nobody more
>> qualified to do it and/or nobody else has time.  I'm just trying to
>> make sure that the problem is reported somewhere where others might
>> notice it rather than in an obscure Chrome OS tree.  ;-)
>> 
> 
> The root cause is pretty straightforward, but I'm afraid the
> solution isn't (I may be mistaken). This happens on big.LITTLE
> systems whose CPUs differ with respect to erratum 1418040: it
> applies only to the big cores, not to the little ones. So when
> the little cores are brought up during resume, there is a
> conflict as shown below (messages snipped from the internal bug
> for more visibility).
> 
> Enabling non-boot CPUs ...
> CPU features: CPU1: Detected conflict for capability 35 (ARM erratum 1418040), System: 1, CPU: 0
> CPU1: will not boot
> CPU1: will not boot
> CPU1: failed to come online
> psci: CPU1 killed (polled 0 ms)
> CPU1: died during early boot
> Error taking CPU1 up: -5

This is becoming very annoying... By allowing the buggy CPUs to come
in late, we have made it impossible for the good ones to work correctly.

Can you try this (untested yet, I'm dealing with another bucket of
errata at the moment):

diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 6c8303559beb..fcf7f763400c 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -477,6 +477,7 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
  		.capability = ARM64_WORKAROUND_1418040,
  		ERRATA_MIDR_RANGE_LIST(erratum_1418040_list),
  		.type = (ARM64_CPUCAP_SCOPE_LOCAL_CPU |
+			 ARM64_CPUCAP_OPTIONAL_FOR_LATE_CPU |
  			 ARM64_CPUCAP_PERMITTED_FOR_LATE_CPU),
  	},
  #endif
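
To illustrate why the extra flag should matter, here is a minimal user-space sketch of the check a late CPU goes through, based on the "System: 1, CPU: 0" conflict reported above. It is a simplified model, not the kernel code: the function name late_cpu_allowed(), the CAP_* macro names and their bit values are all hypothetical stand-ins for the ARM64_CPUCAP_* flags.

```c
#include <stdbool.h>

/* Hypothetical bit values for this model; the kernel defines its own. */
#define CAP_OPTIONAL_FOR_LATE_CPU   (1u << 0)  /* late CPU may lack the capability */
#define CAP_PERMITTED_FOR_LATE_CPU  (1u << 1)  /* late CPU may have the capability */

/*
 * Return true if a late CPU is allowed to come online.
 *
 * system_has_cap: the workaround was enabled by the CPUs seen so far.
 * cpu_has_cap:    the late CPU itself is affected by the erratum.
 */
static bool late_cpu_allowed(unsigned int type, bool system_has_cap,
			     bool cpu_has_cap)
{
	/* System has it, CPU does not: only fine if the cap is optional. */
	if (system_has_cap && !cpu_has_cap)
		return (type & CAP_OPTIONAL_FOR_LATE_CPU) != 0;

	/* CPU has it, system does not: only fine if the cap is permitted. */
	if (!system_has_cap && cpu_has_cap)
		return (type & CAP_PERMITTED_FOR_LATE_CPU) != 0;

	/* System and CPU agree, so there is no conflict. */
	return true;
}
```

In this model, a little core resuming on a system where the big cores enabled the workaround corresponds to late_cpu_allowed(type, true, false). With only the "permitted" bit set it is rejected, and with both bits set, as in the diff above, it is accepted.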


Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...


