[PATCH v3 0/2] arm64: Allow erratum 1418040 for late CPUs

Will Deacon will at kernel.org
Fri Sep 11 12:42:25 EDT 2020


On Fri, Sep 11, 2020 at 10:04:24PM +0530, Sai Prakash Ranjan wrote:
> On 2020-09-11 19:00, Marc Zyngier wrote:
> > On 2020-09-10 14:43, Sai Prakash Ranjan wrote:
> > > On 2020-09-09 20:23, Doug Anderson wrote:
> > > > Hi,
> > > > 
> > > > On Fri, Aug 21, 2020 at 11:15 AM Catalin Marinas
> > > > <catalin.marinas at arm.com> wrote:
> > > > > 
> > > > > On Fri, 31 Jul 2020 18:38:22 +0100, Marc Zyngier wrote:
> > > > > > Erratum 1418040 currently prevents a late CPU from booting if none
> > > > > > of the early CPUs are affected by it. This is because the handling
> > > > > > is implemented as alternatives, and we have already got rid of them
> > > > > > by the time userspace onlines a new CPU.
> > > > > >
> > > > > > A solution to this is to move everything into C code, and rely on
> > > > > > static keys instead. Once this is done, the feature can be allowed
> > > > > > for late CPUs.
> > > > > >
> > > > > > [...]
> > > > > 
> > > > > Applied to arm64 (for-next/fixes), thanks!
> > > > > 
> > > > > [1/2] arm64: Move handling of erratum 1418040 into C code
> > > > >       https://git.kernel.org/arm64/c/d49f7d7376d0
> > > > > [2/2] arm64: Allow booting of late CPUs affected by erratum 1418040
> > > > >       https://git.kernel.org/arm64/c/bf87bb0881d0
> > > > 
> > > > NOTE: patch 2 seems to have come in through a stable merge onto
> > > > Chrome
> > > > OS 5.4 and is causing a regression when resuming from suspend.
> > > > In the
> > > > short term we've got a revert going into our tree:
> > > > 
> > > > https://crrev.com/c/2399101
> > > > 
> > > > ...but that's obviously not a long term fix.  I haven't done any
> > > > debugging of this myself, though I can if there's nobody more
> > > > qualified to do it and/or nobody else has time.  I'm just trying to
> > > > make sure that the problem is reported somewhere where others might
> > > > notice it rather than in an obscure Chrome OS tree.  ;-)
> > > > 
> > > 
> > > The rootcause is pretty straightforward however I'm afraid the
> > > solution isn't so but I may be mistaken, so this happens on
> > > big.LITTLE systems with CPUs differing in erratum 1418040
> > > which was applicable only for big cores and not little cores.
> > > So when trying to bringup little cores during resume, there
> > > is a conflict as below (messages snipped from the internal bug
> > > for more visibility).
> > > 
> > > Enabling non-boot CPUs ...
> > > CPU features: CPU1: Detected conflict for capability 35 (ARM erratum
> > > 1418040), System: 1, CPU: 0
> > > CPU1: will not boot
> > > CPU1: will not boot
> > > CPU1: failed to come online
> > > psci: CPU1 killed (polled 0 ms)
> > > CPU1: died during early boot
> > > Error taking CPU1 up: -5
> > 
> > This is becoming very annoying... By allowing the buggy CPUs to come
> > in late, we have made it impossible for the good ones to work correctly.
> > 
> > Can you try this (untested yet, I'm dealing with another bucket of
> > errata at the moment):
> > 
> > diff --git a/arch/arm64/kernel/cpu_errata.c
> > b/arch/arm64/kernel/cpu_errata.c
> > index 6c8303559beb..fcf7f763400c 100644
> > --- a/arch/arm64/kernel/cpu_errata.c
> > +++ b/arch/arm64/kernel/cpu_errata.c
> > @@ -477,6 +477,7 @@ const struct arm64_cpu_capabilities arm64_errata[] =
> > {
> >  		.capability = ARM64_WORKAROUND_1418040,
> >  		ERRATA_MIDR_RANGE_LIST(erratum_1418040_list),
> >  		.type = (ARM64_CPUCAP_SCOPE_LOCAL_CPU |
> > +			 ARM64_CPUCAP_OPTIONAL_FOR_LATE_CPU |
> >  			 ARM64_CPUCAP_PERMITTED_FOR_LATE_CPU),
> >  	},
> >  #endif
> > 
> 
> Yes, this works.

Maybe we should spell it "ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE" and add
a comment about the "feature"?

Will



More information about the linux-arm-kernel mailing list