arm64 torture test hotplug failures (offlining causes -EBUSY)

Will Deacon will at kernel.org
Wed Jan 18 08:51:22 PST 2023


On Tue, Jan 17, 2023 at 08:00:58PM -0800, Paul E. McKenney wrote:
> On Wed, Jan 18, 2023 at 02:17:06AM +0000, Joel Fernandes wrote:
>
> I would be happier to forgive failure to offline housekeeping CPUs than
> blanket forgiveness of CPU 0.  Especially given that I recently got
> burned by a non-zero boot cpu.  ;-)
> 
> But wouldn't it be even better for cpu_is_hotpluggable() to know the
> NO_HZ_FULL rules of the road?
> 
> > Adding Frederic to CC as well as we are talking about
> > housekeeping/isolation stuff.
> 
> But as you say, perhaps Frederic has a better idea.
> 
> > > And topology_init() sets this based on platform_can_hotplug_cpu(cpu).
> > > And this function sets CPU 0 as !cpu_is_hotpluggable() unless the
> > > architecture specifies a .cpu_can_disable() function.
> > 
> > Ah, that is 32-bit ARM code only. This issue is on 64-bit ARM (arch/arm64/).
> 
> Apologies!  I will look more carefully at the pathnames next time!
> 
> But maybe arm64 needs something similar?

Just chiming in quickly from the arm64 side here: there's nothing in the
architecture that precludes offlining CPU 0, and it certainly works on some
platforms, so I'd be hesitant to rule it out entirely for testing.

One reason hotplug can fail in practice is if a trusted OS (i.e. code
running on the secure side of the fence, outside of Linux's view of the
world) is resident on a core and rejects firmware requests to power it
off. The PSCI code (drivers/firmware/psci/) should detect this and return
-EPERM. Earlier in this thread, however, the failure was reported as -EBUSY,
so it sounds like something else is going on...
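To illustrate why -EPERM rather than -EBUSY is the expected error from the
firmware path, here is a minimal standalone sketch. It assumes the PSCI
return codes from the Arm PSCI specification (DENIED is -3) and a
translation modeled on the kernel's PSCI errno mapping. The function names
psci_ret_to_errno() and can_offline_cpu() are hypothetical, not the
kernel's actual API, and the single "resident CPU" parameter is a
simplification of how a uniprocessor trusted OS is located.

```c
#include <assert.h>
#include <errno.h>

/* Subset of PSCI return codes, per the Arm PSCI specification. */
enum psci_ret {
	PSCI_RET_SUCCESS         =  0,
	PSCI_RET_NOT_SUPPORTED   = -1,
	PSCI_RET_INVALID_PARAMS  = -2,
	PSCI_RET_DENIED          = -3,
	PSCI_RET_INVALID_ADDRESS = -9,
};

/* Hypothetical helper: translate a PSCI return code into a Linux errno. */
static int psci_ret_to_errno(int ret)
{
	switch (ret) {
	case PSCI_RET_SUCCESS:
		return 0;
	case PSCI_RET_NOT_SUPPORTED:
		return -EOPNOTSUPP;
	case PSCI_RET_INVALID_PARAMS:
	case PSCI_RET_INVALID_ADDRESS:
		return -EINVAL;
	case PSCI_RET_DENIED:
		/* Firmware (e.g. a resident trusted OS) refused the request. */
		return -EPERM;
	}
	return -EINVAL; /* Unknown codes are treated as invalid. */
}

/*
 * Hypothetical pre-check: refuse to offline the core on which a
 * uniprocessor trusted OS is resident, before asking firmware at all.
 */
static int can_offline_cpu(int cpu, int tos_resident_cpu)
{
	if (cpu == tos_resident_cpu)
		return -EPERM;
	return 0;
}
```

Under these assumptions, neither the pre-check nor a DENIED response from
firmware ever yields -EBUSY, which is consistent with the suspicion that
the -EBUSY seen in the torture test comes from somewhere other than PSCI.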

Will



More information about the linux-arm-kernel mailing list