arm64 torture test hotplug failures (offlining causes -EBUSY)

Paul E. McKenney paulmck at kernel.org
Wed Jan 18 09:56:35 PST 2023


On Wed, Jan 18, 2023 at 04:51:22PM +0000, Will Deacon wrote:
> On Tue, Jan 17, 2023 at 08:00:58PM -0800, Paul E. McKenney wrote:
> > On Wed, Jan 18, 2023 at 02:17:06AM +0000, Joel Fernandes wrote:
> >
> > I would be happier to forgive failure to offline housekeeping CPUs than
> > blanket forgiveness of CPU 0.  Especially given that I recently got
> > burned by a non-zero boot cpu.  ;-)
> > 
> > But wouldn't it be even better for cpu_is_hotpluggable() to know the
> > NO_HZ_FULL rules of the road?
> > 
> > > Adding Frederic to CC as well as we are talking about
> > > housekeeping/isolation stuff.
> > 
> > But as you say, perhaps Frederic has a better idea.
> > 
> > > > And topology_init() sets this based on platform_can_hotplug_cpu(cpu).
> > > > And this function sets CPU 0 as !cpu_is_hotpluggable() unless the
> > > > architecture specifies a .cpu_can_disable() function.
> > > 
> > > Ah, that is 32-bit ARM code only. This issue is on 64-bit ARM (arch/arm64/).
> > 
> > Apologies!  I will look more carefully at the pathnames next time!
> > 
> > But maybe arm64 needs something similar?
> 
> Just chiming in quickly from the arm64 side here, but there's nothing in the
> architecture that precludes offlining CPU 0 and it certainly works on some
> platforms, so I'd be hesitant to rule it out entirely for testing.
> 
> One reason why hotplug can fail in practice is if a trusted OS (i.e. code
> running on the secure side of the fence outside of Linux's view of the
> world) is resident on a core and rejects firmware requests to power it
> off. The PSCI code (drivers/firmware/psci/) should detect this and return
> -EPERM, although earlier in this thread there was mention of -EBUSY so it
> sounds like something else...

We can certainly special-case -EPERM in rcutorture.  But what should we
expect?  Would this be a random encounter with a trusted OS, or should we
expect that a given trusted OS instance would grab a given CPU long-term?
My guess is the former, but I do feel the need to ask.  ;-)
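
For concreteness, here is a rough user-space sketch of what such
special-casing might look like.  The struct, the counters, and the
function names are made up for illustration and are not the actual
rcutorture code.  Treating -EPERM as the only forgivable error is
likewise just an assumption taken from the discussion above:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bookkeeping; these names are illustrative only. */
struct offline_stats {
	long attempts;   /* every offline request that was issued */
	long successes;  /* requests that returned 0 */
	long forgiven;   /* failures we decided not to complain about */
	long failures;   /* failures that should be reported */
};

/*
 * Assumed policy: -EPERM means that something outside of Linux's
 * control (for example, a resident trusted OS) refused the power-off,
 * so it is not counted as a test failure.
 */
static bool offline_error_is_forgivable(int ret)
{
	return ret == -EPERM;
}

/* Classify the return value of one CPU-offline attempt. */
static void record_offline_result(struct offline_stats *st, int ret)
{
	st->attempts++;
	if (ret == 0)
		st->successes++;
	else if (offline_error_is_forgivable(ret))
		st->forgiven++;
	else
		st->failures++;
}
```

With a policy like this, -EBUSY would still be counted as a real
failure, which matches the observation that the -EBUSY seen earlier
in this thread is probably a different problem.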

							Thanx, Paul