arm64 torture test hotplug failures (offlining causes -EBUSY)
Paul E. McKenney
paulmck at kernel.org
Wed Jan 18 09:56:35 PST 2023
On Wed, Jan 18, 2023 at 04:51:22PM +0000, Will Deacon wrote:
> On Tue, Jan 17, 2023 at 08:00:58PM -0800, Paul E. McKenney wrote:
> > On Wed, Jan 18, 2023 at 02:17:06AM +0000, Joel Fernandes wrote:
> >
> > I would be happier to forgive failure to offline housekeeping CPUs than
> > blanket forgiveness of CPU 0. Especially given that I recently got
> > burned by a non-zero boot cpu. ;-)
> >
> > But wouldn't it be even better for cpu_is_hotpluggable() to know the
> > NO_HZ_FULL rules of the road?
> >
> > > Adding Frederic to CC as well as we are talking about
> > > housekeeping/isolation stuff.
> >
> > But as you say, perhaps Frederic has a better idea.
> >
> > > > And topology_init() sets this based on platform_can_hotplug_cpu(cpu).
> > > > And this function sets CPU 0 as !cpu_is_hotpluggable() unless the
> > > > architecture specifies a .cpu_can_disable() function.
> > >
> > > Ah, that is 32-bit ARM code only. This issue is on 64-bit ARM (arch/arm64/).
> >
> > Apologies! I will look more carefully at the pathnames next time!
> >
> > But maybe arm64 needs something similar?
>
> Just chiming quickly from the arm64 side here, but there's nothing in the
> architecture that precludes offlining CPU 0 and it certainly works on some
> platforms, so I'd be hesitant to rule it out entirely for testing.
>
> One reason why hotplug can fail in practice is if a trusted OS (i.e. code
> running on the secure side of the fence outside of Linux's view of the
> world) is resident on a core and rejects firmware requests to power it
> off. The PSCI code (drivers/firmware/psci/) should detect this and return
> -EPERM, although earlier in this thread there was mention of -EBUSY so it
> sounds like something else...
We can certainly special-case -EPERM in rcutorture. But what should we
expect? Would this be a random encounter with a trusted OS, or should we
expect that a given trusted OS instance would grab a given CPU long-term?
My guess is the former, but I do feel the need to ask. ;-)
Thanx, Paul
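
For illustration only, here is a minimal user-space sketch of what "special-casing -EPERM" might look like when tallying offline attempts. It is not rcutorture code: the struct and both function names are hypothetical. It assumes, per the discussion above, that -EPERM means firmware refused the power-off request (for example because a trusted OS is resident on that CPU), while any other error, including the -EBUSY seen earlier in the thread, still counts as a real failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tallies; these names do not come from rcutorture. */
struct offline_stats {
	unsigned long successes;
	unsigned long forgiven;	/* -EPERM: firmware refused the request */
	unsigned long failures;	/* any other error, including -EBUSY */
};

/* True if the result of an offline attempt should not fail the test. */
static bool offline_result_is_forgivable(int ret)
{
	return ret == 0 || ret == -EPERM;
}

/* Sort one offline attempt's return value into the matching tally. */
static void record_offline_result(struct offline_stats *stats, int ret)
{
	assert(stats != NULL);

	if (ret == 0)
		stats->successes++;
	else if (ret == -EPERM)
		stats->forgiven++;
	else
		stats->failures++;
}
```

Keeping a separate "forgiven" count, rather than silently ignoring -EPERM, would make it visible whether the refusals are occasional or pinned to one CPU for the whole run, which is the question raised above.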