arm64 torture test hotplug failures (offlining causes -EBUSY)

Joel Fernandes joel at joelfernandes.org
Wed Jan 18 14:01:07 PST 2023


Hey Will,

On Wed, Jan 18, 2023 at 4:51 PM Will Deacon <will at kernel.org> wrote:
>
> On Tue, Jan 17, 2023 at 08:00:58PM -0800, Paul E. McKenney wrote:
> > On Wed, Jan 18, 2023 at 02:17:06AM +0000, Joel Fernandes wrote:
> >
> > I would be happier to forgive failure to offline housekeeping CPUs than
> > blanket forgiveness of CPU 0.  Especially given that I recently got
> > burned by a non-zero boot cpu.  ;-)
> >
> > But wouldn't it be even better for cpu_is_hotpluggable() to know the
> > NO_HZ_FULL rules of the road?
> >
> > > Adding Frederic to CC as well as we are talking about
> > > housekeeping/isolation stuff.
> >
> > But as you say, perhaps Frederic has a better idea.
> >
> > > > And topology_init() sets this based on platform_can_hotplug_cpu(cpu).
> > > > And this function sets CPU 0 as !cpu_is_hotpluggable() unless the
> > > > architecture specifies a .cpu_can_disable() function.
> > >
> > > Ah, that is 32-bit ARM code only. This issue is on 64-bit ARM (arch/arm64/).
> >
> > Apologies!  I will look more carefully at the pathnames next time!
> >
> > But maybe arm64 needs something similar?
>
> Just chiming in quickly from the arm64 side here, but there's nothing in the
> architecture that precludes offlining CPU 0 and it certainly works on some
> platforms, so I'd be hesitant to rule it out entirely for testing.
>
> One reason why hotplug can fail in practice is if a trusted OS (i.e. code
> running on the secure side of the fence outside of Linux's view of the
> world) is resident on a core and rejects firmware requests to power it
> off. The PSCI code (drivers/firmware/psci/) should detect this and return
> -EPERM, although earlier in this thread there was mention of -EBUSY so it
> sounds like something else...

Thank you for the heads up on that. To give you context, I am
currently testing rcutorture on stable kernels 5.10, 5.15, 6.1 on my
ARM64 QC7180 board. I certainly don't want to hit the -EPERM in the
future on this or other ARM64 hardware. It would be great if
cpu_psci_cpu_can_disable() in arm64 could return false for a CPU whose
offlining would always fail with -EPERM. Then we would not need to make
any changes on the test side.
This is similar to the idea Paul mentioned in an earlier thread where
the ARCH can disable the hotplug and make it clear the CPU removal is
off limits.

Meanwhile, I am also looking into whether the housekeeping CPU case
(where offlining returns -EBUSY) can be encoded somehow in the
cpu_is_hotpluggable() logic (also an idea from Paul). That does not
appear to be arch-specific code, though.
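
For illustration only, the two failure modes discussed above could be
told apart from user space roughly as follows. This is a minimal sketch,
not rcutorture code: the names classify_offline_errno(),
try_offline_cpu() and the offline_verdict enum are hypothetical, and the
mapping of -EBUSY to "housekeeping CPU" and -EPERM to "firmware/trusted
OS refusal" is an assumption taken from this thread, not a guarantee
from the kernel.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical verdicts for one offline attempt. */
enum offline_verdict {
	OFFLINE_OK,                /* CPU went offline */
	OFFLINE_SKIP_HOUSEKEEPING, /* -EBUSY: assumed housekeeping CPU */
	OFFLINE_SKIP_FIRMWARE,     /* -EPERM: assumed PSCI/trusted OS refusal */
	OFFLINE_FAIL               /* anything else: treat as a real failure */
};

/* Map a positive errno value (0 on success) to a verdict. */
enum offline_verdict classify_offline_errno(int err)
{
	switch (err) {
	case 0:
		return OFFLINE_OK;
	case EBUSY:
		return OFFLINE_SKIP_HOUSEKEEPING;
	case EPERM:
		return OFFLINE_SKIP_FIRMWARE;
	default:
		return OFFLINE_FAIL;
	}
}

/*
 * Ask the kernel to offline a CPU by writing "0" to its sysfs
 * "online" file. Returns 0 on success or a positive errno value.
 */
int try_offline_cpu(int cpu)
{
	char path[64];
	int fd, err = 0;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%d/online", cpu);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return errno;
	if (write(fd, "0", 1) < 0)
		err = errno;
	close(fd);
	return err;
}
```

A harness would call classify_offline_errno(try_offline_cpu(cpu)) as
root and only count OFFLINE_FAIL as an error. Note that a CPU the
kernel already marks as not hotpluggable typically has no "online"
file at all, so the open() fails before any offline is attempted.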

Thanks,

- Joel


