[PATCH] ARM: call disable_nonboot_cpus() from machine_shutdown()

Eric W. Biederman ebiederm at xmission.com
Fri Jan 11 00:59:35 EST 2013

Russell King - ARM Linux <linux at arm.linux.org.uk> writes:

> On Sun, Jan 06, 2013 at 05:53:30PM -0800, Eric W. Biederman wrote:
>> I have cleaned up the mess that is the reboot path once a bunch of years
>> ago, and apparently it is deteriorating again.
> Unfortunately, that's what happens when lots of cooks get their fingers
> in a pie with no coordination.  This is why having maintainers is soo
> important for code - a good maintainer ensures that the code remains
> high quality by whatever means.
> Code without maintainers is subject to modification in all kinds of
> random ways, including duplicating code amongst architectures which
> should be generic code - that happens because no one wants to understand
> what other architectures require.

I don't think you meant to but I have been rather strongly insulted.
With what time I can find I have been maintaining kexec.  Which I admit
has mostly been telling people don't do that silly stupid thing.

> Your original cleanups, afaik, are still all intact.  The problem is
> that since then, kexec has come along, and invented a new callback
> (machine_shutdown) which gets called while the system is still live
> in full SMP mode (and presumably all the devices are still running
> and potentially scribbling over memory with their DMA.)

The devices should not be running. The sequence of code is.

The reboot notifier and all of the device shutdown methods have been 
called in kernel_restart_prepare().  If drivers don't stop dma
in their shutdown method that is a driver bug.  I wish we could
use the remove method but I lost that fight a decade ago.

It is exactly the same sequence we use for a normal reboot
except for the the lack of disable_nonboot_cpus().

> Meanwhile
> kexec wants to pass control to the new kernel... that doesn't sound
> particularly clever to me.

Because I was annoyed I just stopped and between faded memories and
reading the kernel logs I think I can reconstruct how
disable_nonboot_cpus() got into the places it has gotten.

Ages and ages ago (the logs say Oct 2007) disable_nonboot_cpus() stopped
being a pure cpu-hotplug and power management thing and started being
called from kernel_power_off.  I objected to it's inclusion on the
reboot path but there was some deep magic ACPI ordering mess that
necessitated it be called in kernel_power_off.

In retrospect that disable_nonboot_cpus() call should have been x86

As I recall disable_nonboot_cpus() at the time was a cpu hotplug hack
and reading through the code again tonight it appears that
disable_noboot_cpus() continues to be a messed up cpu hotplug hack.

disable_nonboot_cpus() should really be called

If that code is going to be something other than power management
specific it is not cool that disable_nonboot_cpus() is not always
enabled when SMP is enabled.  It means that architectures need to
implement two different ways of shutting down cpus.

One of the truly nasty things about cpu_hotplug is that it requires that
irqs be migrated away from a cpu with interrupts disabled, which at
least on x86 in some interrupt delivery modes is impossible to do
safely.  The only way to losslessly (and without wedging irq
controllers) in those interrupt delivery modes (needed for more than 8
cpus) is to migrate an irq in it's irq handler.  Which is fine for
setting /proc/irq/$N/smp_affinity but is useless for cpu hot-unplug,
where we need to guarantee that all irqs are going to stop hitting a

Now sometimes_dangerously_hotunplug_all_but_one_cpu() on the reboot path 
was added just a few months ago in Oct 2012, and it appears to due to
weird ARM maintainership.  At least the x86 reboot_cpu_id option is
broken due to that addition.   The unlikely but very real ways to wedge
your cpu during a reboot still remain in those code paths and I expect
we simply have not seen regressions yet as there has not been enough
time for the regressions to be noticed and tracked down.

Now given that the inconsistences were caused by crazy ARM
maintainership and that disable_nonboot_cpus() is still crazy enough
to make me scared thinking about the code.

We should remove disable_nonboot_cpus() from the reboot path.   It is
still a crazy unmaintained cpu hotplug mess.

Move disable_nonboot_cpus() into machine_shutdown in the ARM reboot path
if you want to rely on that crazy piece of code.  Or possibly you can
write something ARM specific that requires less work and is safer.


More information about the linux-arm-kernel mailing list