console_cpu_notify can cause scheduling BUG during CPU hotplug

Michael Bohan mbohan at codeaurora.org
Wed Apr 27 18:12:19 EDT 2011


On 4/27/2011 12:38 AM, Borislav Petkov wrote:
> Great, whatever you guys come up with, we'd like to give it a run too.
> We (AMD) hit the same issue in one of our tests but in our case we end
> up in an endless loop of the state machine at stop_machine_cpu_stop()
> since the core being offlined cannot ack the state transition to
> STOPMACHINE_EXIT due to a similar reason.
>
> One possible fix is dropping CPU_DYING from console_cpu_notify()
> since it is called into by the offlining path in
> kernel/cpu.c::take_cpu_down().

This seems to be a different problem. Could you elaborate about why 
removing CPU_DYING from console_cpu_notify resolves your problem? What 
are other possible fixes?

In the failure case I witnessed, we're attempting to sleep in atomic 
mode, which is a clear violation caused by the addition of CPU_DYING. I 
haven't thoroughly investigated whether other actions in 
console_cpu_notify (eg. ONLINE, DEAD, DOWN_FAILED, UP_CANCELED) are in 
atomic mode violation as well.

Thanks,
Mike

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum



More information about the linux-arm-kernel mailing list