[patch] ARM: smpboot: Enable interrupts after marking CPU online/active

Stepan Moskovchenko stepanm at codeaurora.org
Tue Nov 15 16:54:27 EST 2011


On 11:59 AM, Dima Zavin wrote:
> <go go gadget message retractor>
>
> Looks like the issue ended up being the fact that the caller was
> accessing the cpu_online_mask without calling get_online_cpus anywhere
> in the path and could have picked up the intermediate state where
> the CPU was online but not yet active.
>
> --Dima
>
> On Wed, Oct 19, 2011 at 2:16 PM, Dima Zavin <dima at android.com> wrote:
>> Thomas,
>>
>> I was seeing something similar, and this patch did address part of it,
>> but I still think we have a problem. What I am seeing is the
>> following:
>>
>> We have a SCHED_FIFO kernel thread which tries to do a blocking IPI
>> (smp_call_function_single) to all online cpus (for_each_online_cpu).
>> This introduces a deadlock where the FIFO thread could be preempting
>> the suspend thread which is the one that's trying to set the second
>> CPU active after it sees it going online. The second CPU marked itself
>> online and is then waiting for the first CPU to mark it active before
>> enabling irqs to service the IPI.
>>
>> It feels to me like we should probably not be sending blocking IPIs
>> from a FIFO thread for sanity reasons, but that does mean there's
>> still a deadlock present here.
>>
>> Thoughts?
>>
>> Thanks in advance.
>>
>> --Dima
>>

Hello

I am seeing a deadlock when executing hotplug operations with this patch
applied. When the secondary CPU is brought up in _cpu_up, the CPU is
turned on and then the online notifier gets called, which is what marks
the secondary CPU as active. If _cpu_up on the primary CPU is preempted
before the secondary CPU is marked active, the primary CPU may want to
call smp_call_function (or send an IPI) to the secondary CPU because it
is marked online. However, with this patch, the secondary CPU is still
spinning on !cpu_active(cpu) with interrupts disabled. So the primary
CPU is now stuck in csd_lock_wait(), waiting for the secondary CPU to
respond, while the secondary CPU spins with interrupts disabled, waiting
for the primary CPU to mark it as active. So, while your approach of not
calling smp_call_function_single may work in your specific case, I
believe there is still a problem in the general case.
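
To make the circular wait concrete, here is a rough user-space model of
the states involved. It is only a sketch, not kernel code: struct
cpu_model and all of the function names below are made up for
illustration, and the model assumes the only thing that can mark the
secondary CPU active is the primary CPU that is blocked on the IPI.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the secondary CPU during bring-up (not kernel code). */
struct cpu_model {
	bool online;       /* set by the secondary CPU itself */
	bool active;       /* set by the primary CPU from the online notifier */
	bool irqs_enabled; /* with the patch: enabled only after active is seen */
};

/* State of the secondary right after it marks itself online, patch applied. */
static struct cpu_model secondary_after_online(void)
{
	struct cpu_model s = { .online = true, .active = false,
			       .irqs_enabled = false };
	return s;
}

/* One pass of the secondary's spin on !cpu_active(cpu): irqs stay off
 * until the active bit has been observed. */
static void secondary_step(struct cpu_model *s)
{
	assert(s);
	if (s->active)
		s->irqs_enabled = true;
}

/* A blocking IPI only completes if the target can take the interrupt. */
static bool blocking_ipi_can_complete(const struct cpu_model *s)
{
	return s->irqs_enabled;
}

/* The primary is stuck for good when it waits on a CPU that is online
 * (so it was picked as a target), not yet active, and has irqs off,
 * because the primary itself is the one that would set the active bit. */
static bool is_deadlocked(const struct cpu_model *s, bool primary_blocked_on_ipi)
{
	return primary_blocked_on_ipi && s->online && !s->active &&
	       !s->irqs_enabled;
}
```

Starting from secondary_after_online() and calling secondary_step() any
number of times never enables interrupts in this model, so
is_deadlocked() stays true for as long as the primary is blocked. Only
setting the active bit from outside breaks the cycle.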

One suggestion for resolving this might be to make smp_call_function look
at the active CPUs rather than the online CPUs, or to just let the
secondary CPU mark itself as active rather than having the primary CPU do
this, though this might defeat the original intended purpose of the
active mask.
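
As a rough illustration of the first suggestion, the difference in
target selection could be modelled like this. This is a sketch only:
cpumask_model_t and both helpers are hypothetical names using a plain
bitmask (bit n = CPU n), not the kernel's cpumask API, and it does not
show how smp_call_function is actually implemented.

```c
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit n set means CPU n is in the mask. */
typedef uint32_t cpumask_model_t;

/* Current behaviour as described above: every online CPU except the
 * caller is a target, whether or not it has been marked active yet. */
static cpumask_model_t ipi_targets_online(cpumask_model_t online, unsigned self)
{
	return online & ~((cpumask_model_t)1 << self);
}

/* Suggested behaviour: only CPUs that are both online and active are
 * targets, so a CPU still spinning with irqs off is skipped. */
static cpumask_model_t ipi_targets_active(cpumask_model_t online,
					  cpumask_model_t active, unsigned self)
{
	return online & active & ~((cpumask_model_t)1 << self);
}
```

With CPU0 calling, online = 0x3 and active = 0x1 (CPU1 online but not
yet active), the first helper still selects CPU1 while the second
selects nothing, so the caller would have no one to wait on.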

Steve


