Frank Rowand reported:

 I have a consistent (every boot) hang on boot with the RT patches.
 With a few hacks to get console output, I get:

   rcu_preempt_state detected stalls on CPUs/tasks

 I have also replicated the problem on the ARM RealView (in tree) and
 without the RT patches.

 The problem ended up being caused by the allowed cpus mask being set
 to all possible cpus for the ksoftirqd on the secondary processors.
 So the RCU softirq was never executing on the secondary cpu.

 The problem was that ksoftirqd was woken on the secondary processors
 before the secondary processors were online. This led to allowed
 cpus being set to all cpus.

   wake_up_process()
     try_to_wake_up()
       select_task_rq()
         if (... || !cpu_online(cpu))
           select_fallback_rq(task_cpu(p), p)
             ...
             /* No more Mr. Nice Guy. */
             dest_cpu = cpuset_cpus_allowed_fallback(p)
               do_set_cpus_allowed(p, cpu_possible_mask)
               # Thus ksoftirqd can now run on any cpu...

The reason is that the ARM SMP boot code for the secondary CPUs
enables interrupts before the newly brought up CPU is marked online
and active.

As a result, a wakeup of ksoftirqd, or of any other kernel thread
which is affine to the CPU being brought up, breaks that thread's
affinity, and the thread ends up being scheduled on CPUs which are
already online.

This problem has been observed on x86 before and the only solution is
to mark the CPU online and wait for the CPU active bit before the
point where interrupts are enabled.

This is safe as the percpu timer setup and the calibration code are
not part of the critical setup path and the calibration code needs to
have interrupts enabled anyway. We cannot schedule away at this point
because we are still in the preempt disabled region which is released
in cpu_idle().
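The fallback path quoted above can be illustrated with a small
user-space model. This is only a sketch, not kernel code: every
model_* name, the 4-CPU setup and the bitmask type are made up for
illustration, and the real select_fallback_rq() also consults
cpu_active() and the cpuset code. It only shows why a wakeup that
arrives before the CPU is online widens the affinity mask, and why the
same wakeup is harmless once the CPU is online.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4			/* hypothetical 4-CPU system */

typedef uint32_t cpumask_t;		/* bit n set => CPU n is in the mask */

struct task {
	int cpu;			/* CPU the task is bound to */
	cpumask_t cpus_allowed;		/* affinity mask */
};

static cpumask_t online_mask;
static const cpumask_t possible_mask = (1u << NR_CPUS) - 1;

static bool model_cpu_online(int cpu)
{
	return (online_mask >> cpu) & 1u;
}

/* Return the first online CPU contained in mask, or -1 if there is none. */
static int first_online_in(cpumask_t mask)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (((mask >> cpu) & 1u) && model_cpu_online(cpu))
			return cpu;
	return -1;
}

/* Simplified stand-in for select_task_rq() plus select_fallback_rq(). */
int model_select_task_rq(struct task *p)
{
	int cpu;

	/* Normal case: the task's CPU is allowed and online. */
	if (((p->cpus_allowed >> p->cpu) & 1u) && model_cpu_online(p->cpu))
		return p->cpu;

	/* Try any other CPU which is both allowed and online. */
	cpu = first_online_in(p->cpus_allowed);
	if (cpu >= 0)
		return cpu;

	/* "No more Mr. Nice Guy": the affinity is widened for good. */
	p->cpus_allowed = possible_mask;
	return first_online_in(p->cpus_allowed);
}

/*
 * Wake a thread bound to CPU1 while CPU0 is online. If online_first
 * is true, CPU1 is marked online before the wakeup, which is the
 * ordering the patch enforces. Returns the affinity mask afterwards.
 */
cpumask_t model_wakeup(bool online_first)
{
	struct task ksoftirqd1 = { .cpu = 1, .cpus_allowed = 1u << 1 };

	online_mask = 1u << 0;
	if (online_first)
		online_mask |= 1u << 1;

	(void)model_select_task_rq(&ksoftirqd1);
	return ksoftirqd1.cpus_allowed;
}
```

In this model, model_wakeup(false) returns a mask covering all four
CPUs, while model_wakeup(true) leaves the thread bound to CPU1 only.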

Reported-and-tested-by: Frank Rowand
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1109071115410.2723@ionos
Signed-off-by: Thomas Gleixner
---
 arch/arm/kernel/smp.c |   21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

Index: linux-2.6/arch/arm/kernel/smp.c
===================================================================
--- linux-2.6.orig/arch/arm/kernel/smp.c
+++ linux-2.6/arch/arm/kernel/smp.c
@@ -305,6 +305,18 @@ asmlinkage void __cpuinit secondary_star
 	 * Enable local interrupts.
 	 */
 	notify_cpu_starting(cpu);
+
+	/*
+	 * OK, now it's safe to let the boot CPU continue.  Wait for
+	 * the CPU migration code to notice that the CPU is online
+	 * before we continue. We need to do that before we enable
+	 * interrupts otherwise a wakeup of a kernel thread affine to
+	 * this CPU might break the affinity and let hell break loose.
+	 */
+	set_cpu_online(cpu, true);
+	while (!cpu_active(cpu))
+		cpu_relax();
+
 	local_irq_enable();
 	local_fiq_enable();
 
@@ -318,15 +330,6 @@ asmlinkage void __cpuinit secondary_star
 	smp_store_cpu_info(cpu);
 
 	/*
-	 * OK, now it's safe to let the boot CPU continue.  Wait for
-	 * the CPU migration code to notice that the CPU is online
-	 * before we continue.
-	 */
-	set_cpu_online(cpu, true);
-	while (!cpu_active(cpu))
-		cpu_relax();
-
-	/*
 	 * OK, it's off to the idle thread for us
 	 */
 	cpu_idle();