SMP soft lockup on smp_call_function_many when doing flush_tlb_page
Catalin Marinas
catalin.marinas at arm.com
Tue Mar 8 08:33:49 EST 2011
On Tue, 2011-03-08 at 09:53 +0000, saeed bishara wrote:
> The lockup below happens on my SMP system that doesn't support hw
> tlb broadcast.
> after some debug I found that the mask inside
> smp_call_function_many() which points to mm_cpumask, get changed
> asynchronously, apparently by reset_context() that called from IPI
> that was issued by another cpu.
> when I disable interrupts in smp_call_function_many around the
> code that uses the mask, the issue disappears.
> also, reverting the patch "ARM: 5905/1: ARM: Global ASID
> allocation on SMP" eliminates this specific bug.
It looks like smp_call_function_many() does all the checks on the mask
argument that it received and decides that there are CPUs to call (fair
enough). But in the meantime reset_context() via IPI clears this mask
leaving only the current CPU. Once the IPI was handled,
smp_call_function_many() copies this mask to data->cpumask and clears
the current CPU leaving an empty mask. It then waits for the other
(none) CPUs to clear the csd lock which would never happen as data->refs
is 0.
There are probably a few ways to fix this. My preferred method is to
modify the generic code (I'll add a proper commit log later and post it
on LKML if it works):
diff --git a/kernel/smp.c b/kernel/smp.c
index 9910744..a79454f 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -499,6 +499,10 @@ void smp_call_function_many(const struct cpumask *mask,
smp_wmb();
atomic_set(&data->refs, cpumask_weight(data->cpumask));
+ if (unlikely(!atomic_read(&data->refs))) {
+ csd_unlock(&data->csd);
+ return;
+ }
raw_spin_lock_irqsave(&call_function.lock, flags);
/*
An alternative would be to copy the cpumask to a local variable in
on_each_cpu_mask(), though the workaround above would cover other cases
that we haven't spotted yet. Also, the smp_call_function_many()
description doesn't state that the cpumask should not be modified.
diff --git a/arch/arm/kernel/smp_tlb.c b/arch/arm/kernel/smp_tlb.c
index 8f57f32..1717dec 100644
--- a/arch/arm/kernel/smp_tlb.c
+++ b/arch/arm/kernel/smp_tlb.c
@@ -16,10 +16,13 @@
static void on_each_cpu_mask(void (*func)(void *), void *info, int wait,
const struct cpumask *mask)
{
+ struct cpumask call_mask;
+
preempt_disable();
+ cpumask_copy(&call_mask, mask);
smp_call_function_many(mask, func, info, wait);
- if (cpumask_test_cpu(smp_processor_id(), mask))
+ if (cpumask_test_cpu(smp_processor_id(), &call_mask))
func(info);
preempt_enable();
--
Catalin
More information about the linux-arm-kernel
mailing list