ARM11MPcore: tlb_ops_need_broadcast causes deadlock

Will Deacon will.deacon at arm.com
Wed Mar 28 04:56:11 EDT 2012


On Tue, Mar 27, 2012 at 06:41:52PM +0100, George G. Davis wrote:
> On Tue, Mar 27, 2012 at 02:32:26PM +0100, Will Deacon wrote:
> > I have a theory about what goes on:
> > 
> > Say we have a valid (i.e. non-faulting) page which contains a load instruction
> > that will fault. A CPU executes this load and takes a data abort but at the
> > same time another CPU marks the page being executed as old. So when the
> > original CPU tries to load the faulting instruction in do_thumb_abort, we take
> > a second data abort (assumedly because we don't have a D-side TLB entry for the
> > text page, so we immediately see that it is old) and, because interrupts were
> > not yet re-enabled in the first fault, they are not enabled in the nested fault
> > either.
> 
> This is precisely what happened here.  The only difference is that the traces
> I've reviewed faulted at "not_thumb:" while attempting to read the userspace
> ARM instruction, which led to the (second) data abort with interrupts disabled.

Right, I think that's the same problem though.

> > Possible solutions:
> > 
> > (1) Enable interrupts if they are enabled in the faulting context before
> >     loading instructions on the dabt path.
> > 
> > (2) Use the FSR to determine whether a fault is due to a read or a write on
> >     ARMv6 - only load and disassemble the instruction on 1136 CPUs affected
> >     by erratum #326103 (which aren't SMP, so cannot hit the problem above).
> 
> We submitted a change similar to (2) above to the ARM Linux kernel mailing
> list for RFC [1] over a year ago.  That change [1] is similar to your change
> below.

Apologies, I missed that. Are you happy for me to continue with my change
below? I'd really like it if Peter could confirm it fixes his problem.

> > The latter is probably best. Please can you try the patch below? I've
> > checked that it does the right thing on an r0p1 1136 core using a simple
> > fork/swp program to trigger a CoW.
> 
> I tested your patch but only on a CPU_V6K based SMP machine.  In this
> case, ARM_ERRATA_326103 depends on CPU_V6, so is left disabled, rendering
> this patch functionally equivalent to [1] below.

Thanks George. Do you have a testcase for reliably reproducing the deadlock
without this patch applied?

> FYI/FWIW, your patch above suffered whitespace damage.

That'll be the sorry excuse for an email system that I'm forced to use.
Perhaps the list archive has a better version:

http://lists.arm.linux.org.uk/lurker/attach/1@20120327.133226.639a8b79.attach

Will
