ARM11MPcore: tlb_ops_need_broadcast causes deadlock

Will Deacon will.deacon at arm.com
Tue Mar 27 09:32:26 EDT 2012


Peter,

On Mon, Mar 26, 2012 at 05:10:45PM +0100, EXTERNAL Waechtler Peter (Fa. TCP, CM-AI/PJ-CF31) wrote:
> Probably just an "expected deadlock" as mentioned in the comment
> of the v6_early_abort macro:
> 
>  * Purpose : obtain information about current aborted instruction.
>  * Note: we read user space.  This means we might cause a data
>  * abort here if the I-TLB and D-TLB aren't seeing the same
>  * picture.  Unfortunately, this does happen.  We live with it.

I don't see this referring to an expected deadlock.

> For now the errata workarounds are removed for the 11MPcore
> like proposed in this thread to avoid faulting with IRQs turned off:
> 
> http://lists.infradead.org/pipermail/linux-arm-kernel/2011-February/041869.html
> 
> But there it looked like an optimization, but it wasn't.

I have a theory about what goes on:

Say we have a valid (i.e. non-faulting) text page containing a load instruction
that will fault. A CPU executes this load and takes a data abort, but at the
same time another CPU marks the page being executed as old. So when the
original CPU tries to read the faulting instruction in do_thumb_abort, it takes
a second data abort (presumably because it has no D-side TLB entry for the
text page, so it immediately sees that the page is old). Because interrupts had
not yet been re-enabled in the first fault, they are not enabled in the nested
fault either.

At this point, the faulting CPU will be unable to get the lock on the page,
since the other CPU holds it and is waiting for the TLB broadcast to complete,
which needs the faulting CPU to handle an IPI. Given that interrupts are
disabled on the faulting CPU, everything locks up.
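
To make the sequence concrete, here is a toy model of the lock-up in C. It is
only a sketch: the struct, the function name and the lockstep stepping are made
up for illustration and do not correspond to any kernel code. It assumes the
TLB broadcast can only complete once the faulting CPU services an IPI.

```c
#include <stdbool.h>

/* Toy model only: CPU A is the faulting CPU, CPU B is "the other CPU". */
struct lockup_model {
	bool lock_held_by_b;	/* CPU B holds the lock on the page */
	bool ipi_pending_on_a;	/* CPU B's TLB broadcast awaits CPU A's ack */
};

/*
 * Step both CPUs in lockstep. Returns true if CPU A gets the lock within
 * max_steps, false if no progress is ever made (the lock-up).
 */
bool nested_fault_completes(bool a_irqs_enabled, int max_steps)
{
	struct lockup_model m = {
		.lock_held_by_b = true,
		.ipi_pending_on_a = true,
	};

	for (int step = 0; step < max_steps; step++) {
		/* CPU A can only acknowledge the IPI with interrupts enabled. */
		if (m.ipi_pending_on_a && a_irqs_enabled)
			m.ipi_pending_on_a = false;

		/* CPU B drops the lock only once the broadcast has completed. */
		if (m.lock_held_by_b && !m.ipi_pending_on_a)
			m.lock_held_by_b = false;

		/* CPU A spins on the lock inside the nested fault. */
		if (!m.lock_held_by_b)
			return true;
	}
	return false;
}
```

With a_irqs_enabled false the loop never makes progress, however large
max_steps is; with it true, CPU A gets the lock on the first step.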

Possible solutions:

(1) Enable interrupts if they are enabled in the faulting context before
    loading instructions on the dabt path.

(2) Use the FSR to determine whether a fault is due to a read or a write on
    ARMv6 - only load and disassemble the instruction on 1136 CPUs affected
    by erratum #326103 (which aren't SMP, so cannot hit the problem above).
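
As a rough sketch of what (2) amounts to, the decision can be written out in C.
This is not code from the kernel: fixup_fsr() and the FSR_WRITE_BIT, INSN_L_BIT
and MIDR_R0_ARM1136 names are invented for illustration. The PSR bit values are
the usual ARM ones, the ID comparison mirrors the one in the patch below, and
the LDRD test is my reading of the ARM encoding, standing in for the
do_ldrd_abort macro, whose body is not shown here.

```c
#include <stdint.h>

#define FSR_WRITE_BIT	(1u << 11)	/* DFSR bit 11: abort was a write */
#define INSN_L_BIT	(1u << 20)	/* ARM load/store: L = 0 -> store */
#define PSR_T_BIT	0x00000020u	/* Thumb state */
#define PSR_J_BIT	0x01000000u	/* Jazelle (Java) state */
#define MIDR_R0_ARM1136	0x4107b36u	/* main ID register >> 4, r0pX 1136 */

/*
 * Return the FSR to hand to the generic abort handler. Only an r0 ARM1136
 * in ARM state has the aborted instruction decoded; every other case trusts
 * the hardware FSR and never reads user text.
 */
uint32_t fixup_fsr(uint32_t fsr, uint32_t midr, uint32_t psr, uint32_t insn)
{
	if ((midr >> 4) != MIDR_R0_ARM1136)
		return fsr;			/* not affected by the erratum */
	if (psr & (PSR_J_BIT | PSR_T_BIT))
		return fsr;			/* Java or Thumb: no SWP here */

	fsr &= ~FSR_WRITE_BIT;			/* start from "read" */

	/* LDRD has L = 0 but is a load: bits [27:25,20] == 0, [7:4] == 1101 */
	if ((insn & 0x0e100000u) == 0 && (insn & 0xf0u) == 0xd0u)
		return fsr;

	if (!(insn & INSN_L_BIT))
		fsr |= FSR_WRITE_BIT;		/* store or SWP: mark as write */
	return fsr;
}
```

For example, with an r0p1 ID value of 0x4107b361 and the encoding of
"swp r0, r1, [r2]" (0xe1020091), bit 11 ends up set even if the hardware left
it clear, while on any other core the FSR is passed through untouched.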

Option (2) is probably best. Please can you try the patch below? I've
checked that it does the right thing on an r0p1 1136 core using a simple
fork/swp program to trigger a CoW.
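
For reference, a test along those lines might look like the sketch below. This
is not the program used above, just a hypothetical reconstruction: swap_word(),
cow_swp_test() and shared_word are made-up names. The SWP instruction is only
emitted when building for ARM (not Thumb) state, so it would need an ARMv6
toolchain to exercise the erratum; on anything else a plain load/store stands
in so that the CoW path is still taken.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lives in .data, so after fork() the page is shared copy-on-write. */
volatile unsigned int shared_word = 1;

/* Write val to *ptr and return the previous contents. */
unsigned int swap_word(volatile unsigned int *ptr, unsigned int val)
{
	unsigned int old;
#if defined(__arm__) && !defined(__thumb__)
	__asm__ volatile("swp %0, %1, [%2]"
			 : "=&r" (old) : "r" (val), "r" (ptr) : "memory");
#else
	old = *ptr;		/* stand-in when SWP is not available */
	*ptr = val;
#endif
	return old;
}

/*
 * Returns 0 if the child's write went to a private copy, 1 if the result
 * looks wrong, -1 on fork/wait failure. On an affected core without the
 * workaround the child would be expected to hang rather than return.
 */
int cow_swp_test(void)
{
	int status;
	pid_t pid;

	shared_word = 1;	/* make sure the page is populated */
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* First write after fork: must fault and trigger the CoW. */
		unsigned int old = swap_word(&shared_word, 2);
		_exit(old == 1 && shared_word == 2 ? 0 : 1);
	}
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return 1;
	return shared_word == 1 ? 0 : 1;	/* parent's copy untouched */
}
```

A return of 0 means the child saw the old value, wrote its own, and the parent
still sees 1.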

Will


diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index dfb0312..dedb885 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1163,6 +1163,15 @@ if !MMU
 source "arch/arm/Kconfig-nommu"
 endif
 
+config ARM_ERRATA_326103
+       bool "ARM errata: FSR write bit incorrect on a SWP to read-only memory"
+       depends on CPU_V6
+       help
+         Executing a SWP instruction to read-only memory does not set bit 11
+         of the FSR on the ARM 1136 prior to r1p0. This causes the kernel to
+         treat the access as a read, preventing a COW from occurring and
+         causing the faulting task to livelock.
+
 config ARM_ERRATA_411920
        bool "ARM errata: Invalidation of the Instruction Cache operation can fail"
        depends on CPU_V6 || CPU_V6K
diff --git a/arch/arm/mm/abort-ev6.S b/arch/arm/mm/abort-ev6.S
index ff1f7cc..8074199 100644
--- a/arch/arm/mm/abort-ev6.S
+++ b/arch/arm/mm/abort-ev6.S
@@ -26,18 +26,23 @@ ENTRY(v6_early_abort)
        mrc     p15, 0, r1, c5, c0, 0           @ get FSR
        mrc     p15, 0, r0, c6, c0, 0           @ get FAR
 /*
- * Faulty SWP instruction on 1136 doesn't set bit 11 in DFSR (erratum 326103).
- * The test below covers all the write situations, including Java bytecodes
+ * Faulty SWP instruction on 1136 doesn't set bit 11 in DFSR.
  */
-       bic     r1, r1, #1 << 11                @ clear bit 11 of FSR
+#ifdef CONFIG_ARM_ERRATA_326103
+       ldr     ip, =0x4107b36
+       mrc     p15, 0, r3, c0, c0, 0           @ get processor id
+       teq     ip, r3, lsr #4                  @ r0 ARM1136?
+       bne     do_DataAbort
        tst     r5, #PSR_J_BIT                  @ Java?
+       tsteq   r5, #PSR_T_BIT                  @ Thumb?
        bne     do_DataAbort
-       do_thumb_abort fsr=r1, pc=r4, psr=r5, tmp=r3
-       ldreq   r3, [r4]                        @ read aborted ARM instruction
+       bic     r1, r1, #1 << 11                @ clear bit 11 of FSR
+       ldr     r3, [r4]                        @ read aborted ARM instruction
 #ifdef CONFIG_CPU_ENDIAN_BE8
-       reveq   r3, r3
+       rev     r3, r3
 #endif
        do_ldrd_abort tmp=ip, insn=r3
        tst     r3, #1 << 20                    @ L = 0 -> write
        orreq   r1, r1, #1 << 11                @ yes.
+#endif
        b       do_DataAbort



