[RESEND PATCH 0/1] Fix the "hard LOCKUP" when running a heavy loading

Will Deacon will.deacon at arm.com
Tue Nov 3 03:30:30 PST 2015


On Tue, Nov 03, 2015 at 04:10:08PM +0800, Caesar Wang wrote:
> As the following log:
> where we experience a CPU hard lockup. The assembly code (disassembled by gdb)
> 
> 0xc06c6e90 <__tcp_select_window+148>:        beq     0xc06c6eb0 <__tcp_select_window+180>
> 0xc06c6e94 <__tcp_select_window+152>:        mov     r2, #1008 ; 0x3f0
> 0xc06c6e98 <__tcp_select_window+156>:        ldr     r5, [r0,#1004] ; 0x3ec
> 0xc06c6e9c <__tcp_select_window+160>:        ldrh    r2, [r0,r2]
> ....
> 
> 0xc06c6ee0 <__tcp_select_window+228>:        addne   r0, r0, #1
> 0xc06c6ee4 <__tcp_select_window+232>:        lslne   r0, r0, r2
> 0xc06c6ee8 <__tcp_select_window+236>:        ldmne   sp, {r4, r5, r11, sp, pc}
> 
> Could either the "strhi"/"strlo" pair, or the lslne/ldmne pair, be
> tripping over errata 818325, or a similar errata?

No. One of the conditions for #818325 is:

  The second instruction is an UNPREDICTABLE STR or STM (maximum two
  registers in the list) with write-back and the write-back register is
  in the list of stored registers.

I don't see either of those in your code snippet above, but then I don't
see your strhi/strlo either. What's going on?
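
As a rough illustration of that condition, here is a minimal sketch in C
that tests a 32-bit ARM (A32) encoding for the STM half of it: a store
multiple with write-back, the base register in the register list, and at
most two registers listed. The function name and the sample encodings are
made up for this example; it does not cover the STR case or any of the
other conditions in the erratum notice, so treat it as a sketch rather
than a detector for #818325.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Count the registers named in the 16-bit register list (bits 15:0). */
static unsigned reg_count(uint32_t insn)
{
    unsigned n = 0;
    for (uint32_t list = insn & 0xFFFFu; list != 0; list &= list - 1)
        n++;
    return n;
}

/*
 * Hypothetical helper: true if insn is an A32 STM with write-back whose
 * base register also appears in the register list, with at most two
 * registers in that list.
 */
bool stm_wb_base_in_list(uint32_t insn)
{
    bool conditional  = (insn >> 28) != 0xFu;          /* 0b1111 is a different encoding space */
    bool is_ldm_stm   = ((insn >> 25) & 0x7u) == 0x4u; /* bits 27:25 == 0b100 */
    bool is_store     = ((insn >> 20) & 1u) == 0;      /* L bit clear */
    bool write_back   = ((insn >> 21) & 1u) != 0;      /* W bit set */
    unsigned rn       = (insn >> 16) & 0xFu;           /* base register number */
    bool base_in_list = ((insn >> rn) & 1u) != 0;

    return conditional && is_ldm_stm && is_store && write_back &&
           base_in_list && reg_count(insn) <= 2;
}

/* Sample encodings, assembled by hand for this example. */
void check_examples(void)
{
    assert(stm_wb_base_in_list(0xE8A00003u));  /* stmia r0!, {r0, r1} */
    assert(!stm_wb_base_in_list(0x189DA830u)); /* ldmne sp, {r4, r5, r11, sp, pc} */
    assert(!stm_wb_base_in_list(0xE92D4800u)); /* stmdb sp!, {r11, lr} */
}
```

The ldmne from the quoted disassembly is a load without write-back, so it
fails the check on two counts.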

> 0xc06c6eec <__tcp_select_window+240>:        b       0xc06c6f40<__tcp_select_window+324>
> 
> This patch can fix the *hard lock* in some cases.
> 
> As Russell said:
> "in other words, which can be handled by updating a control register in the firmware or
> boot loader"

Russell is completely correct: this should be worked around in firmware.
There are a number of reasons for that:

  (1) You want the workaround enabled for all privilege and security
      levels, which means applying it before you enter the kernel.

  (2) If Linux boots in non-secure, then the workaround may silently
      fail to apply.

  (3) The CPU may have an ECO fix, in which case we wouldn't want to
      enable the workaround.

  (4) Some workarounds (albeit not this one, afaict) require changing
      CPU configuration that can only be done very early on, e.g. whilst
      "the memory system is idle".
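
Points (2) and (3) can be sketched as a small read-modify-write helper of
the kind firmware might run before handing over to the kernel. Everything
named here is hypothetical: the accessor callbacks stand in for the real
coprocessor register accesses, the mask is a placeholder rather than the
bit from any erratum notice, and the has_eco_fix flag stands in for
whatever revision check the part actually needs. The sketch also assumes
that an ignored write shows up on read-back, which real hardware does not
guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register accessors. Real firmware would wrap the
 * coprocessor read/write for the relevant control register here. */
typedef uint32_t (*reg_read_fn)(void);
typedef void (*reg_write_fn)(uint32_t value);

enum wa_status {
    WA_NOT_NEEDED,    /* part already carries a hardware (ECO) fix */
    WA_APPLIED,       /* bit set and confirmed by read-back */
    WA_WRITE_IGNORED  /* write did not stick, e.g. wrong security state */
};

/* Set the workaround bit unless the CPU is already fixed, then verify. */
enum wa_status apply_workaround(reg_read_fn rd, reg_write_fn wr,
                                uint32_t mask, bool has_eco_fix)
{
    if (has_eco_fix)
        return WA_NOT_NEEDED;

    wr(rd() | mask);
    return ((rd() & mask) == mask) ? WA_APPLIED : WA_WRITE_IGNORED;
}

/* Host-side stand-in for the register so the sketch can be exercised. */
static uint32_t fake_reg;
static bool fake_write_enabled = true;

static uint32_t fake_read(void) { return fake_reg; }
static void fake_write(uint32_t v) { if (fake_write_enabled) fake_reg = v; }

#define WA_MASK (1u << 12) /* placeholder mask for illustration only */

void demo_checks(void)
{
    fake_reg = 0;
    fake_write_enabled = true;   /* write lands, as in a privileged secure context */
    assert(apply_workaround(fake_read, fake_write, WA_MASK, false) == WA_APPLIED);

    fake_reg = 0;
    fake_write_enabled = false;  /* write is dropped without any fault */
    assert(apply_workaround(fake_read, fake_write, WA_MASK, false) == WA_WRITE_IGNORED);

    assert(apply_workaround(fake_read, fake_write, WA_MASK, true) == WA_NOT_NEEDED);
}
```

The second case is the one that matters for point (2): the write raises no
fault, so only the read-back reveals that the workaround is not in effect.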

Now, I appreciate that doing this in the kernel may be the easiest thing
for your particular SoC, but that doesn't necessarily mean that it's the
best thing to do in the mainline kernel. Whilst there *is* precedent for
this already, we've been trying to move away from setting these bits in
the kernel for the reasons mentioned above.

Will


