Undefined instruction (ldrshtgt?) on mirabox with 3.11-rc7

Russell King - ARM Linux linux at arm.linux.org.uk
Sat Aug 31 19:54:34 EDT 2013


On Sat, Aug 31, 2013 at 07:00:29PM -0400, Jochen De Smet wrote:
> On 8/31/2013 16:06, Russell King - ARM Linux wrote:
>> Neither call quirk_usb_early_handoff().  I'm going to assume that it's
>> the EHCI one.
> Curiously enough, I don't see either one (ehci-q.c or fusbh200-hcd.c) in
> the kernel "make" output.
> Ah, ehci-q gets directly included by ehci-hcd.c, which I do see. Don't
> see anything similar for fusbh200 or oxu210hp-hcd.c, so I'm pretty sure
> the EHCI one is the only one I'm compiling and your guess is right.

Thanks for confirming.

> (gdb) disassemble /r start_unlink_async
> Dump of assembler code for function start_unlink_async:
>    0xc020bff4 <+0>:     0d c0 a0 e1     mov     r12, sp
>    0xc020bff8 <+4>:     18 d8 2d e9     push    {r3, r4, r11, r12, lr, pc}
>    0xc020bffc <+8>:     04 b0 4c e2     sub     r11, r12, #4
>    0xc020c000 <+12>:    2c 30 d1 e5     ldrb    r3, [r1, #44]   ; 0x2c
>    0xc020c004 <+16>:    00 40 a0 e1     mov     r4, r0
>    0xc020c008 <+20>:    01 00 53 e3     cmp     r3, #1
>    0xc020c00c <+24>:    18 a8 9d 18     ldmne   sp, {r3, r4, r11, sp, pc}
>    0xc020c010 <+28>:    ca f1 ff eb     bl      0xc0208740 <single_unlink_async>
>    0xc020c014 <+32>:    04 00 a0 e1     mov     r0, r4
>    0xc020c018 <+36>:    40 ff ff eb     bl      0xc020bd20 <start_iaa_cycle>
>    0xc020c01c <+40>:    18 a8 9d e8     ldm     sp, {r3, r4, r11, sp, pc}
> End of assembler dump.

Okay, so 0xc020c014 is the location of interest, and it's immediately after
a branch to single_unlink_async().  Okay, that confirms that the suspected
path is valid, and we did enter single_unlink_async from the correct place
in the code.

> disassemble /m  doesn't seem to work for this; is that normal?

Hmm, disassemble /m... I'm not up with gdb I'm afraid.

> (gdb) disassemble single_unlink_async
> Dump of assembler code for function single_unlink_async:
>    0xc0208740 <+0>:     mov     r12, sp
>    0xc0208744 <+4>:     push    {r11, r12, lr, pc}
>    0xc0208748 <+8>:     sub     r11, r12, #4
>    0xc020874c <+12>:    mov     r3, #4
>    0xc0208750 <+16>:    strb    r3, [r1, #44]   ; 0x2c
>    0xc0208754 <+20>:    ldr     r3, [r0, #212]  ; 0xd4
>    0xc0208758 <+24>:    add     r2, r1, #32
>    0xc020875c <+28>:    add     r12, r0, #208   ; 0xd0
>    0xc0208760 <+32>:    str     r2, [r0, #212]  ; 0xd4
>    0xc0208764 <+36>:    str     r12, [r1, #32]
>    0xc0208768 <+40>:    str     r3, [r1, #36]   ; 0x24
>    0xc020876c <+44>:    str     r2, [r3]
>    0xc0208770 <+48>:    ldr     r2, [r0, #200]  ; 0xc8
>    0xc0208774 <+52>:    b       0xc020877c <single_unlink_async+60>
>    0xc0208778 <+56>:    mov     r2, r3
>    0xc020877c <+60>:    ldr     r3, [r2, #8]
>    0xc0208780 <+64>:    cmp     r3, r1
>    0xc0208784 <+68>:    bne     0xc0208778 <single_unlink_async+56>

Okay.  First, here's the stack from your previous post, annotated with
the saved registers:

fd80:                                                       c03efdbc c03efda8
                                                            r11      r12
fda0: c020c014 c020874c ef2735d0 ef273500 c03efdd4 c03efdc0 c020c0e0 c020c000
      lr       pc       r3       r4       r11      r12      lr       pc

Unfortunately, this doesn't really provide much in the way of useful
information other than confirming that the stack layout is as we'd
expect it to be if we got into this function.

Let's now look at the register state:

pc : [<c020837c>]    lr : [<c020c014>] psr: 00000193
sp : c03efd98  ip : ef2735d0  fp : c03efda4
r10: 60000193  r9 : 00000006  r8 : c03013ec
r7 : 000031ac  r6 : d77d6a38  r5 : 00000001  r4 : 00000ef4
r3 : ee817c00  r2 : ef2de8c0  r1 : ee804600  r0 : ef273500

The trick here is to pull out what this tells us based on the code from
the above function.  The first thing to note is that the sp/fp values
are correct: the fp points at the saved PC for this stack frame, which
is what I'd expect.  (Because of prefetching, the saved PC will be ahead
of the instruction which saved it.)
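
As a rough cross-check of the sp/fp reasoning, here is a small Python sketch
(not part of the original analysis). It lays the four words stored by the
"push {r11, r12, lr, pc}" at 0xc0208744 out from sp, using the values in the
annotated stack dump, and confirms that fp lands on the saved PC and that the
saved PC is 8 bytes ahead of the push. The names SP, FP, PUSH_ADDR and frame
are made up for illustration.

```python
# Values copied from the register dump, stack dump and disassembly above.
SP = 0xc03efd98
FP = 0xc03efda4
PUSH_ADDR = 0xc0208744   # push {r11, r12, lr, pc} in single_unlink_async

# The push stores the listed registers at ascending addresses starting at sp.
frame = {
    SP + 0x0: ("r11", 0xc03efdbc),
    SP + 0x4: ("r12", 0xc03efda8),
    SP + 0x8: ("lr", 0xc020c014),
    SP + 0xc: ("pc", 0xc020874c),
}

saved_name, saved_pc = frame[FP]
saved_r12 = frame[SP + 0x4][1]

# fp should point at the saved pc slot, and "sub r11, r12, #4" means
# fp == saved r12 - 4.
print(saved_name, hex(saved_pc), "ahead of push by", saved_pc - PUSH_ADDR)
print("fp == saved r12 - 4:", FP == saved_r12 - 4)
```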

The second thing to note is this:

ip (ef2735d0) = r0 (ef273500) + 0xd0

That suggests that the instruction at 0xc020875c was executed, which is
fair confirmation that we made it into this function and got that far.
Unfortunately, we can't tell much else from comparing the registers and
this code.  Let's look at the code where we ended up:

c0208374:   c0406068        subgt   r6, r0, r8, rrx
c0208378:   c0394a20        eorsgt  r4, r9, r0, lsr #20
c020837c:   c03949f0        ldrshtgt        r4, [r9], -r0

I've annotated this with the correct address from your previous report.
An important thing to note here is that the PSR flags are zero (NZCV
are all clear) so the 'gt' condition will allow these instructions to
execute.
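
To illustrate the condition check, the following Python sketch (an
illustration only, with made-up helper names) pulls the NZCV flags out of the
reported psr value and evaluates the ARM 'gt' condition, which passes when
Z is clear and N equals V. It also shows that all three words carry condition
field 0xc, which is 'gt'.

```python
PSR = 0x00000193   # psr value from the register dump

def nzcv(psr):
    """Return the N, Z, C, V flags (bits 31..28 of the PSR)."""
    return ((psr >> 31) & 1, (psr >> 30) & 1, (psr >> 29) & 1, (psr >> 28) & 1)

def gt_passes(psr):
    """ARM 'gt' condition: Z clear and N == V."""
    n, z, _c, v = nzcv(psr)
    return z == 0 and n == v

# The three words found at c0208374, c0208378 and c020837c.
words = (0xc0406068, 0xc0394a20, 0xc03949f0)
conds = [w >> 28 for w in words]   # top four bits are the condition field

print("NZCV:", nzcv(PSR))
print("condition fields:", [hex(c) for c in conds])
print("gt passes:", gt_passes(PSR))
```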

So, can we deduce anything from this?  Well, we have this:

r4 (00000ef4) = r9 (00000006) ^ (r0 (ef273500) >> 20)

so it looks like the instruction at c0208378 was executed.  Obviously
the instruction at c020837c caused a fault, so that was definitely
executed.  What about c0208374?

r6 (d77d6a38) != r0 (ef273500) - (r8 (c03013ec) rrx) (rotate right with
extend - a one bit right rotate through the carry flag, 33 bits in all).

That doesn't work, so it suggests that the instruction at c0208374 wasn't
executed.
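
The two register checks can be reproduced with a short Python sketch (again
only an illustration, with hypothetical helper names). It treats 'subgt' as a
32-bit subtract of the rrx-shifted operand. Since the carry flag at the time
that instruction would have run is not known, the sketch tries both carry-in
values as an assumption.

```python
MASK = 0xffffffff

# Register values from the dump above.
r0, r4, r6 = 0xef273500, 0x00000ef4, 0xd77d6a38
r8, r9 = 0xc03013ec, 0x00000006

def rrx(value, carry_in):
    """Rotate right with extend: shift right one bit, carry enters bit 31."""
    return ((carry_in << 31) | (value >> 1)) & MASK

# c0208378: eorsgt r4, r9, r0, lsr #20
eors_result = (r9 ^ (r0 >> 20)) & MASK

# c0208374: subgt r6, r0, r8, rrx  (tried with carry-in 0 and 1)
sub_results = [(r0 - rrx(r8, c)) & MASK for c in (0, 1)]

print("eors matches r4:", eors_result == r4)
print("sub results:", [hex(v) for v in sub_results])
print("sub matches r6:", r6 in sub_results)
```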

Now.  How can we get from the above function to c0208378?  Nothing in
this function does a call through pointer, and we certainly haven't
loaded anything off the stack.  Did the PC just spontaneously jump
there?  I think not, but there are two branches in the above code.

There is this:

>    0xc0208784 <+68>:    bne     0xc0208778 <single_unlink_async+56>  

Notice the destination address's similarity to the address of the first
instruction we think was executed - 0xc0208778 vs c0208378.  Here are
the instruction opcodes for branches to those two locations:

c02107d8:	1affdfe6 	bne	c0208778
c02107d8:	1affdee6 	bne	c0208378

See the single bit difference there on bit 8?
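
The bit difference can be checked mechanically.  The Python sketch below
(illustrative only; branch_target and encode_bne are made-up helpers) decodes
the two opcodes above as ARM branches, XORs them, and then repeats the
exercise for a bne placed at the real address 0xc0208784.  The encoding for
that address is computed here from the ARM branch format (target = address
+ 8 + signed 24-bit offset * 4), not taken from the original dump.

```python
def branch_target(addr, word):
    """Decode the target of an ARM B/Bcc instruction located at addr."""
    imm24 = word & 0x00ffffff
    if imm24 & 0x00800000:          # sign-extend the 24-bit offset
        imm24 -= 0x01000000
    return (addr + 8 + imm24 * 4) & 0xffffffff

def encode_bne(addr, target):
    """Encode 'bne target' for an instruction located at addr."""
    offset = (target - (addr + 8)) >> 2
    return 0x1a000000 | (offset & 0x00ffffff)

# The two example opcodes from above, assembled at c02107d8.
good, bad = 0x1affdfe6, 0x1affdee6
diff = good ^ bad
print(hex(branch_target(0xc02107d8, good)), hex(branch_target(0xc02107d8, bad)))
print("differing bits:", hex(diff))

# The same single-bit change applied to the real branch at c0208784.
real = encode_bne(0xc0208784, 0xc0208778)
flipped = real & ~0x100             # clear bit 8 of the instruction word
print(hex(real), "->", hex(flipped), "->", hex(branch_target(0xc0208784, flipped)))
```

With bit 8 of the opcode cleared, the branch that should loop back to
single_unlink_async+56 lands 0x400 bytes lower instead.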

So, this is what I think: either _something_ has cleared that bit, or
you have a problem with your SDRAM wiring, or your SDRAM containing
this location is going bad and is suffering from a bit error at this
location.

I'm afraid that I think you have a hardware problem.


