Re: UBI leb_write_unlock NULL pointer Oops (continuation)

Thu Feb 20 10:21:39 EST 2014

Hi,

I'm back again now.

> Bill Pringlemeir wrote:
>
> $ printf "\x04\x70\x8a\xe4\x04\x50\x98\xe5\x05\x00\x5a\xe1\x29\x00\x00\x0a\x0c\x30\x95\xe5" > crash.dump
> $ objdump --disassemble-all -m arm -b binary crash.dump
>
> crash.dump:     file format binary
>
>
> Disassembly of section .data:
>
> 00000000 <.data>:
>    0:   e48a7004        str     r7, [sl], #4
>    4:   e5985004        ldr     r5, [r8, #4]
>    8:   e15a0005        cmp     sl, r5
>    c:   0a000029        beq     0xb8
>   10:   e595300c        ldr     r3, [r5, #12]
>
> 'r5' is NULL.  It seems to be the same symptom.  If you run your ARM objdump with -S on either vmlinux or '__up_write', it will help confirm that it is the list corrupted again.  The assembler above should match.
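
As a cross-check of the quoted dump, the instruction words objdump prints are just the "Code:" bytes read as 32-bit little-endian values. The following is only a small sketch of that conversion; the names le32_word and crash_bytes are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The 20 bytes written to crash.dump by the printf command quoted above. */
static const unsigned char crash_bytes[20] = {
	0x04, 0x70, 0x8a, 0xe4, 0x04, 0x50, 0x98, 0xe5,
	0x05, 0x00, 0x5a, 0xe1, 0x29, 0x00, 0x00, 0x0a,
	0x0c, 0x30, 0x95, 0xe5,
};

/* ARM (non-Thumb) instructions are 4 bytes each. */
static_assert(sizeof(crash_bytes) % 4 == 0, "whole ARM instructions only");

/* Assemble one 32-bit little-endian word from four consecutive bytes. */
static uint32_t le32_word(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

For example, le32_word(crash_bytes + 16) gives 0xe595300c, the 'ldr r3, [r5, #12]' at offset 0x10 in the quoted disassembly.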

I don't have objdump running on my ARM system at the moment, but with rwsem-spinlock.c compiled with debug info, objdump -S -D gives the following for __up_write():

...
	sem->activity = 0;
 29c:	e3a07000 	mov	r7, #0
 2a0:	e1a0a008 	mov	sl, r8

 2a4:	e48a7004 	str	r7, [sl], #4
 2a8:	e5985004 	ldr	r5, [r8, #4]
	if (!list_empty(&sem->wait_list))
 2ac:	e15a0005 	cmp	sl, r5
 2b0:	0a000029 	beq	35c <__up_write+0xe0>
	/* if we are allowed to wake writers try to grant a single write lock
	 * if there's a writer at the front of the queue
	 * - we leave the 'waiting count' incremented to signify potential
	 *   contention
	 */
	if (waiter->flags & RWSEM_WAITING_FOR_WRITE) {
 2b4:	e595300c 	ldr	r3, [r5, #12]
{
...

Seems to match ...
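
To make the offsets in the listing easier to follow, here is a rough model of the structure layout they imply. This is only a sketch under assumptions: uint32_t stands in for a 32-bit ARM pointer, the spinlock in the semaphore is assumed to occupy no space (the compare of 'sem + 4' against wait_list.next suggests this), and the waiter layout is inferred from the '#12' offset of the flags load. All model_ names are hypothetical and not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit model: uint32_t stands in for an ARM pointer. */
struct model_list_head {
	uint32_t next;
	uint32_t prev;
};

/* Assumed semaphore layout; the wait_lock is assumed to take no space. */
struct model_rw_semaphore {
	int32_t activity;                 /* offset 0: str r7, [sl], #4 */
	struct model_list_head wait_list; /* offset 4: ldr r5, [r8, #4] */
};

/* Assumed waiter layout, inferred from the disassembly. */
struct model_rwsem_waiter {
	struct model_list_head list; /* offset 0 */
	uint32_t task;               /* offset 8 */
	uint32_t flags;              /* offset 12: ldr r3, [r5, #12] */
};

static_assert(offsetof(struct model_rw_semaphore, wait_list) == 4,
	      "wait_list follows activity directly in this model");
static_assert(offsetof(struct model_rwsem_waiter, flags) == 12,
	      "flags is the field read by ldr r3, [r5, #12]");

/* list_empty() compares the address of the list head with head->next. */
static int model_list_empty(uint32_t head_addr, uint32_t next)
{
	return head_addr == next;
}

/* Address the CPU reads for waiter->flags, given wait_list.next (r5). */
static uint32_t model_flags_address(uint32_t next)
{
	return next + (uint32_t)offsetof(struct model_rwsem_waiter, flags);
}
```

In this model, a wait_list.next of 0 never equals the address of the list head, so the list does not look empty, the beq is not taken, and the flags load goes to model_flags_address(0), which is 0xc. That would be consistent with r5 being NULL at the faulting instruction.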

> What is 'RAVENNA_streame'?  Is this your standard test and not the '8k binary' copy test or are you doing the copy test with this process also running?

This is an application which runs in parallel with our copy test. Over the last few days, Emanuel set up another test environment which seems to reproduce the error more reliably (at least on some hardware, not on all).
At the moment, proprietary applications are running in parallel, but I'll try to strip it down to a sequence which I can provide to you, if you like.

> We have 'IRQs off', which makes sense for __up_write.  Trying 'ftrace_dump_on_oops' as Richard suggests would be helpful to find out what went on before.  It might also make sense to dump some 'rwsem_waiter' nodes on the error?  It looks like '__up_write' might normally have an empty list?
> Certainly an non-empty 'rwsem_waiter' is going to trigger the condition more often?  I guess I can look to see what might cause this, even if I can not reproduce it.  The 'preemp_count' has been two every time you have this; is that true?

We were able to reproduce the error with function tracing enabled, so we now have two hopefully valuable traces. But they are rather big (around 4MB each). Shall I use pastebin and split them into several pieces to provide them? Or send them off-list as an email attachment?
The trace Emanuel posted on Wednesday may not be valuable. Perhaps a (different) error is triggered there by memory pressure caused by the function tracing.

Best regards,
Thorsten