i.MX31 kernel panic and irq

Russell King - ARM Linux linux at arm.linux.org.uk
Wed Oct 7 09:20:51 EDT 2009


On Tue, Oct 06, 2009 at 10:43:26AM -0500, Bill Gatliff wrote:
> The OOPS messages suggest that the machine has run off into stuff that  
> isn't code, which would be consistent with the stack pointer getting  
> blown out of the stack memory.

I don't follow your line of reasoning.  The oops dump was:

Unable to handle kernel paging request at virtual address 60000013
pgd = c0004000
[60000013] *pgd=00000000
Internal error: Oops: 5 [#1]
Modules linked in: test_drv
CPU: 0    Tainted: G        W   (2.6.31-mx31-spi #29)
PC is at cpu_idle+0x28/0x88
LR is at cpu_idle+0x74/0x88
pc : [<c00281e4>]    lr : [<c0028230>]    psr: 40000093
sp : c0339fc8  ip : 80000093  fp : 00000000
r10: 80020a40  r9 : 4107b364  r8 : 80020a74
r7 : c033c360  r6 : c033c36c  r5 : 60000013  r4 : c0028308
r3 : f1080080  r2 : 00000002  r1 : c03599ac  r0 : 00000009
Flags: nZcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 00c5387d  Table: 8fa10000  DAC: 00000017
Process swapper (pid: 0, stack limit = 0xc0338268)
Stack: (0xc0339fc8 to 0xc033a000)
9fc0:                   c037c99c c0357ad0 c0022e10 c00089a8 c0008350 00000000
9fe0: 00000000 c0022e10 00c5387d c0357b40 c0023214 80008034 00000000 00000000
[<c00281e4>] (cpu_idle+0x28/0x88) from [<c00089a8>] (start_kernel+0x1f0/0x2cc)
[<c00089a8>] (start_kernel+0x1f0/0x2cc) from [<80008034>] (0x80008034)
Code: e5943000 e3130002 1a000007 f10c0080 (e5953000)

If we look at this, we can see the following:

1. sp is pointing inside the kernel's direct mapped memory, as it should.
2. it is in the upper 4K page of the 8K thread stack, which means there's
   potentially more than 4K of space available to the stack.  Plus it's
   above the stack limit.
3. the process name is correct.  This is significant, because it means
   that (sp & ~0x1fff) ends up pointing at a valid thread_info structure,
   which then points at a valid task_struct structure.
4. the stack trace is consistent with pid 0's trace, which is basically
   the kernel boot and idle thread - in other words, it hasn't been
   overwritten by something running down into this page.

To me, it looks like r5 somehow got corrupted.  I think it should be a
pointer to 'hlt_counter', but instead it holds 60000013, which is also
the faulting address and looks like a PSR value (SVC_32 mode, Z and C
flags set).
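
To see why 60000013 reads like a PSR value, the word can be split into
the usual ARM PSR fields.  The sketch below is illustrative only; the
macro and function names are my own, not the kernel's definitions.

```c
#include <stdint.h>

/* ARM PSR layout: mode in bits 4:0, F in bit 6, I in bit 7,
 * V/C/Z/N flags in bits 28..31.  Names here are hypothetical. */
#define X_PSR_MODE_MASK 0x1fu
#define X_PSR_MODE_SVC  0x13u
#define X_PSR_F         (1u << 6)
#define X_PSR_I         (1u << 7)
#define X_PSR_V         (1u << 28)
#define X_PSR_C         (1u << 29)
#define X_PSR_Z         (1u << 30)
#define X_PSR_N         (1u << 31)

/* Processor mode field of a PSR-shaped word. */
static uint32_t psr_mode(uint32_t psr)
{
    return psr & X_PSR_MODE_MASK;
}

/* Non-zero if the I bit is set, i.e. IRQs are masked. */
static int psr_irqs_off(uint32_t psr)
{
    return (psr & X_PSR_I) != 0;
}

/* The four condition flags packed as N,Z,C,V in bits 3..0. */
static uint32_t psr_nzcv(uint32_t psr)
{
    return psr >> 28;
}
```

For r5 = 0x60000013 this gives SVC mode with IRQs enabled and Z and C
set.  The psr reported in the dump, 0x40000093, decodes to SVC mode with
IRQs off and only Z set, which matches the "Flags: nZcv  IRQs off" line.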

   0:	e5943000 	ldr	r3, [r4]
   4:	e3130002 	tst	r3, #2	; 0x2
   8:	1a000007 	bne	0x2c
   c:	f10c0080 	cpsid	i
  10:	e5953000 	ldr	r3, [r5]

which corresponds to:

                while (!need_resched()) {
                        local_irq_disable();
                        if (hlt_counter) { <== faulting

The question, therefore, is why r5 would be corrupted.
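
For anyone who wants to double-check that the faulting word in the
Code: line really is a load through r5, the fields of an ARM
"ldr Rd, [Rn, #imm]" instruction can be pulled apart by hand.  This is
a rough sketch that only handles that one encoding, and the function
names are invented for the example.

```c
#include <stdint.h>

/* Non-zero if insn is a word-sized LDR with an immediate offset:
 * bits 27:25 = 010, B (bit 22) = 0, L (bit 20) = 1. */
static int is_ldr_imm(uint32_t insn)
{
    return (insn & 0x0e500000u) == 0x04100000u;
}

/* Base register Rn, bits 19:16. */
static unsigned ldr_base_reg(uint32_t insn)
{
    return (insn >> 16) & 0xfu;
}

/* Destination register Rd, bits 15:12. */
static unsigned ldr_dest_reg(uint32_t insn)
{
    return (insn >> 12) & 0xfu;
}

/* Signed immediate offset: 12-bit magnitude, sign from U (bit 23). */
static int32_t ldr_offset(uint32_t insn)
{
    int32_t off = (int32_t)(insn & 0xfffu);
    return (insn & (1u << 23)) ? off : -off;
}
```

For 0xe5953000 this yields base register 5, destination register 3 and
offset 0, so the access address is simply the value of r5, which agrees
with the faulting virtual address 60000013.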


