Getting random Kernel Crash with v3.1-rc8 kernel
Hiremath, Vaibhav
hvaibhav at ti.com
Wed Oct 19 17:25:19 EDT 2011
Thanks,
Vaibhav
> -----Original Message-----
> From: linux-arm-kernel-bounces at lists.infradead.org [mailto:linux-arm-
> kernel-bounces at lists.infradead.org] On Behalf Of Hiremath, Vaibhav
> Sent: Wednesday, October 19, 2011 8:24 PM
> To: linux-arm-kernel at lists.infradead.org
> Cc: linux at arm.linux.org.uk
> Subject: Getting random Kernel Crash with v3.1-rc8 kernel
>
> Hi,
>
> I am getting a random kernel crash, and it always crashes with "Internal
> error: Oops - undefined instruction: 0 [#1]".
> There could be some stack corruption or a race condition in the kernel,
> which I am currently debugging (and running out of options).
>
> The funny part is:
> - If I add some lines of code (totally unrelated to this crash), the
> crash goes away.
> - With the filesystem on NFS, an MMC card (non-HS card) or NAND, the
> crash is easily reproducible (almost 60% of attempts).
> - With a ramdisk, the kernel crash is not observed.
> - The default CPU frequency is 600 MHz; if I reduce it to 500 MHz, the
> crash goes away.
>
>
<snip>
> [ 1.594594] CPSW phy found : id is : 0x4dd074
> [ 1.601482] PHY 0:01 not found
> [ 2.130539] Internal error: Oops - undefined instruction: 0 [#1]
> [ 2.136818] Modules linked in:
> [ 2.140017] CPU: 0 Not tainted (3.1.0-rc8-11589-gbb7fe4f-dirty #4)
> [ 2.146838] PC is at 0xc05b71e8
> [ 2.150134] LR is at run_timer_softirq+0xf8/0x208
> [ 2.155050] pc : [<c05b71e8>] lr : [<c0040980>] psr: a0000113
> [ 2.155059] sp : c055be68 ip : c05c96f4 fp : c055beac
> [ 2.167049] r10: c05b680c r9 : 00000100 r8 : 00200200
> [ 2.172506] r7 : c055a000 r6 : ba8c4f60 r5 : c05b6000 r4 : c055be78
> [ 2.179325] r3 : c05b637c r2 : c05b636c r1 : 00000000 r0 : c05b637c
> [ 2.186146] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM
> Segment kernel
> [ 2.193785] Control: 10c5387d Table: 80004019 DAC: 00000015
> [ 2.199779] Process swapper (pid: 0, stack limit = 0xc055a2f0)
> [ 2.205863] Stack: (0xc055be68 to 0xc055c000)
> [ 2.210414] be60: 00000000 00000000 c05b637c c05b7ff8
> c05c96f4 c781de70
> [ 2.218958] be80: c05b7ff4 00000001 c05b5e88 c055a000 00000100 c05b5e40
> c0576f48 c05b5e84
> [ 2.227503] bea0: c055bef4 c055beb0 c003ad54 c0040894 c055bf74 80004059
> 413fc082 00000001
> [ 2.236059] bec0: c0577154 0000000a c0069360 c055a000 00000043 00000000
> c055bf74 80004059
> [ 2.244611] bee0: 413fc082 00000000 c055bf0c c055bef8 c003b1c4 c003acb4
> c006ac78 c0588aa4
> [ 2.253162] bf00: c055bf2c c055bf10 c00145c8 c003b144 c0014714 c0014718
> 60000013 fa200000
> [ 2.261714] bf20: c055bf3c c055bf30 c0008190 c0014590 c055bf94 c055bf40
> c00132f4 c000818c
> [ 2.270270] bf40: 7d2bcf2e 00000000 c055bf88 00000000 c055a000 c05a4744
> c056021c c0560214
> [ 2.278825] bf60: 80004059 413fc082 00000000 c055bf94 c055bf98 c055bf88
> c0014714 c0014718
> [ 2.287378] bf80: 60000013 ffffffff c055bfb4 c055bf98 c00148d8 c00146f8
> 00000000 c055c0ac
> [ 2.295933] bfa0: c054bbec c06d1500 c055bfc4 c055bfb8 c03e07c4 c0014870
> c055bff4 c055bfc8
> [ 2.304483] bfc0: c05237d8 c03e075c c05231ac 00000000 00000000 c054bbec
> 00000000 10c53c7d
> [ 2.313029] bfe0: c055c040 c054bbe8 00000000 c055bff8 8000803c c0523520
> 00000000 00000000
> [ 2.321567] Backtrace:
> [ 2.324146] [<c0040888>] (run_timer_softirq+0x0/0x208) from
> [<c003ad54>] (__do_softirq+0xac/0x134)
> [ 2.333524] [<c003aca8>] (__do_softirq+0x0/0x134) from [<c003b1c4>]
> (irq_exit+0x8c/0xa4)
> [ 2.342004] [<c003b138>] (irq_exit+0x0/0xa4) from [<c00145c8>]
> (handle_IRQ+0x44/0x8c)
> [ 2.350184] r4:c0588aa4 r3:c006ac78
> [ 2.353936] [<c0014584>] (handle_IRQ+0x0/0x8c) from [<c0008190>]
> (asm_do_IRQ+0x10/0x14)
> [ 2.362294] r6:fa200000 r5:60000013 r4:c0014718 r3:c0014714
> [ 2.368234] [<c0008180>] (asm_do_IRQ+0x0/0x14) from [<c00132f4>]
> (__irq_svc+0x34/0x80)
> [ 2.376507] Exception stack(0xc055bf40 to 0xc055bf88)
> [ 2.381789] bf40: 7d2bcf2e 00000000 c055bf88 00000000 c055a000 c05a4744
> c056021c c0560214
> [ 2.390339] bf60: 80004059 413fc082 00000000 c055bf94 c055bf98 c055bf88
> c0014714 c0014718
> [ 2.398874] bf80: 60000013 ffffffff
> [ 2.402522] [<c00146ec>] (default_idle+0x0/0x30) from [<c00148d8>]
> (cpu_idle+0x74/0xa0)
> [ 2.410905] [<c0014864>] (cpu_idle+0x0/0xa0) from [<c03e07c4>]
> (rest_init+0x74/0x78)
> [ 2.418997] r6:c06d1500 r5:c054bbec r4:c055c0ac r3:00000000
> [ 2.424950] [<c03e0750>] (rest_init+0x0/0x78) from [<c05237d8>]
> (start_kernel+0x2c4/0x2d0)
> [ 2.433592] [<c0523514>] (start_kernel+0x0/0x2d0) from [<8000803c>]
> (0x8000803c)
> [ 2.441315] r6:c054bbe8 r5:c055c040 r4:10c53c7d
> [ 2.446156] Code: 00000100 c05b6001 c0049608 c05b70b0 (ffffffff)
> [ 2.452552] ---[ end trace 010eec470f78ac9d ]---
> [ 2.457377] Kernel panic - not syncing: Fatal exception in interrupt
> [ 2.464023] Backtrace:
> [ 2.466602] [<c0016e40>] (dump_backtrace+0x0/0x110) from [<c03e8d88>]
> (dump_stack+0x18/0x1c)
> [ 2.475434] r6:00000001 r5:00000000 r4:c05a5508 r3:c05770c8
> [ 2.481384] [<c03e8d70>] (dump_stack+0x0/0x1c) from [<c03e8df8>]
> (panic+0x6c/0x1a0)
> [ 2.489397] [<c03e8d8c>] (panic+0x0/0x1a0) from [<c0017270>]
> (die+0x268/0x2bc)
> [ 2.496953] r3:00000100 r2:00000000 r1:00000000 r0:c04aba68
> [ 2.502895] r7:c055bd32
> [ 2.505548] [<c0017008>] (die+0x0/0x2bc) from [<c00172e4>]
> (arm_notify_die+0x20/0x58)
> [ 2.513749] [<c00172c4>] (arm_notify_die+0x0/0x58) from [<c00082d0>]
> (do_undefinstr+0x13c/0x154)
> [ 2.522950] [<c0008194>] (do_undefinstr+0x0/0x154) from [<c0013388>]
> (__und_svc+0x48/0x60)
> [ 2.531602] Exception stack(0xc055be20 to 0xc055be68)
> [ 2.536888] be20: c05b637c 00000000 c05b636c c05b637c c055be78 c05b6000
> ba8c4f60 c055a000
> [ 2.545452] be40: 00200200 00000100 c05b680c c055beac c05c96f4 c055be68
> c0040980 c05b71e8
> [ 2.554009] be60: a0000113 ffffffff
> [ 2.557651] r7:00000001 r6:c055a050 r5:a0000113 r4:c05b71ec
> [ 2.563607] [<c0040888>] (run_timer_softirq+0x0/0x208) from
> [<c003ad54>] (__do_softirq+0xac/0x134)
> [ 2.572992] [<c003aca8>] (__do_softirq+0x0/0x134) from [<c003b1c4>]
> (irq_exit+0x8c/0xa4)
> [ 2.581466] [<c003b138>] (irq_exit+0x0/0xa4) from [<c00145c8>]
> (handle_IRQ+0x44/0x8c)
> [ 2.589645] r4:c0588aa4 r3:c006ac78
> [ 2.593404] [<c0014584>] (handle_IRQ+0x0/0x8c) from [<c0008190>]
> (asm_do_IRQ+0x10/0x14)
> [ 2.601780] r6:fa200000 r5:60000013 r4:c0014718 r3:c0014714
> [ 2.607727] [<c0008180>] (asm_do_IRQ+0x0/0x14) from [<c00132f4>]
> (__irq_svc+0x34/0x80)
> [ 2.616014] Exception stack(0xc055bf40 to 0xc055bf88)
> [ 2.621297] bf40: 7d2bcf2e 00000000 c055bf88 00000000 c055a000 c05a4744
> c056021c c0560214
> [ 2.629833] bf60: 80004059 413fc082 00000000 c055bf94 c055bf98 c055bf88
> c0014714 c0014718
> [ 2.638390] bf80: 60000013 ffffffff
> [ 2.642054] [<c00146ec>] (default_idle+0x0/0x30) from [<c00148d8>]
> (cpu_idle+0x74/0xa0)
> [ 2.650432] [<c0014864>] (cpu_idle+0x0/0xa0) from [<c03e07c4>]
> (rest_init+0x74/0x78)
> [ 2.658541] r6:c06d1500 r5:c054bbec r4:c055c0ac r3:00000000
> [ 2.664502] [<c03e0750>] (rest_init+0x0/0x78) from [<c05237d8>]
> (start_kernel+0x2c4/0x2d0)
> [ 2.673164] [<c0523514>] (start_kernel+0x0/0x2d0) from [<8000803c>]
> (0x8000803c)
> [ 2.680909] r6:c054bbe8 r5:c055c040 r4:10c53c7d
>
>
After further debugging, I observed that mod_timer somehow does not seem to set the expiry value passed to it. From the USB driver we are trying to set a 2 s timeout, which doesn't appear to get configured properly -
Usage of mod_timer -
ret = mod_timer(&otg_workaround, jiffies + msecs_to_jiffies(2000));
Timer expiry log just before and after the mod_timer -
[ 2.133559] otg_timer:603 state - 1
[ 2.137224] otg_timer:659 expires - 4294937708
[ 2.141892] otg_timer:666 expires - 4294937709
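As a cross-check on the numbers above, here is a minimal user-space sketch of the arithmetic behind jiffies + msecs_to_jiffies(2000). It assumes HZ=100 and a 32-bit jiffies counter, neither of which is stated in this thread, and it approximates the tick count since boot from the printk timestamp (about 2.13 s, so 213 ticks). The helper names (initial_jiffies, msecs_to_jiffies_sketch, expected_expires, time_after_sketch) are hypothetical stand-ins, not kernel functions; only the INITIAL_JIFFIES offset of -300*HZ and the signed-difference comparison are taken from the kernel's jiffies.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: jiffies is 32 bits wide, as on 32-bit ARM. */
typedef uint32_t jiffies32_t;

/* The kernel starts jiffies at INITIAL_JIFFIES = -300*HZ, so the counter
 * wraps about five minutes after boot. hz is an assumed parameter here. */
static jiffies32_t initial_jiffies(uint32_t hz)
{
    assert(hz > 0);
    return (jiffies32_t)(0u - 300u * hz);
}

/* Simplified stand-in for msecs_to_jiffies(): round up to whole ticks. */
static jiffies32_t msecs_to_jiffies_sketch(uint32_t ms, uint32_t hz)
{
    return (jiffies32_t)(((uint64_t)ms * hz + 999u) / 1000u);
}

/* Expiry that "jiffies + timeout" yields a given number of ticks after boot. */
static jiffies32_t expected_expires(uint32_t ticks_since_boot,
                                    uint32_t timeout_ms, uint32_t hz)
{
    return initial_jiffies(hz) + ticks_since_boot
           + msecs_to_jiffies_sketch(timeout_ms, hz);
}

/* Wrap-safe "a is later than b", same idea as the kernel's time_after(). */
static int time_after_sketch(jiffies32_t a, jiffies32_t b)
{
    return (int32_t)(b - a) < 0;
}
```

Under those assumptions, expected_expires(213, 2000, 100) evaluates to 4294937709, which matches the value logged after mod_timer. If HZ really is 100 on this board, the large-looking expires value may simply reflect the initial jiffies offset rather than a wrong value being stored; with a different HZ this comparison does not hold.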
Because of this, I observed that execution goes wrong in the function
"run_timer_softirq": it always stays in the inner loop -
while (!list_empty(head)) {
	...
	call_timer_fn(timer, fn, data);
	...
}
The funny part is that, when the kernel crashes, it appears to have been stuck in a continuous loop inside this function, and it crashes some time later.
I am not quite sure why mod_timer is not setting the right expiry value; does anybody have any pointers on this?
Just to check, I created a dummy driver to test setup_timer/mod_timer/del_timer, and they work fine for me. So I wouldn't expect any issues with the timer code per se...
Any pointers are greatly appreciated.
> Thanks,
> Vaibhav
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel