ath9k ARMv7 OOPS in v4.8.6, v4.2.8
Russell King - ARM Linux
linux at armlinux.org.uk
Wed Nov 23 11:51:20 PST 2016
On Wed, Nov 23, 2016 at 07:15:39PM +0000, Jason Cooper wrote:
> ------- oops from v4.8.6 #2 ------------------------------------------
> [42059.303625] Unable to handle kernel NULL pointer dereference at virtual address 00000020
> [42059.311799] pgd = c0004000
> [42059.314522] [00000020] *pgd=00000000
> [42059.318162] Internal error: Oops: 17 [#1] SMP ARM
> [42059.322889] Modules linked in: ath9k ath9k_common ath9k_hw ath
> [42059.328809] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.6 #37
> [42059.334755] Hardware name: Marvell Armada 370/XP (Device Tree)
> [42059.340613] task: c0b091c0 task.stack: c0b00000
> [42059.345176] PC is at ath_cmn_process_fft+0xa0/0x578 [ath9k_common]
> [42059.351388] LR is at ath_cmn_process_fft+0xc4/0x578 [ath9k_common]
> [42059.357598] pc : [<bf07bec4>] lr : [<bf07bee8>] psr: 80000153
> [42059.357598] sp : c0b01cd0 ip : 00000000 fp : 00000000
> [42059.369127] r10: c0b034d4 r9 : 00000069 r8 : 0000006c
> [42059.374374] r7 : 00000000 r6 : dcfbd340 r5 : c0b03da0 r4 : 00000000
> [42059.380930] r3 : 00000001 r2 : 00000008 r1 : 00000004 r0 : 00000000
Well, the good news is that it's reproducible.
It looks like it could be this:
static int
ath_cmn_is_fft_buf_full(struct ath_spec_scan_priv *spec_priv)
{
	for_each_online_cpu(i)
		ret += relay_buf_full(rc->buf[i]);
where i = 8 (r2) and rc->buf is r7, which is zero in the register dump.
That's just a guess though, as there's precious little to go on with the
Code: line - with modern GCC output it doesn't give us much to figure out
what's going on without the exact object files.
   e5933000    ldr   r3, [r3]
   e1d330b4    ldrh  r3, [r3, #4]
   e58d3030    str   r3, [sp, #48]   ; 0x30
   ea000002    b     1c <foo+0x1c>
   e7970102    ldr   r0, [r7, r2, lsl #2]
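As a sanity check on that guess, the address formed by the last
instruction can be worked out from the register dump. This is only a
sketch of the addressing-mode arithmetic: fault_address() is a
hypothetical helper name, not anything from the kernel.

```c
#include <stdint.h>

/*
 * Model of the ARM addressing mode in "ldr r0, [r7, r2, lsl #2]":
 * the load address is base + (index << 2).
 * fault_address() is a hypothetical name used for illustration only.
 */
static uint32_t fault_address(uint32_t base, uint32_t index)
{
	return base + (index << 2);
}
```

With r7 = 00000000 and r2 = 00000008 from the oops, fault_address(0, 8)
is 0x20, which matches the reported "virtual address 00000020".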
What makes me wonder though is that if i=8, that means you must have a
system with 9 online CPUs, which is probably unlikely - or maybe that's
the problem, for_each_online_cpu() is going wrong...
If it's not that line of code, I don't see what else it would be based
on the output of my compiler - there's only one case in my disassembly
that corresponds with the single code line that we have to go on, and
it's this:
     a44:   e5983020    ldr   r3, [r8, #32]
     a48:   e793010a    ldr   r0, [r3, sl, lsl #2]    <===
     a4c:   ebfffffe    bl    0 <relay_buf_full>
     a50:   e0844000    add   r4, r4, r0
     a54:   e59f9434    ldr   r9, [pc, #1076]
     a58:   e28a2001    add   r2, sl, #1
     a5c:   e3a01004    mov   r1, #4
     a60:   e1a00009    mov   r0, r9
     a64:   ebfffffe    bl    0 <_find_next_bit_le>
     a68:   e5953000    ldr   r3, [r5]
     a6c:   e1500003    cmp   r0, r3
     a70:   e1a0a000    mov   sl, r0
     a74:   bafffff2    blt   a44 <ath_cmn_process_fft+0xa8>
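The shape of that loop (find the next set bit, compare against a bound,
branch back) can be modelled in userspace to see which index values it
can produce. This is a simplified sketch, not the kernel's code:
model_find_next_bit() and highest_index_visited() are hypothetical
names, the mask is a single 32-bit word, and one bound is used for both
the search size and the loop limit.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a find-next-bit helper: return the index of
 * the first set bit at or after 'start', or 'size' if there is none.
 * 'size' must not exceed 32 in this model.
 */
static unsigned int model_find_next_bit(uint32_t mask, unsigned int size,
					unsigned int start)
{
	unsigned int i;

	for (i = start; i < size; i++)
		if (mask & (UINT32_C(1) << i))
			return i;
	return size;
}

/*
 * Walk the set bits below 'nr' in the same find/compare/branch pattern
 * and return the highest index visited, or -1 if none was visited.
 */
static int highest_index_visited(uint32_t online_mask, unsigned int nr)
{
	int highest = -1;
	unsigned int i;

	for (i = model_find_next_bit(online_mask, nr, 0);
	     i < nr;
	     i = model_find_next_bit(online_mask, nr, i + 1))
		highest = (int)i;

	return highest;
}
```

In this model the index never reaches the bound, so with a bound of 4
(the "mov r1, #4" in the disassembly above, which comes from my build
and may not match the oopsing kernel) an index of 8 cannot come out of
the loop. That would fit either the register guess being wrong or the
iteration itself going wrong.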
I'm debating now about whether we need to dump more of the code in the
oops - both before and after the faulting instruction...
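To make that idea concrete, here is a rough userspace sketch of a dump
that prints a configurable number of words on both sides of the faulting
instruction, with the faulting word in parentheses. format_code_line()
and its parameters are hypothetical and this is not the kernel's dump
routine, just an illustration of the output being considered.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical model: format 'before' words preceding the faulting
 * instruction, the faulting word in parentheses, then 'after' words
 * following it. The caller must guarantee that
 * words[fault - before] .. words[fault + after] are all valid.
 * Returns the number of characters written, or -1 if 'out' is too small.
 */
static int format_code_line(const uint32_t *words, int fault,
			    int before, int after,
			    char *out, size_t outlen)
{
	size_t used;
	int i, n;

	n = snprintf(out, outlen, "Code:");
	if (n < 0 || (size_t)n >= outlen)
		return -1;
	used = (size_t)n;

	for (i = fault - before; i <= fault + after; i++) {
		n = snprintf(out + used, outlen - used,
			     i == fault ? " (%08x)" : " %08x",
			     (unsigned int)words[i]);
		if (n < 0 || (size_t)n >= outlen - used)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}
```

Called with the five words decoded earlier, four words before and none
after, this gives a line with only the leading context. Raising 'after'
would add the words following the faulting instruction, which is the
part that is missing when trying to match the oops against a
disassembly.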
--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.