[BUG] CONFIG_UNINLINE_SPIN_UNLOCK important for Cortex-A9
Arnd Bergmann
arnd at arndb.de
Tue May 31 06:12:30 PDT 2016
On Tuesday, May 31, 2016 1:16:40 PM CEST Russell King - ARM Linux wrote:
>
> > > [17827.766279] pgd = ee09c000
> > > [17827.769003] [00001014] *pgd=3eba3831, *pte=00000000, *ppte=00000000
> > > [17827.775383] Internal error: Oops: 17 [#1] SMP ARM
> > > [17827.780108] Modules linked in: usbhid btusb btrtl btbcm btintel bluetooth flexcan smsc95xx usbnet mii ptxc(O)
> > > [17827.790242] CPU: 1 PID: 372 Comm: stress-ng-socke Tainted: G O 4.5.4 #1
> > > [17827.797995] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> > > [17827.804536] task: ed614780 ti: eebba000 task.ti: eebba000
> > > [17827.809977] PC is at __netif_receive_skb_core+0x328/0xa9c
> >
> > Unfortunately in the middle of a rather long function, and I don't
> > see a spin_unlock in this function, in fact it's not even called
> > with a spinlock held, so it must be something more indirect.
>
> On a kernel here, I have:
>
> 1290: e51b4058 ldr r4, [fp, #-88] ; 0xffffffa8
> ...
> 12b0: e5943014 ldr r3, [r4, #20]
> 12b4: e5b37054 ldr r7, [r3, #84]! ; 0x54
> 12b8: e1570003 cmp r7, r3
> 12bc: e2477014 sub r7, r7, #20
> 12c0: 0a00001f beq 1344 <__netif_receive_skb_core+0x3c0>
> ...
> 1314: e1a0300a mov r3, sl
> 1318: e12fff3c blx ip
> 131c: e51b4058 ldr r4, [fp, #-88] ; 0xffffffa8
> 1320: e1a02007 mov r2, r7
> 1324: e5971014 ldr r1, [r7, #20]
>
> So it's a list of some sort. fp, #-88 is the first arg, so that's
> the struct sk_buff pointer.
>
> Adding debug info to the build, reveals that it's this:
>
> 	list_for_each_entry_rcu(ptype, &skb->dev->ptype_all, list) {
> 		if (pt_prev)
> 			ret = deliver_skb(skb, pt_prev, orig_dev);
> 		pt_prev = ptype;
> 	}
>
> specifically, the load is for __read_once_size() inside
> list_for_each_entry_rcu().
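
As an aside on the quoted disassembly: a "sub r7, r7, #20" followed by
"ldr r1, [r7, #20]" is the shape one would expect from a
list_for_each_entry-style walk, where the list pointer is converted back
to its containing structure and the next pointer is then loaded from the
embedded member. A minimal userspace sketch of that arithmetic follows.
The struct, macro and function names are hypothetical stand-ins, not the
kernel headers, and the offsets on a given host will differ from the
32-bit ARM build above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct list_node {
	struct list_node *next, *prev;
};

/* Hypothetical handler type; NOT the layout of struct packet_type. */
struct pkt_handler {
	int type;
	void *priv;
	struct list_node list;	/* embedded list member */
};

/* Same idea as the kernel's container_of(): step back from the
 * embedded member to the start of the enclosing structure. */
#define entry_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Byte offset of the embedded list member on this platform. */
size_t list_offset(void)
{
	return offsetof(struct pkt_handler, list);
}

/* Recover the handler that embeds the given list node. */
struct pkt_handler *handler_from_node(struct list_node *n)
{
	return entry_of(n, struct pkt_handler, list);
}
```

If a next pointer in such a list is garbage, the subtraction still
succeeds and the fault only shows up on the following load from the
bogus entry.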

Ok, so this is an rcu protected list that gets written to using the function

void dev_add_pack(struct packet_type *pt)
{
	struct list_head *head = ptype_head(pt);

	spin_lock(&ptype_lock);
	list_add_rcu(&pt->list, head);
	spin_unlock(&ptype_lock);
}
EXPORT_SYMBOL(dev_add_pack);

and the respective __dev_remove_pack taking the same lock. These get called
once for each network protocol (which basically should never change) and
also for af_packet.c when registering a new listener.
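
To make the writer/reader split concrete, here is a rough userspace
sketch of the same pattern: writers serialize on a lock and publish a
fully initialised entry with a release store, while readers walk the
list without taking the lock. It uses C11 atomics and a pthread mutex as
stand-ins for the kernel primitives. All names in it are hypothetical,
and it is only an illustration of the idea, not how list_add_rcu() is
implemented.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these names are kernel API. */
struct node {
	_Atomic(struct node *) next;	/* followed by lockless readers */
	struct node *prev;		/* only touched by writers, under the lock */
};

struct handler {
	int id;
	struct node list;
};

/* Empty circular list: the head points at itself. */
static struct node handlers = { &handlers, &handlers };
static pthread_mutex_t handlers_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer side: insert at the front of the list, serialized by the lock. */
void handler_add(struct handler *h)
{
	struct node *entry = &h->list;
	struct node *first;

	pthread_mutex_lock(&handlers_lock);
	first = atomic_load_explicit(&handlers.next, memory_order_relaxed);
	/* Fully initialise the new entry before it becomes reachable. */
	atomic_store_explicit(&entry->next, first, memory_order_relaxed);
	entry->prev = &handlers;
	/* Publish: this release store pairs with the readers' acquire loads. */
	atomic_store_explicit(&handlers.next, entry, memory_order_release);
	first->prev = entry;
	pthread_mutex_unlock(&handlers_lock);
}

/* Reader side: walk the list without taking the lock. */
int handler_count(void)
{
	struct node *pos;
	int n = 0;

	for (pos = atomic_load_explicit(&handlers.next, memory_order_acquire);
	     pos != &handlers;
	     pos = atomic_load_explicit(&pos->next, memory_order_acquire))
		n++;
	return n;
}
```

The sketch leaves out removal on purpose. With RCU, an entry that has
been unlinked may still be in use by a reader that started earlier, so
it must not be freed or reused until those readers are done.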

Somehow we managed to get an invalid entry in the list, which could
be related to lots of af_packet registering/unregistering.

Does the stress-ng test case do that?
Do the other oops output logs have any relation to the above?

	Arnd