[LEDE-DEV] [BUG] [lantiq] xrx200 network driver sleeping with spinlock held
Andrea Merello
andrea.merello at gmail.com
Mon Jul 3 09:52:48 PDT 2017
For unrelated reasons I'm building the LEDE kernel (4.9.31) with several
debug checks enabled, and I'm running it on a Lantiq xrx200 board
(fritzbox 3370).
I've hit a bug (one of two I ran into) in the lantiq_xrx200.c network
driver: xrx200_close() calls napi_disable(), which can sleep, while
holding the priv->hw->chan[i].lock spinlock, causing the kernel to
complain [1].
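To make the failure mode concrete, here is a minimal userspace sketch of
the check that fires in [1]. It is only a model under stated assumptions:
every model_* function and both close_* helpers are hypothetical
stand-ins, not the kernel's implementations and not driver code. The one
thing taken from the trace is that napi_disable() goes through a
might-sleep check.

```c
/* Userspace model only: none of this is real kernel or driver code. */
#include <assert.h>
#include <stdio.h>

int atomic_depth;      /* stands in for the kernel's atomic-context count */
int sleep_violations;  /* number of "sleeps" attempted while atomic */

/* Model of spin_lock_bh(): holding the lock puts us in atomic context. */
void model_spin_lock_bh(void)
{
	atomic_depth++;
}

/* Model of spin_unlock_bh(): leaving the critical section. */
void model_spin_unlock_bh(void)
{
	assert(atomic_depth > 0);
	atomic_depth--;
}

/* Model of the might-sleep check: complain if we are atomic. */
void model_might_sleep(const char *caller)
{
	if (atomic_depth > 0) {
		sleep_violations++;
		fprintf(stderr,
			"BUG: sleeping function called from invalid context (%s)\n",
			caller);
	}
}

/* Model of napi_disable(): it may sleep, so it runs the check first. */
void model_napi_disable(void)
{
	model_might_sleep("model_napi_disable");
}

/* Ordering as in the current close path: lock held around the call. */
int close_with_lock_held(void)
{
	sleep_violations = 0;
	model_spin_lock_bh();
	model_napi_disable();
	model_spin_unlock_bh();
	return sleep_violations;
}

/* Same call made without the lock held. */
int close_without_lock(void)
{
	sleep_violations = 0;
	model_napi_disable();
	return sleep_violations;
}
```

In this model close_with_lock_held() records one violation and
close_without_lock() records none, which mirrors why the complaint only
appears when the spinlock is held across the call.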
After a quick look at the code I couldn't convince myself that this
specific part of the code needs to be protected by the lock. IMHO there
seems to be no reason to protect the refcount variables, because AFAIK
the ndo_open() and ndo_stop() callbacks are already called with the
rtnl mutex held (it shows up as lock #0 in [1]). Nor could I figure out
why napi_disable() has to be called with that lock held. I don't know
about ltq_dma_close(), or whether your intention was to avoid races
with the housekeeping tasklet, but my guess is that the lock could
eventually be reduced to cover just that function [0].
However, because I'm not familiar with this driver code, I really just
wanted to point out the problem here, and hear from you :)
Andrea
[0]
--- a/drivers/net/ethernet/lantiq_xrx200.c
+++ b/drivers/net/ethernet/lantiq_xrx200.c
@@ -898,14 +898,15 @@ static int xrx200_close(struct net_device *dev)
 	for (i = 0; i < XRX200_MAX_DMA; i++) {
 		if (!priv->hw->chan[i].dma.irq)
 			continue;
-		spin_lock_bh(&priv->hw->chan[i].lock);
+
 		priv->hw->chan[i].refcount--;
 		if (!priv->hw->chan[i].refcount) {
 			if (XRX200_DMA_IS_RX(i))
 				napi_disable(&priv->hw->chan[i].napi);
+			spin_lock_bh(&priv->hw->chan[i].lock);
 			ltq_dma_close(&priv->hw->chan[XRX200_DMA_RX].dma);
+			spin_unlock_bh(&priv->hw->chan[i].lock);
 		}
-		spin_unlock_bh(&priv->hw->chan[i].lock);
 	}
 	return 0;
[1]
[ 7.327061] BUG: sleeping function called from invalid context at net/core/dev.c:5144
[ 7.333555] in_atomic(): 1, irqs_disabled(): 0, pid: 534, name: ip
[ 7.339698] 2 locks held by ip/534:
[ 7.343111] #0: (rtnl_mutex){......}, at: [<8049a780>] devinet_ioctl+0x180/0x724
[ 7.350744] #1: (&(&ch->lock)->rlock){......}, at: [<803b303c>] xrx200_close+0xc4/0x150
[ 7.358942] CPU: 1 PID: 534 Comm: ip Not tainted 4.9.31 #0
[ 7.364351] Stack : 00000000 00000000 80e6c56a 0000002e 80573bb4 00000000 00000000 807a0000
[ 7.372697] 87d3716c 80795887 806f38bc 00000001 00000216 809041fc 00008914 7ff828fc
[ 7.381053] 7ff828bc 8007da38 00008914 7ff828fc 7ff828bc 8007da38 806faf04 8749dc6c
[ 7.389409] 00000001 800be57c 806f97e4 8749dc7c 00008914 80790000 7ff828bc 8749dc00
[ 7.397765] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 7.406121] ...
[ 7.408560] Call Trace:
[ 7.411012] [<80010b38>] show_stack+0x50/0x84
[ 7.415397] [<802c0374>] dump_stack+0xd4/0x110
[ 7.419825] [<800569b0>] ___might_sleep+0xfc/0x11c
[ 7.424613] [<8041cf98>] napi_disable+0x3c/0x194
[ 7.429223] [<803b3078>] xrx200_close+0x100/0x150
[ 7.433917] [<80419f9c>] __dev_close_many+0xf4/0x128
[ 7.438885] [<8041a148>] __dev_close+0x30/0x50
[ 7.443315] [<80423c40>] __dev_change_flags+0xb8/0x174
[ 7.448456] [<80423d24>] dev_change_flags+0x28/0x70
[ 7.453339] [<8049a838>] devinet_ioctl+0x238/0x724
[ 7.458128] [<80400694>] sock_ioctl+0x2cc/0x32c
[ 7.462649] [<8011e840>] vfs_ioctl+0x20/0x40
[ 7.466902] [<8011f1f4>] do_vfs_ioctl+0x7e8/0x904
[ 7.471618] [<8011f360>] SyS_ioctl+0x50/0xa0
[ 7.475876] [<8001ad38>] syscall_common+0x34/0x58