[LEDE-DEV] [BUG] [lantiq] xrx200 network driver sleeping with spinlock held

Andrea Merello andrea.merello at gmail.com
Mon Jul 3 09:52:48 PDT 2017


For unrelated (OT) reasons I'm compiling the LEDE kernel (4.9.31) with
several debug checks enabled, and I'm running it on a Lantiq xrx200 board
(fritzbox 3370).

I've hit a bug (along with another one) in the lantiq_xrx200.c network
driver: xrx200_close() calls napi_disable(), which can sleep, while
holding priv->hw->chan[i].lock, and this makes the kernel complain [1].
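
To show what the check is complaining about, here is a tiny user-space
model (not kernel code). All the model_* names are made up for
illustration; they only mimic the idea that taking a spinlock makes the
context atomic and that a function which may sleep trips the debug check
when called in that state.

```c
/* User-space model only: none of these helpers are kernel APIs. */
#include <assert.h>

static int atomic_depth;      /* > 0 while a (model) spinlock is held */
static int sleep_violations;  /* how many times the check fired */

/* Stand-in for spin_lock_bh(): holding the lock makes the context atomic. */
static void model_spin_lock_bh(void)
{
	atomic_depth++;
}

/* Stand-in for spin_unlock_bh(). */
static void model_spin_unlock_bh(void)
{
	assert(atomic_depth > 0);
	atomic_depth--;
}

/* Stand-in for the debug check: count a violation if we may sleep while atomic. */
static void model_might_sleep(void)
{
	if (atomic_depth > 0)
		sleep_violations++;
}

/* Stand-in for napi_disable(): it may sleep, so it runs the check first. */
static void model_napi_disable(void)
{
	model_might_sleep();
}

/* Same ordering as the current xrx200_close(): sleeping call under the lock. */
static int model_close_locked(void)
{
	sleep_violations = 0;
	model_spin_lock_bh();
	model_napi_disable();
	model_spin_unlock_bh();
	return sleep_violations;
}
```

In this model, model_close_locked() reports one violation, which
corresponds to the single "sleeping function called from invalid context"
message in [1].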

After a quick look at the code I couldn't convince myself that we
need to protect that specific part of the code with the lock. IMHO there
seems to be no reason to protect the refcount variables, because AFAIK
the ndo_close() and ndo_open() callbacks are already called with a lock
held (rtnl_mutex shows up as lock #0 in [1]). Nor could I figure out why
napi_disable() has to be called with that lock held. I don't know about
ltq_dma_close(), or whether your intention was to avoid races with the
housekeeping tasklet, but my guess is that the lock could eventually be
reduced to just that function [0].
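
As a rough user-space sketch of the ordering I have in mind (again not
kernel code): struct model_chan and the chan_* helpers are hypothetical,
and chan_open() is only my guess at a symmetric open path, not copied
from the driver. The refcount is touched without the lock on the
assumption that open and close are serialized by the caller; the
sleeping call runs before the lock is taken, and the lock covers only
the DMA close.

```c
/* User-space model only: hypothetical names, not the driver's code. */
#include <assert.h>
#include <stdbool.h>

struct model_chan {
	int refcount;       /* users of the channel; changed only in open/close */
	bool lock_held;     /* models priv->hw->chan[i].lock */
	bool napi_enabled;
	bool dma_open;
	int violations;     /* sleeping calls made while lock_held */
};

/* Assumed open path: the first user brings the channel up. */
static void chan_open(struct model_chan *ch)
{
	if (ch->refcount++ == 0) {
		ch->dma_open = true;
		ch->napi_enabled = true;
	}
}

/* Models napi_disable(): may sleep, so it must not run under the lock. */
static void chan_napi_disable(struct model_chan *ch)
{
	if (ch->lock_held)
		ch->violations++;
	ch->napi_enabled = false;
}

/* Proposed close ordering: sleep first, then lock only the DMA close. */
static void chan_close(struct model_chan *ch)
{
	assert(ch->refcount > 0);
	ch->refcount--;
	if (ch->refcount)
		return;                 /* other users remain, keep it running */

	chan_napi_disable(ch);          /* outside the lock */

	ch->lock_held = true;           /* spin_lock_bh() in the real driver */
	ch->dma_open = false;           /* ltq_dma_close() in the real driver */
	ch->lock_held = false;          /* spin_unlock_bh() */
}
```

With two opens followed by two closes, the channel stays up after the
first close and is torn down by the second, with no sleeping call made
while the lock is held.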

However, because I'm not familiar with this driver code, I really just
wanted to point out the problem here, and hear from you :)

Andrea

[0]
--- a/drivers/net/ethernet/lantiq_xrx200.c
+++ b/drivers/net/ethernet/lantiq_xrx200.c
@@ -898,14 +898,15 @@ static int xrx200_close(struct net_device *dev)
        for (i = 0; i < XRX200_MAX_DMA; i++) {
                if (!priv->hw->chan[i].dma.irq)
                        continue;
-               spin_lock_bh(&priv->hw->chan[i].lock);
+
                priv->hw->chan[i].refcount--;
                if (!priv->hw->chan[i].refcount) {
                        if (XRX200_DMA_IS_RX(i))
                                napi_disable(&priv->hw->chan[i].napi);
+                       spin_lock_bh(&priv->hw->chan[i].lock);
                        ltq_dma_close(&priv->hw->chan[XRX200_DMA_RX].dma);
+                       spin_unlock_bh(&priv->hw->chan[i].lock);
                }
-               spin_unlock_bh(&priv->hw->chan[i].lock);
        }

        return 0;

[1]
[    7.327061] BUG: sleeping function called from invalid context at net/core/dev.c:5144
[    7.333555] in_atomic(): 1, irqs_disabled(): 0, pid: 534, name: ip
[    7.339698] 2 locks held by ip/534:
[    7.343111]  #0:  (rtnl_mutex){......}, at: [<8049a780>] devinet_ioctl+0x180/0x724
[    7.350744]  #1:  (&(&ch->lock)->rlock){......}, at: [<803b303c>] xrx200_close+0xc4/0x150
[    7.358942] CPU: 1 PID: 534 Comm: ip Not tainted 4.9.31 #0
[    7.364351] Stack : 00000000 00000000 80e6c56a 0000002e 80573bb4 00000000 00000000 807a0000
[    7.372697]         87d3716c 80795887 806f38bc 00000001 00000216 809041fc 00008914 7ff828fc
[    7.381053]         7ff828bc 8007da38 00008914 7ff828fc 7ff828bc 8007da38 806faf04 8749dc6c
[    7.389409]         00000001 800be57c 806f97e4 8749dc7c 00008914 80790000 7ff828bc 8749dc00
[    7.397765]         00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    7.406121]         ...
[    7.408560] Call Trace:
[    7.411012] [<80010b38>] show_stack+0x50/0x84
[    7.415397] [<802c0374>] dump_stack+0xd4/0x110
[    7.419825] [<800569b0>] ___might_sleep+0xfc/0x11c
[    7.424613] [<8041cf98>] napi_disable+0x3c/0x194
[    7.429223] [<803b3078>] xrx200_close+0x100/0x150
[    7.433917] [<80419f9c>] __dev_close_many+0xf4/0x128
[    7.438885] [<8041a148>] __dev_close+0x30/0x50
[    7.443315] [<80423c40>] __dev_change_flags+0xb8/0x174
[    7.448456] [<80423d24>] dev_change_flags+0x28/0x70
[    7.453339] [<8049a838>] devinet_ioctl+0x238/0x724
[    7.458128] [<80400694>] sock_ioctl+0x2cc/0x32c
[    7.462649] [<8011e840>] vfs_ioctl+0x20/0x40
[    7.466902] [<8011f1f4>] do_vfs_ioctl+0x7e8/0x904
[    7.471618] [<8011f360>] SyS_ioctl+0x50/0xa0
[    7.475876] [<8001ad38>] syscall_common+0x34/0x58


