lockdep splat when disconnecting NVMe oF TCP initiator

Sagi Grimberg sagi at grimberg.me
Tue Sep 29 04:20:00 EDT 2020


> Hi nvme folks,
> 
> I've toyed a bit with NVMe oF TCP over the last few days and came
> across the following lockdep splat on the target when disconnecting the
> initiator. I don't know if I'll have time to look into this anytime
> soon, so I figured I could at least leave the report here.

This is the way the driver has always behaved... To be honest, I'm not
at all convinced this isn't a false positive.

Nothing in release_work depends on flush_scheduled_work() waiting for
the online queue removals...

I'll try to have a deeper look later.
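
For reference, the cycle lockdep is constructing reduces to the pattern
below. This is just a minimal sketch with made-up names (io_work,
release_work, splat_io_wq) standing in for queue->io_work,
queue->release_work and nvmet_tcp_wq; it is not the actual nvmet-tcp
code:

#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *io_wq;	/* stands in for nvmet_tcp_wq */

static void io_work_fn(struct work_struct *w);
static void release_work_fn(struct work_struct *w);

static DECLARE_WORK(io_work, io_work_fn);		/* ~ queue->io_work */
static DECLARE_WORK(release_work, release_work_fn);	/* ~ queue->release_work */

static void io_work_fn(struct work_struct *w)
{
	/*
	 * io_work -> (wq_completion)events: like nvmet_tcp_install_queue()
	 * flushing the system workqueue from inside io_work.
	 */
	flush_scheduled_work();
}

static void release_work_fn(struct work_struct *w)
{
	/*
	 * release_work -> io_work: like nvmet_tcp_release_queue_work()
	 * waiting for the queue's io_work to finish.
	 */
	flush_work(&io_work);
}

static int __init splat_sketch_init(void)
{
	io_wq = alloc_workqueue("splat_io_wq", 0, 0);
	if (!io_wq)
		return -ENOMEM;

	/* io_work runs on its own workqueue, like nvmet_tcp_wq */
	queue_work(io_wq, &io_work);

	/*
	 * release_work is queued with schedule_work(), i.e. on the "events"
	 * workqueue, which provides the (wq_completion)events -> release_work
	 * edge that closes the cycle in the report.
	 */
	schedule_work(&release_work);
	return 0;
}

static void __exit splat_sketch_exit(void)
{
	flush_work(&release_work);
	destroy_workqueue(io_wq);
}

module_init(splat_sketch_init);
module_exit(splat_sketch_exit);
MODULE_LICENSE("GPL");

Whether that cycle can bite in practice is a separate question, which is
why this may well be a false positive.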

> 
> Regards,
> Lucas
> 
> [  142.078964] WARNING: possible circular locking dependency detected
> [  142.085172] 5.9.0-rc5 #2 Not tainted
> [  142.091982] ------------------------------------------------------
> [  142.098186] kworker/0:3/74 is trying to acquire lock:
> [  142.103259] ffff000015228830 ((work_completion)(&queue->io_work)){+.+.}-{0:0}, at: __flush_work+0x54/0x510
> [  142.112981]
> [  142.112981] but task is already holding lock:
> [  142.118835] ffff800011efbdc0 ((work_completion)(&queue->release_work)){+.+.}-{0:0}, at: process_one_work+0x1ec/0x710
> [  142.129413]
> [  142.129413] which lock already depends on the new lock.
> [  142.129413]
> [  142.137618]
> [  142.137618] the existing dependency chain (in reverse order) is:
> [  142.145124]
> [  142.145124] -> #2 ((work_completion)(&queue->release_work)){+.+.}-{0:0}:
> [  142.153352]        process_one_work+0x248/0x710
> [  142.157906]        worker_thread+0x74/0x470
> [  142.162119]        kthread+0x15c/0x160
> [  142.165894]        ret_from_fork+0x10/0x38
> [  142.170005]
> [  142.170005] -> #1 ((wq_completion)events){+.+.}-{0:0}:
> [  142.176665]        flush_workqueue+0x98/0x400
> [  142.181054]        nvmet_tcp_install_queue+0x11c/0x130
> [  142.186217]        nvmet_install_queue+0xbc/0x150
> [  142.190946]        nvmet_execute_admin_connect+0x11c/0x200
> [  142.196458]        nvmet_tcp_io_work+0x8c0/0x950
> [  142.201099]        process_one_work+0x294/0x710
> [  142.205654]        worker_thread+0x74/0x470
> [  142.209860]        kthread+0x15c/0x160
> [  142.213630]        ret_from_fork+0x10/0x38
> [  142.217742]
> [  142.217742] -> #0 ((work_completion)(&queue->io_work)){+.+.}-{0:0}:
> [  142.225542]        __lock_acquire+0x13fc/0x2160
> [  142.230097]        lock_acquire+0xec/0x4d0
> [  142.234215]        __flush_work+0x7c/0x510
> [  142.238333]        flush_work+0x14/0x20
> [  142.242192]        nvmet_tcp_release_queue_work+0xb0/0x280
> [  142.247703]        process_one_work+0x294/0x710
> [  142.252257]        worker_thread+0x74/0x470
> [  142.256465]        kthread+0x15c/0x160
> [  142.260234]        ret_from_fork+0x10/0x38
> [  142.264346]
> [  142.264346] other info that might help us debug this:
> [  142.264346]
> [  142.272375] Chain exists of:
> [  142.272375]   (work_completion)(&queue->io_work) --> (wq_completion)events --> (work_completion)(&queue->release_work)
> [  142.272375]
> [  142.287476]  Possible unsafe locking scenario:
> [  142.287476]
> [  142.293415]        CPU0                    CPU1
> [  142.297962]        ----                    ----
> [  142.302508]   lock((work_completion)(&queue->release_work));
> [  142.308194]                                lock((wq_completion)events);
> [  142.314836]                                lock((work_completion)(&queue->release_work));
> [  142.323046]   lock((work_completion)(&queue->io_work));
> [  142.328296]
> [  142.328296]  *** DEADLOCK ***
> [  142.328296]
> [  142.334239] 2 locks held by kworker/0:3/74:
> [  142.338438]  #0: ffff000017405738 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1ec/0x710
> [  142.347887]  #1: ffff800011efbdc0 ((work_completion)(&queue->release_work)){+.+.}-{0:0}, at: process_one_work+0x1ec/0x710
> [  142.358901]
> [  142.358901] stack backtrace:
> [  142.363284] CPU: 0 PID: 74 Comm: kworker/0:3 Not tainted 5.9.0-rc5 #2
> [  142.372970] Hardware name: XXX (DT)
> [  142.378053] Workqueue: events nvmet_tcp_release_queue_work
> [  142.383563] Call trace:
> [  142.386032]  dump_backtrace+0x0/0x1b0
> [  142.389717]  show_stack+0x18/0x30
> [  142.393055]  dump_stack+0xe8/0x15c
> [  142.396480]  print_circular_bug+0x278/0x280
> [  142.400688]  check_noncircular+0x164/0x1e0
> [  142.404808]  __lock_acquire+0x13fc/0x2160
> [  142.408841]  lock_acquire+0xec/0x4d0
> [  142.412437]  __flush_work+0x7c/0x510
> [  142.416032]  flush_work+0x14/0x20
> [  142.419369]  nvmet_tcp_release_queue_work+0xb0/0x280
> [  142.424358]  process_one_work+0x294/0x710
> [  142.428390]  worker_thread+0x74/0x470
> [  142.432075]  kthread+0x15c/0x160
> [  142.435322]  ret_from_fork+0x10/0x38
> 
