[PATCH 2/2] nvme-rdma: move admin queue cleanup to nvme_rdma_free_ctrl
Steve Wise
swise at opengridcomputing.com
Thu Jul 14 07:59:05 PDT 2016
> This patch introduces asymmetry between create and destroy
> of the admin queue. Does this alternative patch solve
> the problem?
>
> The patch changes the order of device removal flow from:
> 1. delete controller
> 2. destroy queue
>
> to:
> 1. destroy queue
> 2. delete controller
>
> Or more specifically:
> 1. own the controller deletion (make sure we are not
> competing with anyone)
> 2. get rid of inflight reconnects (which also destroy and
> create queues)
> 3. destroy the queue
> 4. safely queue controller deletion
>
> Thoughts?
>
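For reference, here is my reading of the ordering you describe, as a rough sketch
only -- this is not your actual patch; nvme_change_ctrl_state(), the reconnect_work
cancel, and nvme_rdma_stop_and_free_queue() are my guesses at the helpers involved,
taken from the driver and the traces further down:

static int nvme_rdma_device_unplug(struct nvme_rdma_queue *queue)
{
	struct nvme_rdma_ctrl *ctrl = queue->ctrl;
	int ret = 0;

	/* 1. own the controller deletion, don't compete with anyone */
	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING))
		return 0;

	/* 2. get rid of inflight reconnects, which also create/destroy queues */
	cancel_delayed_work_sync(&ctrl->reconnect_work);

	/* 3. destroy the queue the DEVICE_REMOVAL event arrived on */
	nvme_rdma_stop_and_free_queue(queue);

	/* 4. safely queue controller deletion and wait for it to finish */
	queue_work(nvme_rdma_wq, &ctrl->delete_work);
	flush_work(&ctrl->delete_work);
	return ret;
}

Step 4 (the flush_work() from inside the cm event handler) is where rmmod ends up
stuck in the traces below.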
Your patch causes a deadlock during device removal.
The controller delete work thread is stuck in c4iw_destroy_qp, waiting for all
references to the QP to be dropped; the outstanding references are held by
nvmf/rdma, the rdma-cm, or both.
(gdb) list *c4iw_destroy_qp+0x155
0x15af5 is in c4iw_destroy_qp (drivers/infiniband/hw/cxgb4/qp.c:1596).
1591 c4iw_modify_qp(rhp, qhp, C4IW_QP_ATTR_NEXT_STATE, &attrs, 0);
1592 wait_event(qhp->wait, !qhp->ep);
1593
1594 remove_handle(rhp, &rhp->qpidr, qhp->wq.sq.qid);
1595 atomic_dec(&qhp->refcnt);
1596 wait_event(qhp->wait, !atomic_read(&qhp->refcnt));
1597
1598 spin_lock_irq(&rhp->lock);
1599 if (!list_empty(&qhp->db_fc_entry))
1600 list_del_init(&qhp->db_fc_entry);
The rdma-cm work thread is stuck trying to grab the cm_id mutex:
(gdb) list *cma_disable_callback+0x2e
0x29e is in cma_disable_callback (drivers/infiniband/core/cma.c:715).
710
711 static int cma_disable_callback(struct rdma_id_private *id_priv,
712 enum rdma_cm_state state)
713 {
714 mutex_lock(&id_priv->handler_mutex);
715 if (id_priv->state != state) {
716 mutex_unlock(&id_priv->handler_mutex);
717 return -EINVAL;
718 }
719 return 0;
And the nvmf cm event handler is stuck waiting for the controller delete to
finish:
(gdb) list *nvme_rdma_device_unplug+0x97
0x1027 is in nvme_rdma_device_unplug (drivers/nvme/host/rdma.c:1358).
warning: Source file is more recent than executable.
1353 queue_delete:
1354 /* queue controller deletion */
1355 queue_work(nvme_rdma_wq, &ctrl->delete_work);
1356 flush_work(&ctrl->delete_work);
1357 return ret;
1358 }
1359
1360 static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
1361 struct rdma_cm_event *ev)
1362 {
Seems like the rdma-cm work thread is trying to grab the cm_id lock for the
cm_id that is handling the DEVICE_REMOVAL event.
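If that is right, the three stacks form a cycle: the DEVICE_REMOVAL handler (rmmod
path) runs with that cm_id's handler_mutex held and flushes the controller delete
work; the delete work sits in c4iw_destroy_qp waiting for the QP references to
drop; and the path that would presumably drop that reference, the iw_cm close
handler, first has to take the same handler_mutex. A standalone userspace sketch
of that wait cycle, purely illustrative with made-up names and not kernel code
(build with cc -pthread; running it hangs, by design):

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t handler_mutex = PTHREAD_MUTEX_INITIALIZER; /* cm_id handler_mutex */
static pthread_mutex_t state_lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  state_cv      = PTHREAD_COND_INITIALIZER;
static int qp_refcnt = 1;       /* reference still held via the iw_cm connection */
static int delete_done = 0;

/* models nvme_rdma_del_ctrl_work -> ... -> c4iw_destroy_qp */
static void *delete_work(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&state_lock);
	while (qp_refcnt)                    /* wait_event(qhp->wait, !refcnt) */
		pthread_cond_wait(&state_cv, &state_lock);
	delete_done = 1;
	pthread_cond_broadcast(&state_cv);
	pthread_mutex_unlock(&state_lock);
	return NULL;
}

/* models the iw_cm work thread: cm_close_handler -> cma_iw_handler -> cma_disable_callback */
static void *cm_close(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&handler_mutex);  /* blocks: the DEVICE_REMOVAL handler holds it */
	pthread_mutex_lock(&state_lock);
	qp_refcnt = 0;                       /* would let c4iw_destroy_qp finish */
	pthread_cond_broadcast(&state_cv);
	pthread_mutex_unlock(&state_lock);
	pthread_mutex_unlock(&handler_mutex);
	return NULL;
}

/* models the rmmod path: DEVICE_REMOVAL -> nvme_rdma_device_unplug */
int main(void)
{
	pthread_t t1, t2;

	pthread_mutex_lock(&handler_mutex);  /* rdma-cm holds handler_mutex around the handler */
	pthread_create(&t1, NULL, delete_work, NULL);
	pthread_create(&t2, NULL, cm_close, NULL);

	/* flush_work(&ctrl->delete_work) */
	pthread_mutex_lock(&state_lock);
	while (!delete_done)
		pthread_cond_wait(&state_cv, &state_lock);
	pthread_mutex_unlock(&state_lock);

	pthread_mutex_unlock(&handler_mutex);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	printf("no deadlock\n");             /* never reached */
	return 0;
}

All three wait on each other, which matches the three hung-task stacks in the
log below.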
Here is the detailed log:
[ 1484.568354] nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.0.1.14:4420
[ 1495.939850] nvme nvme1: creating 32 I/O queues.
[ 1496.635114] nvme nvme1: new ctrl: NQN "test-ram0", addr 10.0.1.14:4420
[ 1502.813698] nvme nvme1: Got rdma device removal event, deleting ctrl
[ 1683.259465] INFO: task kworker/0:0:4 blocked for more than 120 seconds.
[ 1683.266147] Tainted: G E 4.7.0-rc2-block-for-next+ #78
[ 1683.272986] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1683.280830] kworker/0:0 D ffff8810281579c8 0 4 2 0x10000000
[ 1683.287997] Workqueue: nvme_rdma_wq nvme_rdma_del_ctrl_work [nvme_rdma]
[ 1683.294658] ffff8810281579c8 ffffffff81c0d4c0 ffff88102809c6c0 ffff880fff04e898
[ 1683.302252] ffffffff81d448b4 ffff88102809d840 ffff881028154008 ffff88007853b3f8
[ 1683.309763] ffff881028157ac8 ffff880f67ea5758 ffff880f67ea5760 ffff881028157a18
[ 1683.317281] Call Trace:
[ 1683.319752] [<ffffffff816eaed0>] schedule+0x40/0xb0
[ 1683.324757] [<ffffffffa036bac5>] c4iw_destroy_qp+0x155/0x2b0 [iw_cxgb4]
[ 1683.331481] [<ffffffff816eaed0>] ? schedule+0x40/0xb0
[ 1683.336639] [<ffffffff810f515a>] ? lock_timer_base+0x5a/0x80
[ 1683.342402] [<ffffffff816ede49>] ? schedule_timeout+0x1c9/0x220
[ 1683.348438] [<ffffffff810ccd80>] ? wait_woken+0x80/0x80
[ 1683.353804] [<ffffffffa03a25a9>] ? ib_mr_pool_destroy+0x79/0x90 [ib_core]
[ 1683.360699] [<ffffffffa039893e>] ib_destroy_qp+0xbe/0x160 [ib_core]
[ 1683.367085] [<ffffffffa0706011>] rdma_destroy_qp+0x31/0x50 [rdma_cm]
[ 1683.373538] [<ffffffffa038ed99>] nvme_rdma_destroy_queue_ib+0x29/0x90 [nvme_rdma]
[ 1683.381127] [<ffffffffa038f04d>] nvme_rdma_stop_and_free_queue+0x2d/0x40 [nvme_rdma]
[ 1683.388967] [<ffffffffa038f157>] nvme_rdma_shutdown_ctrl+0x87/0xe0 [nvme_rdma]
[ 1683.396290] [<ffffffffa038f224>] nvme_rdma_del_ctrl_work+0x34/0x50 [nvme_rdma]
[ 1683.403629] [<ffffffff810a4d03>] process_one_work+0x183/0x4d0
[ 1683.409485] [<ffffffff816eaa10>] ? __schedule+0x1f0/0x5b0
[ 1683.414984] [<ffffffff816eaed0>] ? schedule+0x40/0xb0
[ 1683.420142] [<ffffffff810c4130>] ? dequeue_entity+0x120/0x250
[ 1683.426009] [<ffffffff810a59bd>] worker_thread+0x16d/0x530
[ 1683.431597] [<ffffffff8102eb1d>] ? __switch_to+0x1cd/0x5e0
[ 1683.437176] [<ffffffff816eaa10>] ? __schedule+0x1f0/0x5b0
[ 1683.442674] [<ffffffff810ccbc6>] ? __wake_up_common+0x56/0x90
[ 1683.448511] [<ffffffff810a5850>] ? maybe_create_worker+0x120/0x120
[ 1683.454803] [<ffffffff816eaed0>] ? schedule+0x40/0xb0
[ 1683.459951] [<ffffffff810a5850>] ? maybe_create_worker+0x120/0x120
[ 1683.466223] [<ffffffff810aa5dc>] kthread+0xcc/0xf0
[ 1683.471106] [<ffffffff810b4ffe>] ? schedule_tail+0x1e/0xc0
[ 1683.476699] [<ffffffff810ae3b0>] ? smpboot_create_threads+0x80/0x80
[ 1683.483053] [<ffffffff816eed3f>] ret_from_fork+0x1f/0x40
[ 1683.488470] [<ffffffff810aa510>] ? kthread_freezable_should_stop+0x70/0x70
[ 1683.495476] INFO: task kworker/u64:5:716 blocked for more than 120 seconds.
[ 1683.502441] Tainted: G E 4.7.0-rc2-block-for-next+ #78
[ 1683.509237] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1683.517096] kworker/u64:5 D ffff88101d1ff858 0 716 2 0x10000000
[ 1683.524223] Workqueue: iw_cm_wq cm_work_handler [iw_cm]
[ 1683.529467] ffff88101d1ff858 ffff88102819c6c0 ffff88101d200040 ffff880fff04e898
[ 1683.536954] ffff88101d200040 ffff88101d200040 ffff88101d1fc008 ffff880f67a0720c
[ 1683.544445] ffff88101d200040 ffff880f67a07210 ffff88101d1ff918 ffff88101d1ff8a8
[ 1683.551932] Call Trace:
[ 1683.554390] [<ffffffff816eaed0>] schedule+0x40/0xb0
[ 1683.559358] [<ffffffff811119ae>] ? is_module_text_address+0xe/0x20
[ 1683.565630] [<ffffffff816eb12e>] schedule_preempt_disabled+0xe/0x10
[ 1683.571990] [<ffffffff816ecbe3>] __mutex_lock_slowpath+0x93/0x120
[ 1683.578200] [<ffffffff813789bb>] ? find_next_bit+0xb/0x10
[ 1683.583694] [<ffffffff81362816>] ? cpumask_next_and+0x26/0x50
[ 1683.589529] [<ffffffff810c4baa>] ? find_busiest_group+0x14a/0xa00
[ 1683.595713] [<ffffffff816ecc93>] mutex_lock+0x23/0x40
[ 1683.600871] [<ffffffffa070326e>] cma_disable_callback+0x2e/0x60 [rdma_cm]
[ 1683.607760] [<ffffffffa0707b5b>] cma_iw_handler+0x2b/0x1b0 [rdma_cm]
[ 1683.614207] [<ffffffffa06f61b3>] cm_close_handler+0x93/0xc0 [iw_cm]
[ 1683.620561] [<ffffffffa06f80e7>] process_event+0xd7/0xf0 [iw_cm]
[ 1683.626665] [<ffffffffa06f8247>] cm_work_handler+0x147/0x1c0 [iw_cm]
[ 1683.633121] [<ffffffff810a54f6>] ? trace_event_raw_event_workqueue_execute_start+0x66/0xa0
[ 1683.641487] [<ffffffff810a4d03>] process_one_work+0x183/0x4d0
[ 1683.647334] [<ffffffff816eaa10>] ? __schedule+0x1f0/0x5b0
[ 1683.652831] [<ffffffff816eaed0>] ? schedule+0x40/0xb0
[ 1683.657977] [<ffffffff816eae91>] ? schedule+0x1/0xb0
[ 1683.663038] [<ffffffff810a59bd>] worker_thread+0x16d/0x530
[ 1683.668612] [<ffffffff8102eb1d>] ? __switch_to+0x1cd/0x5e0
[ 1683.674197] [<ffffffff816eaa10>] ? __schedule+0x1f0/0x5b0
[ 1683.679690] [<ffffffff810ccbc6>] ? __wake_up_common+0x56/0x90
[ 1683.685534] [<ffffffff810a5850>] ? maybe_create_worker+0x120/0x120
[ 1683.691821] [<ffffffff816eaed0>] ? schedule+0x40/0xb0
[ 1683.696968] [<ffffffff810a5850>] ? maybe_create_worker+0x120/0x120
[ 1683.703230] [<ffffffff810aa5dc>] kthread+0xcc/0xf0
[ 1683.708126] [<ffffffff810b4ffe>] ? schedule_tail+0x1e/0xc0
[ 1683.713708] [<ffffffff816eed3f>] ret_from_fork+0x1f/0x40
[ 1683.719118] [<ffffffff810aa510>] ? kthread_freezable_should_stop+0x70/0x70
[ 1683.726126] INFO: task rmmod:6006 blocked for more than 120 seconds.
[ 1683.732483] Tainted: G E 4.7.0-rc2-block-for-next+ #78
[ 1683.739278] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1683.747111] rmmod D ffff880f647479f8 0 6006 4227 0x10000080
[ 1683.754210] ffff880f647479f8 ffff88102819af40 ffff880f647017c0 ffff880fff04e898
[ 1683.761729] ffffffff8116c656 0000000000000001 ffff880f64744008 ffff880f64747b98
[ 1683.769244] 7fffffffffffffff ffff880f647017c0 7fffffffffffffff ffff880f64747a48
[ 1683.776768] Call Trace:
[ 1683.779229] [<ffffffff8116c656>] ? trace_event_buffer_commit+0x146/0x1d0
[ 1683.786037] [<ffffffff816eaed0>] schedule+0x40/0xb0
[ 1683.791009] [<ffffffff8102eb1d>] ? __switch_to+0x1cd/0x5e0
[ 1683.796585] [<ffffffff816ede49>] schedule_timeout+0x1c9/0x220
[ 1683.802430] [<ffffffff816eaa10>] ? __schedule+0x1f0/0x5b0
[ 1683.807922] [<ffffffff810a3603>] ? insert_work+0x53/0xb0
[ 1683.813341] [<ffffffff816ebede>] wait_for_completion+0xde/0x110
[ 1683.819386] [<ffffffff810b6b00>] ? try_to_wake_up+0x230/0x230
[ 1683.825236] [<ffffffff810a472b>] flush_work+0x2b/0x40
[ 1683.830379] [<ffffffff810a1810>] ? worker_detach_from_pool+0xa0/0xa0
[ 1683.836830] [<ffffffffa038eff7>] nvme_rdma_device_unplug+0x97/0xc0 [nvme_rdma]
[ 1683.844142] [<ffffffffa0390782>] nvme_rdma_cm_handler+0x72/0x2f0 [nvme_rdma]
[ 1683.851297] [<ffffffff812085c4>] ? kmem_cache_free+0x1f4/0x210
[ 1683.857230] [<ffffffffa0703669>] ? cma_comp+0x49/0x60 [rdma_cm]
[ 1683.863247] [<ffffffffa070810f>] cma_remove_id_dev+0x8f/0xa0 [rdma_cm]
[ 1683.869865] [<ffffffffa07081d7>] cma_process_remove+0xb7/0x100 [rdma_cm]
[ 1683.876681] [<ffffffff812b2a14>] ? __kernfs_remove+0x114/0x1d0
[ 1683.882608] [<ffffffffa070825e>] cma_remove_one+0x3e/0x60 [rdma_cm]
[ 1683.888983] [<ffffffffa039dca0>] ib_unregister_device+0xb0/0x150 [ib_core]
[ 1683.895957] [<ffffffffa0363034>] c4iw_unregister_device+0x64/0x90 [iw_cxgb4]
[ 1683.903109] [<ffffffffa0356357>] c4iw_remove+0x27/0x60 [iw_cxgb4]
[ 1683.909303] [<ffffffffa036e6fd>] c4iw_exit_module+0x31/0x934 [iw_cxgb4]
[ 1683.916018] [<ffffffff811120e3>] SyS_delete_module+0x183/0x1d0
[ 1683.921955] [<ffffffff81003476>] ? do_audit_syscall_entry+0x66/0x70
[ 1683.928314] [<ffffffff8100378c>] ? trace_event_raw_event_sys_exit+0x6c/0xa0
[ 1683.935357] [<ffffffff81003b45>] ? syscall_trace_enter+0x65/0x70
[ 1683.941501] [<ffffffff81003f08>] do_syscall_64+0x78/0x1d0
[ 1683.947003] [<ffffffff8106e367>] ? do_page_fault+0x37/0x90
[ 1683.952587] [<ffffffff816eebe1>] entry_SYSCALL64_slow_path+0x25/0x25