[PATCH 3/3] nvme-rdma: fix possible hang when trying to set a live path during I/O
Sagi Grimberg
sagi at grimberg.me
Mon Mar 15 22:27:14 GMT 2021
When we teardown a controller we first freeze the queue to prevent
request submissions, and quiesce the queue to prevent request queueing
and we only unfreeze/unquiesce when we successfully reconnect a
controller.
In case we attempt to set a live path (optimized/non-optimized) and
update the current_path reference, we first need to wait for any
ongoing dispatches (synchronize the head->srcu).
However bio submissions _can_ block as the underlying controller queue
is frozen. which creates the below deadlock [1]. So it is clear that
the namespaces request queue must be unfrozen and unquiesced asap when
we teardown the controller.
However, when we are not in a multipath environment (!multipath or cmic
indicates ns isn't shared) we don't want to fail-fast the I/O, hence we
must keep the namespaces request queue frozen and quiesced and only
expire them when the controller successfully reconnects (and FAILFAST
may fail the I/O sooner).
[1] (happened in nvme-tcp, but works the same in nvme-rdma):
Workqueue: nvme-wq nvme_tcp_reconnect_ctrl_work [nvme_tcp]
Call Trace:
__schedule+0x293/0x730
schedule+0x33/0xa0
schedule_timeout+0x1d3/0x2f0
wait_for_completion+0xba/0x140
__synchronize_srcu.part.21+0x91/0xc0
synchronize_srcu_expedited+0x27/0x30
synchronize_srcu+0xce/0xe0
nvme_mpath_set_live+0x64/0x130 [nvme_core]
nvme_update_ns_ana_state+0x2c/0x30 [nvme_core]
nvme_update_ana_state+0xcd/0xe0 [nvme_core]
nvme_parse_ana_log+0xa1/0x180 [nvme_core]
nvme_read_ana_log+0x76/0x100 [nvme_core]
nvme_mpath_init+0x122/0x180 [nvme_core]
nvme_init_identify+0x80e/0xe20 [nvme_core]
nvme_tcp_setup_ctrl+0x359/0x660 [nvme_tcp]
nvme_tcp_reconnect_ctrl_work+0x24/0x70 [nvme_tcp]
Fix this by looking into the newly introduced nvme_ctrl_is_mpath and
unquiesce/unfreeze the namespaces request queues accordingly (in
the teardown for mpath and after a successful reconnect for non-mpath).
Also, we no longer need the explicit nvme_start_queues in the error
recovery work.
Fixes: 9f98772ba307 ("nvme-rdma: fix controller reset hang during traffic")
Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
drivers/nvme/host/rdma.c | 29 +++++++++++++++++------------
1 file changed, 17 insertions(+), 12 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index be905d4fdb47..43e8608ad5b7 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -989,19 +989,22 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
goto out_cleanup_connect_q;
if (!new) {
- nvme_start_queues(&ctrl->ctrl);
- if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT)) {
- /*
- * If we timed out waiting for freeze we are likely to
- * be stuck. Fail the controller initialization just
- * to be safe.
- */
- ret = -ENODEV;
- goto out_wait_freeze_timed_out;
+ if (!nvme_ctrl_is_mpath(&ctrl->ctrl)) {
+ nvme_start_queues(&ctrl->ctrl);
+ if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT)) {
+ /*
+ * If we timed out waiting for freeze we are likely to
+ * be stuck. Fail the controller initialization just
+ * to be safe.
+ */
+ ret = -ENODEV;
+ goto out_wait_freeze_timed_out;
+ }
}
blk_mq_update_nr_hw_queues(ctrl->ctrl.tagset,
ctrl->ctrl.queue_count - 1);
- nvme_unfreeze(&ctrl->ctrl);
+ if (!nvme_ctrl_is_mpath(&ctrl->ctrl))
+ nvme_unfreeze(&ctrl->ctrl);
}
return 0;
@@ -1043,8 +1046,11 @@ static void nvme_rdma_teardown_io_queues(struct nvme_rdma_ctrl *ctrl,
nvme_sync_io_queues(&ctrl->ctrl);
nvme_rdma_stop_io_queues(ctrl);
nvme_cancel_tagset(&ctrl->ctrl);
- if (remove)
+ if (nvme_ctrl_is_mpath(&ctrl->ctrl)) {
nvme_start_queues(&ctrl->ctrl);
+ nvme_wait_freeze(&ctrl->ctrl);
+ nvme_unfreeze(&ctrl->ctrl);
+ }
nvme_rdma_destroy_io_queues(ctrl, remove);
}
}
@@ -1191,7 +1197,6 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
nvme_stop_keep_alive(&ctrl->ctrl);
nvme_rdma_teardown_io_queues(ctrl, false);
- nvme_start_queues(&ctrl->ctrl);
nvme_rdma_teardown_admin_queue(ctrl, false);
blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
--
2.27.0
More information about the Linux-nvme
mailing list