[PATCH 4.15] nvme-rdma: fix concurrent reset and reconnect

Sagi Grimberg sagi at grimberg.me
Thu Dec 21 04:54:15 PST 2017


The ctrl state machine now allows a transition from RESETTING
to RECONNECTING. In nvme-rdma, when we receive an rdma cm
DISCONNECTED event, we trigger nvme_rdma_error_recovery. This
also happens when we execute a controller reset: we issue a
cm disconnect request and receive a cm disconnect reply. As
a result, the reset work and the error recovery work can
run concurrently.

Until now, the state machine prevented the error recovery
work from running as a result of a controller reset (RESETTING
-> RECONNECTING was not allowed).

To fix this, we adopt the FC state machine approach: we always
transition from LIVE to RESETTING and only then to RECONNECTING.
We do this both for the error recovery work and the controller
reset work:
1. transition to RESETTING
2. tear down the controller association
3. transition to RECONNECTING

This restores the protection against the reset work and the
error recovery work running concurrently.

Fixes: 3cec7f9de448 ("nvme: allow controller RESETTING to RECONNECTING transition")
Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
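For illustration only, the three-step sequence in the changelog can be
modelled with a toy state machine. This is a simplified sketch, not the
kernel implementation: the toy_* names and CTRL_* values are hypothetical,
there is no locking, and only the transitions this patch relies on are
modelled (the real nvme_change_ctrl_state() has more states and accepts
more transitions).

```c
#include <stdbool.h>

/* Hypothetical, reduced set of controller states. */
enum toy_ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_RECONNECTING };

struct toy_ctrl {
	enum toy_ctrl_state state;
};

/*
 * Toy stand-in for a state-change helper: returns true and updates the
 * state only if the requested transition is one the model allows.
 */
static bool toy_change_state(struct toy_ctrl *c, enum toy_ctrl_state new_state)
{
	bool ok = false;

	switch (new_state) {
	case CTRL_RESETTING:
		/* Only a LIVE controller may start a teardown. */
		ok = (c->state == CTRL_LIVE);
		break;
	case CTRL_RECONNECTING:
		/* Reconnect is entered only after RESETTING. */
		ok = (c->state == CTRL_RESETTING);
		break;
	case CTRL_LIVE:
		/* A successful reconnect brings the controller back. */
		ok = (c->state == CTRL_RECONNECTING);
		break;
	}

	if (ok)
		c->state = new_state;
	return ok;
}

/*
 * Step 1 for both the reset path and the error recovery path: whoever
 * wins LIVE -> RESETTING owns the teardown; the other caller backs off.
 */
static bool toy_begin_teardown(struct toy_ctrl *c)
{
	return toy_change_state(c, CTRL_RESETTING);
}
```

In this model, if a reset and a disconnect-triggered recovery both call
toy_begin_teardown() on a LIVE controller, only the first succeeds, which
is the mutual exclusion the changelog describes.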
 drivers/nvme/host/rdma.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 596a3dde47fd..edc19b17253d 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -974,12 +974,18 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
 	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
 	nvme_start_queues(&ctrl->ctrl);
 
+	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RECONNECTING)) {
+		/* state change failure should never happen */
+		WARN_ON_ONCE(1);
+		return;
+	}
+
 	nvme_rdma_reconnect_or_remove(ctrl);
 }
 
 static void nvme_rdma_error_recovery(struct nvme_rdma_ctrl *ctrl)
 {
-	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RECONNECTING))
+	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING))
 		return;
 
 	queue_work(nvme_wq, &ctrl->err_work);
@@ -1753,6 +1759,12 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
 	nvme_stop_ctrl(&ctrl->ctrl);
 	nvme_rdma_shutdown_ctrl(ctrl, false);
 
+	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RECONNECTING)) {
+		/* state change failure should never happen */
+		WARN_ON_ONCE(1);
+		return;
+	}
+
 	ret = nvme_rdma_configure_admin_queue(ctrl, false);
 	if (ret)
 		goto out_fail;
-- 
2.14.1



