[PATCH 2/2] nvme: rdma/tcp: call nvme_delete_dead_ctrl for handling reconnect failure
Ming Lei
ming.lei at redhat.com
Tue May 30 02:43:22 PDT 2023
Once the reconnect attempts have been exhausted, the controller can no
longer handle IO and should be treated as dead. Call
nvme_delete_dead_ctrl() to handle the failure, which avoids the following
IO deadlock:
1) writeback IO waits in __bio_queue_enter() because the queue is frozen
during error recovery
2) the reconnect failure handler removes the controller, and del_gendisk()
waits for the above writeback IO in fsync/invalidate bdev
Fix the issue by calling nvme_delete_dead_ctrl(), which calls
nvme_mark_namespaces_dead() before deleting the disk, so the above
writeback IO is failed and the IO deadlock is avoided.
Reported-by: Yi Zhang <yi.zhang at redhat.com>
Signed-off-by: Ming Lei <ming.lei at redhat.com>
---
drivers/nvme/host/rdma.c | 2 +-
drivers/nvme/host/tcp.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 0eb79696fb73..cdf5855c3009 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1028,7 +1028,7 @@ static void nvme_rdma_reconnect_or_remove(struct nvme_rdma_ctrl *ctrl)
 		queue_delayed_work(nvme_wq, &ctrl->reconnect_work,
 				ctrl->ctrl.opts->reconnect_delay * HZ);
 	} else {
-		nvme_delete_ctrl(&ctrl->ctrl);
+		nvme_delete_dead_ctrl(&ctrl->ctrl);
 	}
 }
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index bf0230442d57..2c119bff7010 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2047,7 +2047,7 @@ static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
 				ctrl->opts->reconnect_delay * HZ);
 	} else {
 		dev_info(ctrl->device, "Removing controller...\n");
-		nvme_delete_ctrl(ctrl);
+		nvme_delete_dead_ctrl(ctrl);
 	}
 }
--
2.40.1