[PATCH 2/3] nvme-multipath: cannot disconnect controller on stuck partition scan

Hannes Reinecke hare at kernel.org
Mon Oct 7 03:01:33 PDT 2024


When a namespace state is changed during a partition scan triggered via
nvme_scan_ns()->nvme_mpath_set_live()->device_add_disk(),
I/O might be returned with a path error, causing it to be retried on
other paths. But if this happens to be the last path, the scanning
process will be stuck.
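The failover behaviour described above can be illustrated with a small userspace model. This is a sketch under simplifying assumptions, not driver code: `struct model_path`, `model_submit()`, the `MODEL_*` constants and the submission bound are all hypothetical. The model also fails the I/O outright when no path is selectable, whereas the real driver may instead keep it on the namespace head's requeue list. A path error moves the I/O to the next selectable path. Once only failing paths remain, the I/O cycles between them and never completes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one path (controller) to a shared namespace. */
struct model_path {
	bool disabled;    /* path is skipped by the path selector */
	bool path_error;  /* I/O on this path completes with a path error */
};

#define MODEL_MAX_SUBMITS 32   /* simulation bound only; the driver has none */
#define MODEL_IO_FAILED   (-1) /* no selectable path: I/O is failed */
#define MODEL_IO_STUCK    (-2) /* still being retried when the bound is hit */

/*
 * Submit one I/O, failing over round-robin to the next non-disabled path
 * whenever a path error is returned. Returns the number of submissions
 * needed to complete it, or one of the MODEL_IO_* codes.
 */
static int model_submit(const struct model_path *paths, size_t npaths)
{
	size_t next = 0;

	for (int submits = 1; submits <= MODEL_MAX_SUBMITS; submits++) {
		const struct model_path *p = NULL;

		/* Select the next path that is not disabled. */
		for (size_t i = 0; i < npaths; i++) {
			size_t idx = (next + i) % npaths;

			if (!paths[idx].disabled) {
				p = &paths[idx];
				next = idx + 1;
				break;
			}
		}
		if (!p)
			return MODEL_IO_FAILED;
		if (!p->path_error)
			return submits;  /* I/O completed on this path */
		/* Path error: retry, starting with the following path. */
	}
	return MODEL_IO_STUCK;
}
```

With two paths where only the first returns a path error, the I/O completes on the second submission. With a single failing path that is not disabled, `model_submit()` reports `MODEL_IO_STUCK`, which corresponds to the hang described above.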
Trying to disconnect this controller will call

nvme_unquiesce_io_queues()
flush_work(&ctrl->scan_work)

where the first call should abort or retry all I/O pending during
the scan so that the subsequent flush_work() can succeed.
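The dependency between these two calls can be sketched as follows. This is a hypothetical userspace model: `struct model_teardown`, `model_resubmit()` and `model_disconnect()` are illustrative names, and a round bound stands in for flush_work() waiting forever. Unquiescing lets the scan's pending I/O be resubmitted on the last path. flush_work() can only return once that I/O is either completed or aborted, so a path that keeps returning path errors without being treated as disabled keeps the flush waiting.

```c
#include <stdbool.h>

/* Hypothetical outcome of resubmitting the scan's pending I/O. */
enum model_io_result { MODEL_IO_DONE, MODEL_IO_ABORTED, MODEL_IO_REQUEUED };

/* Hypothetical state of the (last) path of the controller being removed. */
struct model_teardown {
	bool path_disabled;  /* path selector skips this path */
	bool path_error;     /* I/O on the path returns a path error */
	bool scan_waiting;   /* scan work is blocked on the pending I/O */
};

static enum model_io_result model_resubmit(const struct model_teardown *t)
{
	if (t->path_disabled)
		return MODEL_IO_ABORTED;   /* no usable path: fail the I/O */
	if (t->path_error)
		return MODEL_IO_REQUEUED;  /* retried on the same path again */
	return MODEL_IO_DONE;
}

/*
 * Models the unquiesce step followed by the flush: each round resubmits
 * the pending I/O, and the flush can return only once the scan work is no
 * longer waiting on it. Returns true if the flush would return within
 * max_rounds.
 */
static bool model_disconnect(struct model_teardown *t, int max_rounds)
{
	for (int round = 0; round < max_rounds && t->scan_waiting; round++) {
		if (model_resubmit(t) != MODEL_IO_REQUEUED)
			t->scan_waiting = false;  /* scan work can finish */
	}
	return !t->scan_waiting;
}
```

In this model, a requeued I/O never releases the scan work. That is the situation in which the disconnecting process is left waiting in flush_work().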
However, we explicitly do _not_ ignore paths from deleted controllers
in nvme_path_is_disabled(), so I/O on these devices will be
_retried_, not aborted, and the scanning process remains stuck.
As a result, the process disconnecting the controller hangs in
flush_work(), and that controller and all its namespaces remain
unusable until the system is rebooted.
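The change to the path check can be compared side by side in a reduced model. This sketch assumes only the controller-state part of nvme_path_is_disabled(), as visible in the hunk context, and omits any per-namespace checks the real function may perform. The `model_*` names are hypothetical, and a boolean replaces mutex_is_locked(&ns->ctrl->scan_lock).

```c
#include <stdbool.h>

/* Subset of controller states; names loosely mirror enum nvme_ctrl_state. */
enum model_ctrl_state {
	MODEL_CTRL_LIVE,
	MODEL_CTRL_RESETTING,
	MODEL_CTRL_DELETING,
};

/* Before the patch: a DELETING controller still counts as a usable path. */
static bool model_path_is_disabled_old(enum model_ctrl_state state)
{
	return state != MODEL_CTRL_LIVE && state != MODEL_CTRL_DELETING;
}

/*
 * After the patch: a DELETING controller is skipped while its scan_lock
 * is held, which is modelled here by the scan_lock_held flag.
 */
static bool model_path_is_disabled_new(enum model_ctrl_state state,
				       bool scan_lock_held)
{
	if (state == MODEL_CTRL_DELETING && scan_lock_held)
		return true;
	return model_path_is_disabled_old(state);
}
```

Outside a scan, a DELETING controller is still treated as usable, so ordinary I/O keeps the existing behaviour. Note that mutex_is_locked() only reports that some task holds the mutex, not that the current I/O was issued by the scan itself.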

Fixes: ecca390e8056 ("nvme: fix deadlock in disconnect during scan_work and/or ana_work")
Signed-off-by: Hannes Reinecke <hare at kernel.org>
---
 drivers/nvme/host/multipath.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 61f8ae199288..f03ef983a75f 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -239,6 +239,13 @@ static bool nvme_path_is_disabled(struct nvme_ns *ns)
 {
 	enum nvme_ctrl_state state = nvme_ctrl_state(ns->ctrl);
 
+	/*
+	 * Skip deleted controllers for I/O from partition scan
+	 */
+	if (state == NVME_CTRL_DELETING &&
+	    mutex_is_locked(&ns->ctrl->scan_lock))
+		return true;
+
 	/*
 	 * We don't treat NVME_CTRL_DELETING as a disabled path as I/O should
 	 * still be able to complete assuming that the controller is connected.
-- 
2.35.3

More information about the Linux-nvme mailing list