[PATCH] nvme-multipath: fix possible hang in live ns resize with ANA access

Sagi Grimberg sagi at grimberg.me
Wed Sep 28 06:58:16 PDT 2022


When we revalidate paths as part of ns size change (as of commit e7d65803e2bb),
it is possible that during the path revalidation, the only paths that is IO
capable (i.e. optimized/non-optimized) are the ones that ns resize was not yet
informed to the host, which will cause inflight requests to be requeued (as we
have available paths but none are IO capable). These requests on the requeue
list are waiting for someone to resubmit them at some point.

The IO capable paths will eventually notify the ns resize change to the
host, but there is nothing that will kick the requeue list to resubmit the
queued requests.

Fix this by always kicking the requeue list, and if no IO capable path exists,
these requests will be queued again.

A typical log that indicates that IOs are requeued:
--
nvme nvme1: creating 4 I/O queues.
nvme nvme1: new ctrl: "testnqn1"
nvme nvme2: creating 4 I/O queues.
nvme nvme2: mapped 4/0/0 default/read/poll queues.
nvme nvme2: new ctrl: NQN "testnqn1", addr 127.0.0.1:8009
nvme nvme1: rescanning namespaces.
nvme1n1: detected capacity change from 2097152 to 4194304
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
nvme nvme2: rescanning namespaces.
--

Reported-by: Yogev Cohen <yogev at lightbitslabs.com>
Fixes: e7d65803e2bb ("nvme-multipath: revalidate paths during rescan")
Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
I was easily capable to reproduce this regression with a small debug patch to nvmet
to have a 1 second delay between controller AENs:
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
 #include "trace.h"
 
 #include "nvmet.h"
+#include <linux/delay.h>
 
 struct workqueue_struct *buffered_io_wq;
 struct workqueue_struct *zbd_wq;
@@ -248,6 +249,7 @@ void nvmet_ns_changed(struct nvmet_subsys *subsys, u32 nsid)
                nvmet_add_async_event(ctrl, NVME_AER_TYPE_NOTICE,
                                NVME_AER_NOTICE_NS_CHANGED,
                                NVME_LOG_CHANGED_NS);
+               msleep(1000);
        }
 }

And expose a subsystem via two ports, one ANA 'optimized', and one ANA 'inaccessible'.

Then test file-backed ns resize duing I/O:
truncate --size=2G /tmp/f && echo 1 > /sys/kernel/config/nvmet/subsystems/testnqn1/namespaces/1/revalidate_size

 drivers/nvme/host/multipath.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 6ef497c75a16..d532a78f24a2 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -172,16 +172,18 @@ void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl *ctrl)
 void nvme_mpath_revalidate_paths(struct nvme_ns *ns)
 {
 	struct nvme_ns_head *head = ns->head;
+	struct nvme_ns *n;
 	sector_t capacity = get_capacity(head->disk);
 	int node;
 
-	list_for_each_entry_rcu(ns, &head->list, siblings) {
-		if (capacity != get_capacity(ns->disk))
-			clear_bit(NVME_NS_READY, &ns->flags);
+	list_for_each_entry_rcu(n, &head->list, siblings) {
+		if (capacity != get_capacity(n->disk))
+			clear_bit(NVME_NS_READY, &n->flags);
 	}
 
 	for_each_node(node)
 		rcu_assign_pointer(head->current_path[node], NULL);
+	nvme_kick_requeue_lists(ns->ctrl);
 }
 
 static bool nvme_path_is_disabled(struct nvme_ns *ns)
-- 
2.34.1




More information about the Linux-nvme mailing list