[PATCH v3] nvme: core: shorten duration of multipath namespace rescan

Martin Wilck martin.wilck at suse.com
Mon Aug 26 09:39:51 PDT 2024


For multipath devices, nvme_update_ns_info() needs to freeze both
the queue of the path and the queue of the multipath device. Each of
these freeze operations waits for one RCU grace period to pass,
~25ms on my test system. By calling blk_freeze_queue_start() on the
multipath queue early, we avoid waiting for two grace periods; tests
using ftrace have shown that the second blk_mq_freeze_queue_wait()
call then finishes in just a few microseconds. The path queue is
unfrozen before blk_mq_freeze_queue_wait() is called on the
multipath queue, so that any outstanding I/O in the multipath queue
can be flushed.
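
For reference, a rough sketch of the resulting freeze ordering
(simplified pseudo-flow, not the literal code; head_q and path_q are
hypothetical stand-ins for ns->head->disk->queue and
ns->disk->queue):

    blk_freeze_queue_start(head_q);    /* start the freeze, don't wait yet */

    /* in nvme_update_ns_info_{generic,block}(): */
    blk_mq_freeze_queue(path_q);       /* waits ~1 RCU grace period */
    /* ... update the path queue ... */
    blk_mq_unfreeze_queue(path_q);     /* outstanding I/O can drain */

    blk_mq_freeze_queue_wait(head_q);  /* grace period has already
                                          elapsed, returns quickly */
    /* ... update multipath queue limits, capacity, ... */
    blk_mq_unfreeze_queue(head_q);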

I tested this using the "controller rescan under I/O load" test
I submitted recently [1].

[1] https://lore.kernel.org/linux-nvme/20240822193814.106111-3-mwilck@suse.com/T/#u

Signed-off-by: Martin Wilck <mwilck at suse.com>
---
v3:
 - added an out label and reversed the ret logic (Sagi Grimberg)

v2: (all changes suggested by Sagi Grimberg)
 - patch subject changed from "nvme: core: freeze multipath queue early in
   nvme_update_ns_info()" to "nvme: core: shorten duration of multipath
   namespace rescan"
 - inserted comment explaining why blk_freeze_queue_start() is called early
 - wait for queue to be frozen even if ret != 0
 - made the freeze_start / freeze_wait / unfreeze code structure more obvious

Hannes and Daniel had already given Reviewed-by: tags for the v1 patch, but
I didn't carry them over because the patch looks quite different now.
---
 drivers/nvme/host/core.c | 88 ++++++++++++++++++++++++----------------
 1 file changed, 52 insertions(+), 36 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0dc8bcc664f2..13164ca866ea 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2215,8 +2215,20 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
 static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
 {
 	bool unsupported = false;
+	struct queue_limits *ns_lim;
+	struct queue_limits lim;
 	int ret;
 
+	/*
+	 * The controller queue is going to be frozen in
+	 * nvme_update_ns_info_{generic,block}(). Every freeze implies waiting
+	 * for an RCU grace period to pass. For multipath devices, we
+	 * need to freeze the multipath queue, too. Start freezing the
+	 * multipath queue now, lest we need to wait for two grace periods.
+	 */
+	if (nvme_ns_head_multipath(ns->head))
+		blk_freeze_queue_start(ns->head->disk->queue);
+
 	switch (info->ids.csi) {
 	case NVME_CSI_ZNS:
 		if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
@@ -2250,45 +2262,49 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
 		ret = 0;
 	}
 
-	if (!ret && nvme_ns_head_multipath(ns->head)) {
-		struct queue_limits *ns_lim = &ns->disk->queue->limits;
-		struct queue_limits lim;
+	if (!nvme_ns_head_multipath(ns->head))
+		return ret;
 
-		blk_mq_freeze_queue(ns->head->disk->queue);
-		/*
-		 * queue_limits mixes values that are the hardware limitations
-		 * for bio splitting with what is the device configuration.
-		 *
-		 * For NVMe the device configuration can change after e.g. a
-		 * Format command, and we really want to pick up the new format
-		 * value here.  But we must still stack the queue limits to the
-		 * least common denominator for multipathing to split the bios
-		 * properly.
-		 *
-		 * To work around this, we explicitly set the device
-		 * configuration to those that we just queried, but only stack
-		 * the splitting limits in to make sure we still obey possibly
-		 * lower limitations of other controllers.
-		 */
-		lim = queue_limits_start_update(ns->head->disk->queue);
-		lim.logical_block_size = ns_lim->logical_block_size;
-		lim.physical_block_size = ns_lim->physical_block_size;
-		lim.io_min = ns_lim->io_min;
-		lim.io_opt = ns_lim->io_opt;
-		queue_limits_stack_bdev(&lim, ns->disk->part0, 0,
-					ns->head->disk->disk_name);
-		if (unsupported)
-			ns->head->disk->flags |= GENHD_FL_HIDDEN;
-		else
-			nvme_init_integrity(ns->head, &lim, info);
-		ret = queue_limits_commit_update(ns->head->disk->queue, &lim);
+	blk_mq_freeze_queue_wait(ns->head->disk->queue);
+	if (ret)
+		goto out;
 
-		set_capacity_and_notify(ns->head->disk, get_capacity(ns->disk));
-		set_disk_ro(ns->head->disk, nvme_ns_is_readonly(ns, info));
-		nvme_mpath_revalidate_paths(ns);
+	/*
+	 * queue_limits mixes values that are the hardware limitations
+	 * for bio splitting with what is the device configuration.
+	 *
+	 * For NVMe the device configuration can change after e.g. a
+	 * Format command, and we really want to pick up the new format
+	 * value here.  But we must still stack the queue limits to the
+	 * least common denominator for multipathing to split the bios
+	 * properly.
+	 *
+	 * To work around this, we explicitly set the device
+	 * configuration to those that we just queried, but only stack
+	 * the splitting limits in to make sure we still obey possibly
+	 * lower limitations of other controllers.
+	 */
 
-		blk_mq_unfreeze_queue(ns->head->disk->queue);
-	}
+	ns_lim = &ns->disk->queue->limits;
+	lim = queue_limits_start_update(ns->head->disk->queue);
+	lim.logical_block_size = ns_lim->logical_block_size;
+	lim.physical_block_size = ns_lim->physical_block_size;
+	lim.io_min = ns_lim->io_min;
+	lim.io_opt = ns_lim->io_opt;
+	queue_limits_stack_bdev(&lim, ns->disk->part0, 0,
+				ns->head->disk->disk_name);
+	if (unsupported)
+		ns->head->disk->flags |= GENHD_FL_HIDDEN;
+	else
+		nvme_init_integrity(ns->head, &lim, info);
+	ret = queue_limits_commit_update(ns->head->disk->queue, &lim);
+
+	set_capacity_and_notify(ns->head->disk, get_capacity(ns->disk));
+	set_disk_ro(ns->head->disk, nvme_ns_is_readonly(ns, info));
+	nvme_mpath_revalidate_paths(ns);
+
+out:
+	blk_mq_unfreeze_queue(ns->head->disk->queue);
 
 	return ret;
 }
-- 
2.46.0



