[PATCH 2/2] nvme-multipath: fix I/O stall when remapping namespaces

Sagi Grimberg sagi at grimberg.me
Tue Sep 3 12:38:24 PDT 2024




On 03/09/2024 21:03, Hannes Reinecke wrote:
> During repetitive namespace remapping operations (i.e. removing a namespace
> and provisioning a different namespace with the same NSID) on the target,
> the namespace might have changed between the time the initial scan was
> performed and the time the partition scan was invoked by device_add_disk()
> in nvme_mpath_set_live(). We then end up with a stuck scanning process:
>
> [<0>] folio_wait_bit_common+0x12a/0x310
> [<0>] filemap_read_folio+0x97/0xd0
> [<0>] do_read_cache_folio+0x108/0x390
> [<0>] read_part_sector+0x31/0xa0
> [<0>] read_lba+0xc5/0x160
> [<0>] efi_partition+0xd9/0x8f0
> [<0>] bdev_disk_changed+0x23d/0x6d0
> [<0>] blkdev_get_whole+0x78/0xc0
> [<0>] bdev_open+0x2c6/0x3b0
> [<0>] bdev_file_open_by_dev+0xcb/0x120
> [<0>] disk_scan_partitions+0x5d/0x100
> [<0>] device_add_disk+0x402/0x420
> [<0>] nvme_mpath_set_live+0x4f/0x1f0 [nvme_core]
> [<0>] nvme_mpath_add_disk+0x107/0x120 [nvme_core]
> [<0>] nvme_alloc_ns+0xac6/0xe60 [nvme_core]
> [<0>] nvme_scan_ns+0x2dd/0x3e0 [nvme_core]
> [<0>] nvme_scan_work+0x1a3/0x490 [nvme_core]
>
> This happens when we have several paths, some of which are inaccessible,
> and the active paths are removed first. Then nvme_find_path() will requeue
> I/O in the ns_head (as paths are present), but the requeue list is never
> triggered as all remaining paths are inactive.
> This patch checks for NVME_NSHEAD_DISK_LIVE when selecting a path,
> and requeues I/O after NVME_NSHEAD_DISK_LIVE has been cleared once
> the last path has been removed, to properly terminate pending I/O.
>
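
(Aside, to make the mechanism concrete: the requeue list referred to here
is fed from nvme_ns_head_submit_bio(). Roughly, and simplified from current
mainline, so details may differ by kernel version, the submit path does:

	srcu_idx = srcu_read_lock(&head->srcu);
	ns = nvme_find_path(head);
	if (likely(ns)) {
		/* remap the bio and send it down the chosen path */
		bio_set_dev(bio, ns->disk->part0);
		submit_bio_noacct(bio);
	} else if (nvme_available_path(head)) {
		/* paths exist but none is usable right now: park the bio */
		spin_lock_irq(&head->requeue_lock);
		bio_list_add(&head->requeue_list, bio);
		spin_unlock_irq(&head->requeue_lock);
	} else {
		/* no path left at all: fail the I/O */
		bio_io_error(bio);
	}
	srcu_read_unlock(&head->srcu, srcu_idx);

If nothing ever kicks head->requeue_work afterwards, a bio parked in the
middle branch, such as the partition-scan read in the trace above, waits
forever.)
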
> Signed-off-by: Hannes Reinecke <hare at kernel.org>
> ---
>   drivers/nvme/host/multipath.c | 14 ++++++++++++--
>   1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index c9d23b1b8efc..1b1deb0450ab 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -407,6 +407,9 @@ static struct nvme_ns *nvme_numa_path(struct nvme_ns_head *head)
>   
>   inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
>   {
> +	if (!test_bit(NVME_NSHEAD_DISK_LIVE, &head->flags))
> +		return NULL;
> +
>   	switch (READ_ONCE(head->subsys->iopolicy)) {
>   	case NVME_IOPOLICY_QD:
>   		return nvme_queue_depth_path(head);
> @@ -421,6 +424,9 @@ static bool nvme_available_path(struct nvme_ns_head *head)
>   {
>   	struct nvme_ns *ns;
>   
> +	if (!test_bit(NVME_NSHEAD_DISK_LIVE, &head->flags))
> +		return false;
> +
>   	list_for_each_entry_rcu(ns, &head->list, siblings) {
>   		if (test_bit(NVME_CTRL_FAILFAST_EXPIRED, &ns->ctrl->flags))
>   			continue;
> @@ -967,11 +973,15 @@ void nvme_mpath_shutdown_disk(struct nvme_ns_head *head)
>   {
>   	if (!head->disk)
>   		return;
> -	kblockd_schedule_work(&head->requeue_work);
> -	if (test_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) {
> +	if (test_and_clear_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) {
>   		nvme_cdev_del(&head->cdev, &head->cdev_device);
>   		del_gendisk(head->disk);
>   	}
> +	/*
> +	 * requeue I/O after NVME_NSHEAD_DISK_LIVE has been cleared
> +	 * to allow multipath to fail all I/O.
> +	 */
> +	kblockd_schedule_work(&head->requeue_work);

Not sure how this helps given that you don't wait for srcu to synchronize
before you kick the requeue.
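
For illustration, the ordering in question might look something like this in
nvme_mpath_shutdown_disk() (a hypothetical sketch for discussion, not what
the patch does):

	if (test_and_clear_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) {
		nvme_cdev_del(&head->cdev, &head->cdev_device);
		del_gendisk(head->disk);
	}
	/*
	 * Wait for concurrent submitters in nvme_ns_head_submit_bio() to
	 * leave their SRCU read-side critical section, so they are
	 * guaranteed to have observed the cleared NVME_NSHEAD_DISK_LIVE
	 * bit before the requeue work runs and fails the parked I/O.
	 */
	synchronize_srcu(&head->srcu);
	kblockd_schedule_work(&head->requeue_work);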

>   }
>   
>   void nvme_mpath_remove_disk(struct nvme_ns_head *head)

Why do you need to clear NVME_NSHEAD_DISK_LIVE? In the last posting you
mentioned that ns_remove is stuck on srcu synchronization. Can you explain
why nvme_find_path() is able to find a path given that NVME_NS_READY has
already been cleared? Or is it nvme_available_path() that is missing a
check? Maybe this can be done by checking NVME_NS_READY instead?
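
For reference, that alternative might look roughly like this (untested
sketch, assuming the existing per-namespace NVME_NS_READY flag; for
discussion only):

	static bool nvme_available_path(struct nvme_ns_head *head)
	{
		struct nvme_ns *ns;

		list_for_each_entry_rcu(ns, &head->list, siblings) {
			if (test_bit(NVME_CTRL_FAILFAST_EXPIRED, &ns->ctrl->flags))
				continue;
			/* skip paths whose namespace is no longer ready */
			if (!test_bit(NVME_NS_READY, &ns->flags))
				continue;
			switch (nvme_ctrl_state(ns->ctrl)) {
			case NVME_CTRL_LIVE:
			case NVME_CTRL_RESETTING:
			case NVME_CTRL_CONNECTING:
				return true;
			default:
				break;
			}
		}
		return false;
	}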
