[PATCH 3/3] nvme: 'nvme disconnect' hangs after remapping namespaces
Hannes Reinecke
hare at kernel.org
Fri Sep 6 03:16:40 PDT 2024
During repeated namespace map and unmap operations on the target
(disabling the namespace, changing the UUID, enabling it again)
the initial scan hangs because the target keeps returning
PATH_ERROR and the I/O is retried indefinitely:
[<0>] folio_wait_bit_common+0x12a/0x310
[<0>] filemap_read_folio+0x97/0xd0
[<0>] do_read_cache_folio+0x108/0x390
[<0>] read_part_sector+0x31/0xa0
[<0>] read_lba+0xc5/0x160
[<0>] efi_partition+0xd9/0x8f0
[<0>] bdev_disk_changed+0x23d/0x6d0
[<0>] blkdev_get_whole+0x78/0xc0
[<0>] bdev_open+0x2c6/0x3b0
[<0>] bdev_file_open_by_dev+0xcb/0x120
[<0>] disk_scan_partitions+0x5d/0x100
[<0>] device_add_disk+0x402/0x420
[<0>] nvme_mpath_set_live+0x4f/0x1f0 [nvme_core]
[<0>] nvme_mpath_add_disk+0x107/0x120 [nvme_core]
[<0>] nvme_alloc_ns+0xac6/0xe60 [nvme_core]
[<0>] nvme_scan_ns+0x2dd/0x3e0 [nvme_core]
[<0>] nvme_scan_work+0x1a3/0x490 [nvme_core]
Calling 'nvme disconnect' on controllers with these namespaces
will hang as the disconnect operation tries to flush scan_work:
[<0>] __flush_work+0x389/0x4b0
[<0>] nvme_remove_namespaces+0x4b/0x130 [nvme_core]
[<0>] nvme_do_delete_ctrl+0x72/0x90 [nvme_core]
[<0>] nvme_delete_ctrl_sync+0x2e/0x40 [nvme_core]
[<0>] nvme_sysfs_delete+0x35/0x40 [nvme_core]
[<0>] kernfs_fop_write_iter+0x13d/0x1b0
[<0>] vfs_write+0x404/0x510
The flush runs before the namespaces are removed, so it waits on the
stuck scan.
Set the NVME_CTRL_FAILFAST_EXPIRED bit on the controller and kick the
requeue lists so that all pending I/O is failed and the disconnect
process can complete.
Signed-off-by: Hannes Reinecke <hare at kernel.org>
---
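Not part of the commit: the retry behaviour described above can be
illustrated with a small userspace model. This is only a sketch under
stated assumptions, not the driver's code. Every name in it
(model_ctrl, model_submit, model_read, teardown_at, max_attempts) is
hypothetical; the single boolean stands in for NVME_CTRL_FAILFAST_EXPIRED,
and the loop bound exists only so that the model terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the controller; only the flag matters here. */
struct model_ctrl {
	bool failfast_expired;	/* models NVME_CTRL_FAILFAST_EXPIRED */
};

enum model_status { MODEL_OK, MODEL_PATH_ERROR, MODEL_IO_ERROR };

/* Models a target that answers every command with a path error. */
static enum model_status model_submit(void)
{
	return MODEL_PATH_ERROR;
}

/*
 * Retry a read for as long as the target reports a path error, unless
 * the controller has been marked failfast.
 *
 * teardown_at:  attempt count at which a simulated disconnect sets the
 *               flag; a negative value means it is never set.
 * max_attempts: upper bound on submissions (assumption of this model).
 *
 * Returns the number of submissions made; *result holds the final status.
 */
static int model_read(struct model_ctrl *ctrl, int teardown_at,
		      int max_attempts, enum model_status *result)
{
	int attempts = 0;

	assert(ctrl != NULL && result != NULL);
	*result = MODEL_PATH_ERROR;

	while (attempts < max_attempts) {
		/* Simulated disconnect marks the controller failfast. */
		if (attempts == teardown_at)
			ctrl->failfast_expired = true;

		/* With the flag set, fail the I/O instead of retrying. */
		if (ctrl->failfast_expired) {
			*result = MODEL_IO_ERROR;
			return attempts;
		}

		attempts++;
		*result = model_submit();
		if (*result != MODEL_PATH_ERROR)
			return attempts;
	}
	return attempts;
}
```

Calling model_read() with a negative teardown_at uses up the whole
retry budget and still ends in a path error, which mirrors the hung
scan. With teardown_at set to 2 the loop stops after two submissions
with an I/O error, which mirrors what the patch aims for.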
drivers/nvme/host/core.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 651073280f6f..b968b672dcf8 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4222,6 +4222,13 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
 	 */
 	nvme_mpath_clear_ctrl_paths(ctrl);
 
+	/*
+	 * Mark the controller as 'failfast' to ensure all pending I/O
+	 * is killed.
+	 */
+	set_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
+	nvme_kick_requeue_lists(ctrl);
+
 	/*
 	 * Unquiesce io queues so any pending IO won't hang, especially
 	 * those submitted from scan work
--
2.35.3