[PATCH 3/3] nvme-multipath: stuck partition scan on inaccessible paths

Sagi Grimberg sagi at grimberg.me
Wed Sep 11 08:20:29 PDT 2024




On 11/09/2024 15:46, Hannes Reinecke wrote:
> When a path is switched to 'inaccessible' during partition scan
> triggered via device_add_disk() and we only have one path the
> system will be stuck as nvme_available_path() will always return
> 'true'. So I/O will never be completed and the system is stuck
> in device_add_disk():
>
>      [<0>] folio_wait_bit_common+0x12a/0x310
>      [<0>] filemap_read_folio+0x97/0xd0
>      [<0>] do_read_cache_folio+0x108/0x390
>      [<0>] read_part_sector+0x31/0xa0
>      [<0>] read_lba+0xc5/0x160
>      [<0>] efi_partition+0xd9/0x8f0
>      [<0>] bdev_disk_changed+0x23d/0x6d0
>      [<0>] blkdev_get_whole+0x78/0xc0
>      [<0>] bdev_open+0x2c6/0x3b0
>      [<0>] bdev_file_open_by_dev+0xcb/0x120
>      [<0>] disk_scan_partitions+0x5d/0x100
>      [<0>] device_add_disk+0x402/0x420
>      [<0>] nvme_mpath_set_live+0x4f/0x1f0 [nvme_core]
>      [<0>] nvme_mpath_add_disk+0x107/0x120 [nvme_core]
>      [<0>] nvme_alloc_ns+0xac6/0xe60 [nvme_core]
>      [<0>] nvme_scan_ns+0x2dd/0x3e0 [nvme_core]
>      [<0>] nvme_scan_work+0x1a3/0x490 [nvme_core]
>
> This patch introduces a flag NVME_NSHEAD_FAIL_ON_LAST_PATH to
> cause nvme_available_path() to always return NULL, and with
> that I/O to be failed if the last path is unavailable. But
> we also need to requeue all pending I/Os whenever we have
> changed the ANA state as now even inaccessible ANA states
> influence I/O behaviour.
>
> Signed-off-by: Hannes Reinecke <hare at kernel.org>
> ---
>   drivers/nvme/host/multipath.c | 15 +++++++++++++++
>   drivers/nvme/host/nvme.h      |  1 +
>   2 files changed, 16 insertions(+)
>
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index f72c5a6a2d8e..0373b4043eea 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -422,6 +422,10 @@ static bool nvme_available_path(struct nvme_ns_head *head)
>   	struct nvme_ns *ns;
>   
>   	if (!test_bit(NVME_NSHEAD_DISK_LIVE, &head->flags))
> +	    return NULL;
> +
> +	if (test_bit(NVME_NSHEAD_FAIL_ON_LAST_PATH, &head->flags) &&
> +	    list_is_singular(&head->list))
>   		return NULL;

This is wrong too Hannes.

If you added a live path, and when partition scan triggers your path becomes
inaccessible, but then after anatt it becomes optimized again. You just 
ended
up without discovering partitions...



More information about the Linux-nvme mailing list