[PATCH v2] nvme-multipath: fix possible hang in live ns resize with ANA access

Chao Leng lengchao at huawei.com
Wed Sep 28 18:32:31 PDT 2022



On 2022/9/29 4:10, Sagi Grimberg wrote:
> When we revalidate paths as part of an ns size change (as of commit
> e7d65803e2bb), it is possible that during the path revalidation, the
> only paths that are IO capable (i.e. optimized/non-optimized) are the
> ones that have not yet reported the ns resize to the host, which will
> cause inflight requests to be requeued (as we have available paths but
> none are IO capable). These requests on the requeue list are waiting for
> someone to resubmit them at some point.
> 
> The IO capable paths will eventually notify the host of the ns resize,
> but there is nothing that will kick the requeue list to resubmit the
> queued requests.
> 
> Fix this by always kicking the requeue list, and if no IO capable path
> exists, these requests will just end up being queued again.
> 
> A typical log that indicates that IOs are requeued:
> --
> nvme nvme1: creating 4 I/O queues.
> nvme nvme1: new ctrl: "testnqn1"
> nvme nvme2: creating 4 I/O queues.
> nvme nvme2: mapped 4/0/0 default/read/poll queues.
> nvme nvme2: new ctrl: NQN "testnqn1", addr 127.0.0.1:8009
> nvme nvme1: rescanning namespaces.
> nvme1n1: detected capacity change from 2097152 to 4194304
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> nvme nvme2: rescanning namespaces.
> --
> 
> Reported-by: Yogev Cohen <yogev at lightbitslabs.com>
> Fixes: e7d65803e2bb ("nvme-multipath: revalidate paths during rescan")
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
> ---
> Changes from v1:
> - fix commit msg body format
> - follow reverse-xmas declaration pattern
> 
>   drivers/nvme/host/multipath.c | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 6ef497c75a16..1113139c9736 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -173,15 +173,17 @@ void nvme_mpath_revalidate_paths(struct nvme_ns *ns)
>   {
>   	struct nvme_ns_head *head = ns->head;
>   	sector_t capacity = get_capacity(head->disk);
> +	struct nvme_ns *n;
>   	int node;
>   
> -	list_for_each_entry_rcu(ns, &head->list, siblings) {
> -		if (capacity != get_capacity(ns->disk))
> -			clear_bit(NVME_NS_READY, &ns->flags);
> +	list_for_each_entry_rcu(n, &head->list, siblings) {
> +		if (capacity != get_capacity(n->disk))
> +			clear_bit(NVME_NS_READY, &n->flags);
>   	}
>   
>   	for_each_node(node)
>   		rcu_assign_pointer(head->current_path[node], NULL);
> +	nvme_kick_requeue_lists(ns->ctrl);
We only need to schedule the requeue_work of this head, not of all heads.
We can do it like this:
   kblockd_schedule_work(&head->requeue_work);
>   }
>   
>   static bool nvme_path_is_disabled(struct nvme_ns *ns)
> 
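
For illustration, the hang described in the commit message can be reproduced
with a small userspace model. This is only a sketch under stated assumptions,
not kernel code: struct model_path, struct model_head and every function below
are hypothetical stand-ins, the requeue list is reduced to a counter, and the
requeue work runs synchronously instead of via kblockd. Path 0 is assumed to be
ANA inaccessible and path 1 optimized; path 0 learns about the resize first.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PATHS 4

/* Hypothetical stand-in for one path (struct nvme_ns in the kernel). */
struct model_path {
	bool io_capable;        /* ANA optimized/non-optimized */
	bool ready;             /* models the NVME_NS_READY bit */
	unsigned long capacity; /* size this path currently reports */
};

/* Hypothetical stand-in for the multipath head (struct nvme_ns_head). */
struct model_head {
	unsigned long capacity;
	size_t npaths;
	struct model_path paths[MAX_PATHS];
	int requeued;  /* I/Os parked on the requeue list */
	int completed; /* I/Os that found a usable path */
};

/* A path is usable only if it is both ready and IO capable. */
static struct model_path *find_path(struct model_head *h)
{
	for (size_t i = 0; i < h->npaths; i++)
		if (h->paths[i].ready && h->paths[i].io_capable)
			return &h->paths[i];
	return NULL;
}

/* "no usable path - requeuing I/O" when nothing is usable. */
static void submit_io(struct model_head *h)
{
	if (find_path(h))
		h->completed++;
	else
		h->requeued++;
}

/* Models the head's requeue_work: resubmit everything that was parked. */
static void requeue_work(struct model_head *h)
{
	int pending = h->requeued;

	h->requeued = 0;
	while (pending--)
		submit_io(h);
}

/* Models the revalidation: paths with a stale size lose their ready bit.
 * With kick == true, only this head's requeue work is run. */
static void revalidate_paths(struct model_head *h, bool kick)
{
	for (size_t i = 0; i < h->npaths; i++)
		if (h->paths[i].capacity != h->capacity)
			h->paths[i].ready = false;
	if (kick)
		requeue_work(h);
}

/* Models one controller rescanning: its path picks up the new size. */
static void rescan_path(struct model_head *h, size_t idx,
			unsigned long new_cap, bool kick)
{
	h->paths[idx].capacity = new_cap;
	h->paths[idx].ready = true;
	h->capacity = new_cap;
	revalidate_paths(h, kick);
}

/* Returns how many I/Os are still parked after both paths have rescanned. */
int stuck_after_resize(bool kick)
{
	struct model_head h = {
		.capacity = 2097152,
		.npaths = 2,
		.paths = {
			{ .io_capable = false, .ready = true, .capacity = 2097152 },
			{ .io_capable = true,  .ready = true, .capacity = 2097152 },
		},
	};

	rescan_path(&h, 0, 4194304, kick); /* inaccessible path resizes first */
	for (int i = 0; i < 3; i++)
		submit_io(&h);             /* all three get requeued */
	rescan_path(&h, 1, 4194304, kick); /* optimized path catches up */
	return h.requeued;
}
```

In this model, stuck_after_resize(false) leaves the three I/Os parked even
though the optimized path is usable again, while stuck_after_resize(true)
drains them. The kick here touches only the one head, which is the narrower
scope suggested in the inline comment above; as far as I understand it,
nvme_kick_requeue_lists() would instead schedule the requeue work for the head
of every namespace on the controller.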


