[PATCH v2] nvme-multipath: fix possible hang in live ns resize with ANA access
Sagi Grimberg
sagi at grimberg.me
Thu Sep 29 00:34:29 PDT 2022
On 9/29/22 04:32, Chao Leng wrote:
>
>
> On 2022/9/29 4:10, Sagi Grimberg wrote:
>> When we revalidate paths as part of ns size change (as of commit
>> e7d65803e2bb), it is possible that during the path revalidation, the
>> only paths that are IO capable (i.e. optimized/non-optimized) are the
>> ones for which the ns resize has not yet been reported to the host,
>> which will cause inflight requests to be requeued (as we have available
>> paths but none are IO capable). These requests on the requeue list are
>> waiting for someone to resubmit them at some point.
>>
>> The IO capable paths will eventually report the ns resize change to the
>> host, but nothing will kick the requeue list to resubmit the queued
>> requests.
>>
>> Fix this by always kicking the requeue list, and if no IO capable path
>> exists, these requests will just end up being queued again.
>>
>> A typical log that indicates that IOs are requeued:
>> --
>> nvme nvme1: creating 4 I/O queues.
>> nvme nvme1: new ctrl: "testnqn1"
>> nvme nvme2: creating 4 I/O queues.
>> nvme nvme2: mapped 4/0/0 default/read/poll queues.
>> nvme nvme2: new ctrl: NQN "testnqn1", addr 127.0.0.1:8009
>> nvme nvme1: rescanning namespaces.
>> nvme1n1: detected capacity change from 2097152 to 4194304
>> block nvme1n1: no usable path - requeuing I/O
>> block nvme1n1: no usable path - requeuing I/O
>> block nvme1n1: no usable path - requeuing I/O
>> block nvme1n1: no usable path - requeuing I/O
>> block nvme1n1: no usable path - requeuing I/O
>> block nvme1n1: no usable path - requeuing I/O
>> block nvme1n1: no usable path - requeuing I/O
>> block nvme1n1: no usable path - requeuing I/O
>> block nvme1n1: no usable path - requeuing I/O
>> block nvme1n1: no usable path - requeuing I/O
>> nvme nvme2: rescanning namespaces.
>> --
>>
>> Reported-by: Yogev Cohen <yogev at lightbitslabs.com>
>> Fixes: e7d65803e2bb ("nvme-multipath: revalidate paths during rescan")
>> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
>> ---
>> Changes from v1:
>> - fix commit msg body format
>> - follow reverse-xmas declaration pattern
>>
>> drivers/nvme/host/multipath.c | 8 +++++---
>> 1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
>> index 6ef497c75a16..1113139c9736 100644
>> --- a/drivers/nvme/host/multipath.c
>> +++ b/drivers/nvme/host/multipath.c
>> @@ -173,15 +173,17 @@ void nvme_mpath_revalidate_paths(struct nvme_ns *ns)
>>  {
>>  	struct nvme_ns_head *head = ns->head;
>>  	sector_t capacity = get_capacity(head->disk);
>> +	struct nvme_ns *n;
>>  	int node;
>>  
>> -	list_for_each_entry_rcu(ns, &head->list, siblings) {
>> -		if (capacity != get_capacity(ns->disk))
>> -			clear_bit(NVME_NS_READY, &ns->flags);
>> +	list_for_each_entry_rcu(n, &head->list, siblings) {
>> +		if (capacity != get_capacity(n->disk))
>> +			clear_bit(NVME_NS_READY, &n->flags);
>>  	}
>>  
>>  	for_each_node(node)
>>  		rcu_assign_pointer(head->current_path[node], NULL);
>> +
>> +	nvme_kick_requeue_lists(ns->ctrl);
> We just need to schedule the requeue_work of this head instead of all
> heads. We can do it like this:
> 	kblockd_schedule_work(&head->requeue_work);
Yes, you're right, that is simpler to do. I'll send a v3.
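For reference, a minimal sketch of that per-head variant on top of the v2 diff
above (hypothetical; the actual v3 patch may differ):

-	nvme_kick_requeue_lists(ns->ctrl);
+	/* Only this head's requeue list needs kicking, not every
+	 * head on the controller.
+	 */
+	kblockd_schedule_work(&head->requeue_work);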
More information about the Linux-nvme mailing list