[PATCH v2] nvme-multipath: fix possible hang in live ns resize with ANA access

Sagi Grimberg sagi at grimberg.me
Thu Sep 29 00:34:29 PDT 2022



On 9/29/22 04:32, Chao Leng wrote:
> 
> 
> On 2022/9/29 4:10, Sagi Grimberg wrote:
>> When we revalidate paths as part of an ns size change (as of commit
>> e7d65803e2bb), it is possible that during the path revalidation the
>> only paths that are I/O capable (i.e. optimized/non-optimized) are
>> ones for which the host has not yet been informed of the ns resize,
>> which will cause inflight requests to be requeued (as we have
>> available paths, but none are I/O capable). These requests on the
>> requeue list are waiting for someone to resubmit them at some point.
>>
>> The I/O capable paths will eventually notify the host of the ns
>> resize, but there is nothing that will kick the requeue list to
>> resubmit the queued requests.
>>
>> Fix this by always kicking the requeue list; if no I/O capable path
>> exists, these requests will simply end up being queued again.
>>
>> A typical log that indicates that IOs are requeued:
>> -- 
>> nvme nvme1: creating 4 I/O queues.
>> nvme nvme1: new ctrl: "testnqn1"
>> nvme nvme2: creating 4 I/O queues.
>> nvme nvme2: mapped 4/0/0 default/read/poll queues.
>> nvme nvme2: new ctrl: NQN "testnqn1", addr 127.0.0.1:8009
>> nvme nvme1: rescanning namespaces.
>> nvme1n1: detected capacity change from 2097152 to 4194304
>> block nvme1n1: no usable path - requeuing I/O
>> block nvme1n1: no usable path - requeuing I/O
>> block nvme1n1: no usable path - requeuing I/O
>> block nvme1n1: no usable path - requeuing I/O
>> block nvme1n1: no usable path - requeuing I/O
>> block nvme1n1: no usable path - requeuing I/O
>> block nvme1n1: no usable path - requeuing I/O
>> block nvme1n1: no usable path - requeuing I/O
>> block nvme1n1: no usable path - requeuing I/O
>> block nvme1n1: no usable path - requeuing I/O
>> nvme nvme2: rescanning namespaces.
>> -- 
>>
>> Reported-by: Yogev Cohen <yogev at lightbitslabs.com>
>> Fixes: e7d65803e2bb ("nvme-multipath: revalidate paths during rescan")
>> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
>> ---
>> Changes from v1:
>> - fix commit msg body format
>> - follow reverse-xmas declaration pattern
>>
>>   drivers/nvme/host/multipath.c | 8 +++++---
>>   1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
>> index 6ef497c75a16..1113139c9736 100644
>> --- a/drivers/nvme/host/multipath.c
>> +++ b/drivers/nvme/host/multipath.c
>> @@ -173,15 +173,17 @@ void nvme_mpath_revalidate_paths(struct nvme_ns *ns)
>>   {
>>       struct nvme_ns_head *head = ns->head;
>>       sector_t capacity = get_capacity(head->disk);
>> +    struct nvme_ns *n;
>>       int node;
>> -    list_for_each_entry_rcu(ns, &head->list, siblings) {
>> -        if (capacity != get_capacity(ns->disk))
>> -            clear_bit(NVME_NS_READY, &ns->flags);
>> +    list_for_each_entry_rcu(n, &head->list, siblings) {
>> +        if (capacity != get_capacity(n->disk))
>> +            clear_bit(NVME_NS_READY, &n->flags);
>>       }
>>       for_each_node(node)
>>           rcu_assign_pointer(head->current_path[node], NULL);
>> +    nvme_kick_requeue_lists(ns->ctrl);
> We just need to schedule this head's requeue_work instead of kicking
> all heads. We can do it like this:
>    kblockd_schedule_work(&head->requeue_work);

Yes, you're right, that is simpler to do. I'll send a v3.
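
For reference, a sketch of what the simplified change could look like
(this is an assumption based on Chao Leng's suggestion, not the posted
v3): with the kick scoped to this head's requeue_work, the controller-wide
nvme_kick_requeue_lists() call goes away, and the `n` iterator from v2 is
no longer needed since `ns` is not used after the loop:

```c
/* Sketch only -- based on the v2 context above; the actual v3 may differ. */
void nvme_mpath_revalidate_paths(struct nvme_ns *ns)
{
	struct nvme_ns_head *head = ns->head;
	sector_t capacity = get_capacity(head->disk);
	int node;

	list_for_each_entry_rcu(ns, &head->list, siblings) {
		if (capacity != get_capacity(ns->disk))
			clear_bit(NVME_NS_READY, &ns->flags);
	}

	for_each_node(node)
		rcu_assign_pointer(head->current_path[node], NULL);

	/*
	 * Kick only this head's requeue list. If no I/O capable path
	 * exists yet, the requests are simply requeued again until an
	 * ANA/size update makes a path usable.
	 */
	kblockd_schedule_work(&head->requeue_work);
}
```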

