[PATCHv6] nvme: allow to re-attach namespaces after all paths are down

Hannes Reinecke hare at suse.de
Mon Jun 21 23:31:54 PDT 2021


On 6/21/21 8:13 PM, Sagi Grimberg wrote:
> 
> 
> On 6/9/21 8:01 AM, Hannes Reinecke wrote:
>> We should only remove the ns head from the list of heads per
>> subsystem if the reference count drops to zero. That cleans up
>> reference counting, and allows us to call del_gendisk() once the last
>> path is removed (as then the ns_head should be removed anyway).
>> As this introduces a (theoretical) race condition where I/O might have
>> been requeued before the last path went down we also should be checking
>> if the gendisk is still present in nvme_ns_head_submit_bio(),
>> and failing I/O if so.
>>
>> Changes to v5:
>> - Synchronize between nvme_init_ns_head() and nvme_mpath_check_last_path()
>> - Check for removed gendisk in nvme_ns_head_submit_bio()
>> Changes to v4:
>> - Call del_gendisk() in nvme_mpath_check_last_path() to avoid deadlock
>> Changes to v3:
>> - Simplify if() clause to detect duplicate namespaces
>> Changes to v2:
>> - Drop memcpy() statement
>> Changes to v1:
>> - Always check NSIDs after reattach
>>
>> Signed-off-by: Hannes Reinecke <hare at suse.de>
>> ---
>>   drivers/nvme/host/core.c      |  9 ++++-----
>>   drivers/nvme/host/multipath.c | 30 +++++++++++++++++++++++++-----
>>   drivers/nvme/host/nvme.h      | 11 ++---------
>>   3 files changed, 31 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 177cae44b612..6d7c2958b3e2 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -566,6 +566,9 @@ static void nvme_free_ns_head(struct kref *ref)
>>       struct nvme_ns_head *head =
>>           container_of(ref, struct nvme_ns_head, ref);
>> +    mutex_lock(&head->subsys->lock);
>> +    list_del_init(&head->entry);
>> +    mutex_unlock(&head->subsys->lock);
>>       nvme_mpath_remove_disk(head);
>>       ida_simple_remove(&head->subsys->ns_ida, head->instance);
>>       cleanup_srcu_struct(&head->srcu);
>> @@ -3806,8 +3809,6 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid,
>>    out_unlink_ns:
>>       mutex_lock(&ctrl->subsys->lock);
>>       list_del_rcu(&ns->siblings);
>> -    if (list_empty(&ns->head->list))
>> -        list_del_init(&ns->head->entry);
>>       mutex_unlock(&ctrl->subsys->lock);
>>       nvme_put_ns_head(ns->head);
>>    out_free_queue:
>> @@ -3828,8 +3829,6 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>>       mutex_lock(&ns->ctrl->subsys->lock);
>>       list_del_rcu(&ns->siblings);
>> -    if (list_empty(&ns->head->list))
>> -        list_del_init(&ns->head->entry);
>>       mutex_unlock(&ns->ctrl->subsys->lock);
>>       synchronize_rcu(); /* guarantee not available in head->list */
>> @@ -3849,7 +3848,7 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>>       list_del_init(&ns->list);
>>       up_write(&ns->ctrl->namespaces_rwsem);
>> -    nvme_mpath_check_last_path(ns);
>> +    nvme_mpath_check_last_path(ns->head);
>>       nvme_put_ns(ns);
>>   }
>> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
>> index 23573fe3fc7d..31153f6ec582 100644
>> --- a/drivers/nvme/host/multipath.c
>> +++ b/drivers/nvme/host/multipath.c
>> @@ -266,6 +266,8 @@ inline struct nvme_ns *nvme_find_path(struct 
>> nvme_ns_head *head)
>>       int node = numa_node_id();
>>       struct nvme_ns *ns;
>> +    if (!(head->disk->flags & GENHD_FL_UP))
>> +        return NULL;
>>       ns = srcu_dereference(head->current_path[node], &head->srcu);
>>       if (unlikely(!ns))
>>           return __nvme_find_path(head, node);
>> @@ -281,6 +283,8 @@ static bool nvme_available_path(struct nvme_ns_head *head)
>>   {
>>       struct nvme_ns *ns;
>> +    if (!(head->disk->flags & GENHD_FL_UP))
>> +        return false;
> 
> nvme_available_path should have no business looking at the head gendisk,
> it should just understand if a PATH (a.k.a a controller) exists.
> 
Agreed. I was just being overly cautious here; I will drop this check.
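
For reference, this is roughly what nvme_available_path() looks like once that
check is dropped: it only inspects the state of the controllers backing each
path, without touching the head's gendisk. (A sketch from memory of the
current multipath code; the exact set of controller states treated as
"available" is an assumption here.)

static bool nvme_available_path(struct nvme_ns_head *head)
{
	struct nvme_ns *ns;

	/*
	 * A path counts as available as long as at least one controller
	 * backing this head is live or still trying to (re)connect.
	 */
	list_for_each_entry_rcu(ns, &head->list, siblings) {
		switch (ns->ctrl->state) {
		case NVME_CTRL_LIVE:
		case NVME_CTRL_RESETTING:
		case NVME_CTRL_CONNECTING:
			return true;
		default:
			break;
		}
	}
	return false;
}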

> IMO, the fact that it does should tell that we should take a step back
> and think about this. We are trying to keep an zombie nshead around
> just for the possibility the host will reconnect (not as part of
> error recovery, but as a brand new connect). Why shouldn't we just
> remove it and restore it as a brand new nshead when the host attaches
> again?
> 
This patch has evolved quite a bit by now, and in fact has diverged slightly 
from its description. The original intent indeed was to keep the nshead 
around until the last reference drops, so that a controller which gets 
reattached can connect its namespaces to the correct (existing) ns_head.
However, as it turned out, that was just a band-aid; the real fix is to get 
the reference counting between 'struct nvme_ns' and 'struct nvme_ns_head' 
right: once the last path to an ns_head drops, we should remove the ns_head 
by calling del_gendisk() and taking it off the subsystem's list of ns_heads.

As Keith noted, the first part is done correctly in this patch (namely, 
del_gendisk() is called when the last path drops), but the second part, 
detaching the head from the list of ns_heads, is _not_ done correctly.
Both need to happen at the same time to avoid any race conditions, roughly 
as sketched below.
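
Something along these lines, where unlinking the head from the subsystem's 
nsheads list and tearing down the multipath gendisk are driven from the same 
place. This is a sketch only: the exact locking against nvme_init_ns_head() 
and whether the requeue work needs kicking here are assumptions on my side, 
not part of the posted patch.

static void nvme_mpath_check_last_path(struct nvme_ns_head *head)
{
	bool last_path = false;

	mutex_lock(&head->subsys->lock);
	if (list_empty(&head->list)) {
		/* Last path gone: unlink the head from the subsystem. */
		list_del_init(&head->entry);
		last_path = true;
	}
	mutex_unlock(&head->subsys->lock);

	if (last_path && head->disk) {
		/* Flush any requeued I/O and remove the multipath gendisk. */
		kblockd_schedule_work(&head->requeue_work);
		del_gendisk(head->disk);
	}
}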

Will be sending an updated patch.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare at suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


