[PATCH 2/2] nvme: delete disk when last path is gone

Hannes Reinecke hare at suse.de
Thu Feb 25 03:37:03 EST 2021


On 2/24/21 11:40 PM, Sagi Grimberg wrote:
> 
>> The multipath code currently deletes the disk only after all references
>> to it are dropped rather than when the last path to that disk is lost.
>> This has been reported to cause problems with some use cases like MD 
>> RAID.
> 
> What is the exact problem?
> 
> Can you describe what the problem you see now and what you expect
> to see (unrelated to patch #1)?
> 
The problem is a difference in behaviour between multipathed and 
non-multipathed namespaces (i.e. whether the 'CMIC' bit is set or not).
If the CMIC bit is _not_ set, the disk device is removed once the
controller is gone; if the CMIC bit is set, the disk device is 
retained and only removed once the last _reference_ is dropped.
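
Roughly (paraphrasing the gate in nvme_mpath_alloc_disk() from
drivers/nvme/host/multipath.c, not quoting it verbatim):

	/*
	 * Bit 1 of CMIC ("NVM subsystem may contain two or more
	 * controllers", per the NVMe spec) decides whether an
	 * 'nshead' disk gets created at all.
	 */
	if (!(ctrl->subsys->cmic & (1 << 1)) || !multipath)
		return 0;	/* no multipath node; per-ns disk only */

	/*
	 * ... otherwise head->disk is allocated, and it is this disk
	 * which outlives the individual paths ...
	 */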

This is causing customer issues, as some vendors produce nearly 
identical PCI NVMe devices which differ only in the CMIC bit.
So depending on which device the customer uses, they get one or the 
other behaviour.
And this causes issues when said customer deploys MD RAID on them;
with one set of devices PCI hotplug works, with the other set of 
devices it doesn't.

>> This patch implements an alternative behaviour of deleting the disk when
>> the last path is gone, ie the same behaviour as non-multipathed nvme
>> devices.
> 
> But we also don't remove the non-multipath'd nvme device until the
> last reference drops (e.g. if you have a mounted filesystem on top).
> 
Au contraire.

When doing PCI hotplug in the non-multipathed case, the controller is 
removed and put_disk() is called on the namespace's disk during 
nvme_free_ns().
When doing PCI hotplug in the multipathed case, the controller is 
removed, too, but put_disk() is only called on the namespace itself; the 
'nshead' disk is still kept around, and put_disk() on the 'nshead' disk 
is only called after the last reference is dropped.
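
To illustrate (abridged and paraphrased from drivers/nvme/host/core.c,
not a verbatim quote):

	/* Non-multipathed: the disk dies together with the namespace. */
	static void nvme_free_ns(struct kref *kref)
	{
		struct nvme_ns *ns = container_of(kref, struct nvme_ns, kref);

		put_disk(ns->disk);		/* disk released right here */
		nvme_put_ns_head(ns->head);	/* only drops a reference */
		nvme_put_ctrl(ns->ctrl);
		kfree(ns);
	}

	/*
	 * Multipathed: put_disk() on head->disk only happens from
	 * nvme_free_ns_head(), i.e. once the last opener (MD in this
	 * case) has let go of the 'nshead' disk.
	 */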

> This would be the equivalent to running raid on top of dm-mpath on
> top of scsi devices right? And if all the mpath device nodes go away
> the mpath device is deleted even if it has an open reference to it?
> 
See above. The prime motivator behind this patch is to get equivalent 
behaviour between multipathed and non-multipathed devices.
It just so happens that MD RAID exercises this particular issue.

>> The new behaviour will be selected with the 'fail_if_no_path'
>> attribute, as it's arguably the same functionality.
> 
> But its not the same functionality.

Agreed. But as the first patch will be dropped (see my other mail) I'll 
be redoing the patchset anyway.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare at suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
