[PATCHv2] nvme-mpath: delete disk after last connection
Hannes Reinecke
hare at suse.de
Tue Apr 20 14:19:10 BST 2021
On 4/20/21 10:05 AM, Christoph Hellwig wrote:
> On Fri, Apr 16, 2021 at 08:24:11AM +0200, Hannes Reinecke wrote:
>> With the proposed patch, the following messages appear:
>>
>> [ 227.516807] md/raid1:md0: Disk failure on nvme3n1, disabling device.
>> [ 227.516807] md/raid1:md0: Operation continuing on 1 devices.
>
> So how is this going to work for e.g. a case where the device
> disappears due to resets or fabrics connection problems? This now
> directly tears down the device.
>
Yes, that is correct; the nshead will be removed once the last path is
_removed_. But the key point here is that once the system finds itself
in that situation it's impossible to recover, as the refcounts are
messed up. Even a manual connect call with the same parameters will
_not_ restore operation, but rather result in a new namespace.
So with this patch we change from stalled I/O (where the user has to
reboot the machine to restore operation) to I/O errors (giving the user
at least a _chance_ to recover).
I can easily consider adding a 'queue_if_no_path' option to allow I/O to
be held even once all paths are disconnected, but that will be another
patch.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer