[PATCHv2] nvme-mpath: delete disk after last connection
Hannes Reinecke
hare at suse.de
Tue Apr 20 14:19:10 BST 2021
On 4/20/21 10:05 AM, Christoph Hellwig wrote:
> On Fri, Apr 16, 2021 at 08:24:11AM +0200, Hannes Reinecke wrote:
>> With the proposed patch, the following messages appear:
>>
>> [ 227.516807] md/raid1:md0: Disk failure on nvme3n1, disabling device.
>> [ 227.516807] md/raid1:md0: Operation continuing on 1 devices.
>
> So how is this going to work for e.g. a case where the device
> disappears due to resets or fabrics connection problems? This now
> directly tears down the device.
>
Yes, that is correct; the nshead will be removed once the last path is
_removed_. But the key point here is that once the system finds itself
in that situation it's impossible to recover, as the refcounts are
messed up. Even a manual connect call with the same parameters will
_not_ restore operation, but rather result in a new namespace.
So with this patch we change from stalled I/O (where the user has to
reboot the machine to restore operation) to I/O errors (giving the user
at least a _chance_ to recover).
I can easily consider adding a 'queue_if_no_path' option to allow I/O to
be held even once all paths are disconnected, but that will be another
patch.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer