[PATCH 3/3] nvme-multipath: skip inaccessible paths during partition scan

Hannes Reinecke hare at suse.de
Wed Sep 11 05:04:11 PDT 2024


On 9/11/24 13:24, Sagi Grimberg wrote:
> Hannes, if this patch fixes a bug, please phrase the title to reflect that.
> 
> 
> On 11/09/2024 12:51, Hannes Reinecke wrote:
>> When a path is switched to 'inaccessible' during partition scan
>> triggered via device_add_disk() and we only have one path the
>> system will be stuck as nvme_available_path() will always return
>> 'true'. So I/O will never be completed and the system is stuck
>> in device_add_disk().
>> This patch introduces a flag NVME_NSHEAD_DISABLE_QUEUEING to
>> cause nvme_available_path() to always return NULL, and with
>> that I/O to be failed if all paths are unavailable.
> 
> So what will happen if a new device comes along, and the first
> path it connects to is 'inaccessible'? What will scan partitions?
> 
Nothing will. But nothing does that today, either, as we never call
nvme_mpath_set_live():
nvme_mpath_add_disk()
-> nvme_update_ns_ana_state():
         if (nvme_state_is_live(ns->ana_state) &&
             nvme_ctrl_state(ns->ctrl) == NVME_CTRL_LIVE)
                 nvme_mpath_set_live(ns);
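
As a rough user-space illustration of that check (a simplified model, not the kernel source: the `model_*` names, the enum values and the `set_live_called` field are assumptions made up for this sketch), a namespace whose path is inaccessible never reaches the set-live step:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types. The names echo the snippet
 * above, but layout and values are assumptions for illustration only. */
enum ana_state { ANA_OPTIMIZED, ANA_NONOPTIMIZED, ANA_INACCESSIBLE };
enum ctrl_state { CTRL_CONNECTING, CTRL_LIVE };

struct model_ns {
    enum ana_state ana_state;
    enum ctrl_state ctrl_state;
    bool set_live_called;   /* records whether the set-live step ran */
};

/* Only optimized and non-optimized count as live ANA states. */
bool model_state_is_live(enum ana_state s)
{
    return s == ANA_OPTIMIZED || s == ANA_NONOPTIMIZED;
}

/* Models the condition quoted above: the disk is only set live when the
 * ANA state is live AND the controller itself is live. */
void model_update_ns_ana_state(struct model_ns *ns, enum ana_state new_state)
{
    assert(ns != NULL);
    ns->ana_state = new_state;
    if (model_state_is_live(ns->ana_state) && ns->ctrl_state == CTRL_LIVE)
        ns->set_live_called = true;
}
```

With an inaccessible first path the flag stays false, so in this model nothing would trigger the partition scan until a later state change arrives.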

> I don't think that this approach is the right one.
> Effectively, there is a semantic decision here, is 'inaccessible' a
> temporary state or not, if it is we should queue IO knowing that
> it may change in the future, and if not, we should fail it.
> 
Yes.

I would argue that we should never requeue I/O if it's triggered
from partition scan, i.e. from within device_add_disk().
Reasoning is as follows:

We never call nvme_mpath_set_live() during initial scan if all paths are 
inaccessible.
So when a path is 'optimized' / 'non-optimized' and the path state 
changes to 'inaccessible' while device_add_disk() is running, we are
perfectly fine to disable the device (and kill all I/O), as this is
what would have happened if the path had been inaccessible originally.
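
A minimal sketch of the submit / requeue / fail decision this argues for, as a user-space model. Only the flag name NVME_NSHEAD_DISABLE_QUEUEING comes from the patch description; the `model_*` names, the struct fields and the verdict enum are hypothetical and do not reflect the actual kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum io_verdict { IO_SUBMIT, IO_REQUEUE, IO_FAIL };

/* Hypothetical model of a namespace head; not the kernel structure. */
struct model_head {
    size_t usable_paths;     /* paths currently optimized / non-optimized */
    size_t connected_paths;  /* paths that still exist, usable or not */
    bool disable_queueing;   /* stands in for NVME_NSHEAD_DISABLE_QUEUEING */
};

/* "Is it worth waiting for a path?" With the flag set the answer is
 * always no, so callers stop requeueing. */
bool model_available_path(const struct model_head *head)
{
    assert(head != NULL);
    if (head->disable_queueing)
        return false;
    return head->connected_paths > 0;
}

/* Decide what happens to one I/O submitted to the head. */
enum io_verdict model_submit(const struct model_head *head)
{
    assert(head != NULL);
    if (head->usable_paths > 0)
        return IO_SUBMIT;      /* a usable path exists right now */
    if (model_available_path(head))
        return IO_REQUEUE;     /* wait for a path to come back */
    return IO_FAIL;            /* give up; the caller sees an error */
}
```

In this model a single inaccessible path yields IO_REQUEUE indefinitely unless the flag is set, which is the situation that leaves device_add_disk() stuck.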

> ANA inaccessible is semantically temporary. It's just that your test
> treats it as permanent. For this case you have fast_io_fail_tmo, which is
> designed to give up also in cases where there is hope that any path will
> become online in the future.
> 
It's not just ANA inaccessible. We're facing a similar problem if the
target returns PATH_ERROR during scanning; then we're constantly failing
over I/O to the next path, returning PATH_ERROR, failing over to the 
next path, ...

> I do agree that if the user disconnects the last path, should it be
> inaccessible or not, it should succeed, and cause the queued IO to fail
> immediately.

I'll check if that would work for my testcases.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare at suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



