[PATCHv2 0/2] nvme-multipath: fix deadlock in device_add_disk()

Hannes Reinecke hare at kernel.org
Tue Oct 8 06:57:27 PDT 2024


From: Hannes Reinecke <hare at suse.de>

Hi all,

I have a testcase which repeatedly disables a namespace on the target,
assigns it a new UUID (to simulate namespace remapping), and enables the
namespace again.
To make things more interesting, the namespaces also have their ANA group
ID changed to simulate namespaces moving around in a cluster, where only
the paths local to the cluster node are active and all other paths are
inaccessible.

Essentially it's doing something like:

echo 0 > ${ns}/enable
<random delay>
echo "<dev>" > ${ns}/device_path
echo "<grpid>" > ${ns}/ana_grpid
uuidgen > ${ns}/device_uuid
echo 1 > ${ns}/enable

i.e. a testcase similar to the one from the previous patchset, only this
time I'm just toggling the 'enable' attribute without removing the
namespace from the target.
This causes lockups in device_add_disk(), as the partition scan keeps
retrying I/O and never completes.
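
For illustration, one iteration of the steps above could be wrapped in a
small shell function. This is only a sketch: the function names remap_ns
and new_uuid, the maximum-delay argument, and the fallback to
/proc/sys/kernel/random/uuid are assumptions and not part of the actual
test script. ${ns} is expected to point at the namespace directory on the
target, as in the snippet above.

```shell
#!/bin/sh
# Hypothetical helpers sketching one remap iteration of the testcase.

# Print a fresh UUID; fall back to the kernel's generator if uuidgen
# is not installed (assumption, the original test just calls uuidgen).
new_uuid() {
    uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid
}

# remap_ns NS_DIR DEVICE GRPID [MAX_DELAY_SECONDS]
remap_ns() {
    ns=$1
    dev=$2
    grpid=$3
    max_delay=${4:-5}

    # Disable the namespace on the target.
    echo 0 > "${ns}/enable"

    # Random delay of 0..max_delay seconds; srand() seeds from the
    # clock, which is good enough for a test loop.
    sleep "$(awk -v m="$max_delay" \
        'BEGIN { srand(); print int(rand() * (m + 1)) }')"

    # Re-point the namespace, move it to another ANA group and
    # give it a new UUID to simulate remapping.
    echo "$dev" > "${ns}/device_path"
    echo "$grpid" > "${ns}/ana_grpid"
    new_uuid > "${ns}/device_uuid"

    # Enable the namespace again.
    echo 1 > "${ns}/enable"
}
```

A caller would then run something like
'while :; do remap_ns "$ns" "$dev" "$grpid"; done', with the device and
ANA group values chosen by the test harness.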

With this patchset, I/O errors during partition scan are no longer
retried; instead they cause nvme_mpath_set_live() to fail.
This allows us to retry nvme_mpath_set_live() on the next rescan
to fix up the situation.

As usual, comments and reviews are welcome.

Changes to the original submission:
- Drop patch to simplify the loop in nvme_update_ana_state()
- Rework patches to return I/O errors during partition scan

Hannes Reinecke (2):
  nvme: propagate I/O errors during partition scan
  nvme-multipath: retry partition scan on errors

 drivers/nvme/host/core.c      | 26 ++++++++++++++++++------
 drivers/nvme/host/multipath.c | 38 +++++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h      |  2 ++
 3 files changed, 60 insertions(+), 6 deletions(-)

-- 
2.35.3



