[PATCH v3 2/2] nvme-core: fix deadlock in disconnect during scan_work and/or ana_work

Sagi Grimberg sagi at grimberg.me
Thu Jul 23 20:26:38 EDT 2020


>>> Fixes: 0d0b660f214d ("nvme: add ANA support")
>>> Reported-by: Anton Eidelman <anton at lightbitslabs.com>
>>> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
>> I just tested nvme-5.9 and, after bisecting, found that this commit is
>> hanging the nvme/031 test in blktests[1]. The test just rapidly creates,
>> connects and destroys nvmet subsystems. The dmesg trace is below but I
>> haven't really dug into root cause.
> 
> Thanks for reporting Logan!
> 
> The call to nvme_mpath_clear_ctrl_paths was delicate because it had
> to do with an effects command coming in to a mpath device during
> traffic and also controller reset.

Actually, I think I'm confusing, the original report was from you Logan.

> But nothing afaict should prevent the scan_work from flushing before we
> call nvme_mpath_clear_ctrl_paths, in fact, it even calls for a race
> because the scan_work has the scan_lock taken.

Actually, I think that the design was to unblock the scan_work and that
is why nvme_mpath_clear_ctrl_paths was placed before (as the comment
say).

But looking at the implementation of nvme_mpath_clear_ctrl_paths, it's
completely unclear why it should take the scan_lock. It is just clearing
the paths..

I think that the correct patch would be to just not take the scan_lock
and only take the namespaces_rwsem.

So a more appropriate patch would be:
--
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 900b35d47ec7..83beffddbc0a 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -156,13 +156,11 @@ void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl 
*ctrl)
  {
         struct nvme_ns *ns;

-       mutex_lock(&ctrl->scan_lock);
         down_read(&ctrl->namespaces_rwsem);
         list_for_each_entry(ns, &ctrl->namespaces, list)
                 if (nvme_mpath_clear_current_path(ns))
                         kblockd_schedule_work(&ns->head->requeue_work);
         up_read(&ctrl->namespaces_rwsem);
-       mutex_unlock(&ctrl->scan_lock);
  }
--

I'll also prepare a setup and run the test.



More information about the Linux-nvme mailing list