[PATCH v3 2/2] nvme-core: fix deadlock in disconnect during scan_work and/or ana_work

Sagi Grimberg sagi at grimberg.me
Thu Jul 23 20:11:54 EDT 2020


>> Fixes: 0d0b660f214d ("nvme: add ANA support")
>> Reported-by: Anton Eidelman <anton at lightbitslabs.com>
>> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
> I just tested nvme-5.9 and, after bisecting, found that this commit is
> hanging the nvme/031 test in blktests[1]. The test just rapidly creates,
> connects and destroys nvmet subsystems. The dmesg trace is below but I
> haven't really dug into root cause.

Thanks for reporting, Logan!

The call to nvme_mpath_clear_ctrl_paths was delicate because it had
to do with an effects command coming in to an mpath device during
traffic, and also with controller reset.

But afaict nothing should prevent the scan_work from being flushed
before we call nvme_mpath_clear_ctrl_paths. In fact, the current
ordering even calls for a race, because the scan_work has the
scan_lock taken.

Can you try?
--
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 35c39932c491..ac3fbc4005ad 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4105,6 +4105,9 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
         struct nvme_ns *ns, *next;
         LIST_HEAD(ns_list);

+       /* prevent racing with ns scanning */
+       flush_work(&ctrl->scan_work);
+
         /*
          * make sure to requeue I/O to all namespaces as these
          * might result from the scan itself and must complete
@@ -4112,9 +4115,6 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
          */
         nvme_mpath_clear_ctrl_paths(ctrl);

-       /* prevent racing with ns scanning */
-       flush_work(&ctrl->scan_work);
-
         /*
          * The dead states indicates the controller was not gracefully
          * disconnected. In that case, we won't be able to flush any data while
--


