[PATCH v3 2/2] nvme-core: fix deadlock in disconnect during scan_work and/or ana_work
Sagi Grimberg
sagi at grimberg.me
Thu Jul 23 20:11:54 EDT 2020
>> Fixes: 0d0b660f214d ("nvme: add ANA support")
>> Reported-by: Anton Eidelman <anton at lightbitslabs.com>
>> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
> I just tested nvme-5.9 and, after bisecting, found that this commit is
> hanging the nvme/031 test in blktests[1]. The test just rapidly creates,
> connects and destroys nvmet subsystems. The dmesg trace is below but I
> haven't really dug into root cause.
Thanks for reporting, Logan!
The placement of the nvme_mpath_clear_ctrl_paths call was delicate
because it deals with an effects command arriving at an mpath device
during traffic, and also with controller reset.
But afaict nothing should prevent us from flushing the scan_work before
we call nvme_mpath_clear_ctrl_paths. In fact, not doing so even invites
a race, because the scan_work runs with the scan_lock taken.
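
To illustrate the ordering argument only, here is a minimal userspace
sketch using pthreads. It is not kernel code: every toy_* name is
hypothetical, pthread_join() merely plays the role of flush_work(), and
the sketch assumes the path-clearing step takes the same lock as the
scan worker.

```c
#include <assert.h>
#include <pthread.h>

/* Toy userspace model; every toy_* name is hypothetical, not a kernel API. */
struct toy_ctrl {
	pthread_mutex_t scan_lock;	/* stands in for ctrl->scan_lock */
	pthread_t scan_thread;		/* stands in for ctrl->scan_work */
	int scan_done;			/* set once the scan has finished */
	int cleared_after_scan;		/* scan_done as seen by the clear step */
};

/* Stand-in for the scan worker: runs with scan_lock held. */
static void *toy_scan_work(void *arg)
{
	struct toy_ctrl *ctrl = arg;

	pthread_mutex_lock(&ctrl->scan_lock);
	ctrl->scan_done = 1;	/* namespace scanning would happen here */
	pthread_mutex_unlock(&ctrl->scan_lock);
	return NULL;
}

/* Stand-in for clearing paths; assumed to take scan_lock as well. */
static void toy_clear_ctrl_paths(struct toy_ctrl *ctrl)
{
	pthread_mutex_lock(&ctrl->scan_lock);
	ctrl->cleared_after_scan = ctrl->scan_done;
	pthread_mutex_unlock(&ctrl->scan_lock);
}

/* Proposed order: wait for the scan first, then clear the paths. */
static void toy_remove_namespaces(struct toy_ctrl *ctrl)
{
	pthread_join(ctrl->scan_thread, NULL);	/* plays the role of flush_work() */
	assert(ctrl->scan_done);		/* the worker can no longer be running */
	toy_clear_ctrl_paths(ctrl);
}

/* Returns 1 if the clear step ran after the scan finished, -1 on setup error. */
int toy_demo(void)
{
	struct toy_ctrl ctrl = { .scan_done = 0 };

	if (pthread_mutex_init(&ctrl.scan_lock, NULL))
		return -1;
	if (pthread_create(&ctrl.scan_thread, NULL, toy_scan_work, &ctrl)) {
		pthread_mutex_destroy(&ctrl.scan_lock);
		return -1;
	}
	toy_remove_namespaces(&ctrl);
	pthread_mutex_destroy(&ctrl.scan_lock);
	return ctrl.cleared_after_scan;
}
```

Built with -pthread, toy_demo() should return 1: once the worker has
been waited for, the clear step can never contend with it for the lock.
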
Can you try?
--
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 35c39932c491..ac3fbc4005ad 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4105,6 +4105,9 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
 	struct nvme_ns *ns, *next;
 	LIST_HEAD(ns_list);
 
+	/* prevent racing with ns scanning */
+	flush_work(&ctrl->scan_work);
+
 	/*
 	 * make sure to requeue I/O to all namespaces as these
 	 * might result from the scan itself and must complete
@@ -4112,9 +4115,6 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
 	 */
 	nvme_mpath_clear_ctrl_paths(ctrl);
 
-	/* prevent racing with ns scanning */
-	flush_work(&ctrl->scan_work);
-
 	/*
 	 * The dead states indicates the controller was not gracefully
 	 * disconnected. In that case, we won't be able to flush any data while
--