[PATCH v3 2/2] nvme-core: fix deadlock in disconnect during scan_work and/or ana_work
Sagi Grimberg
sagi at grimberg.me
Thu Jul 23 20:26:38 EDT 2020
>>> Fixes: 0d0b660f214d ("nvme: add ANA support")
>>> Reported-by: Anton Eidelman <anton at lightbitslabs.com>
>>> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
>> I just tested nvme-5.9 and, after bisecting, found that this commit is
>> hanging the nvme/031 test in blktests[1]. The test just rapidly creates,
>> connects and destroys nvmet subsystems. The dmesg trace is below but I
>> haven't really dug into root cause.
>
> Thanks for reporting Logan!
>
> The call to nvme_mpath_clear_ctrl_paths was delicate because it had
> to do with an effects command coming in to a mpath device during
> traffic and also controller reset.
Actually, I think I'm confusing, the original report was from you Logan.
> But nothing afaict should prevent the scan_work from flushing before we
> call nvme_mpath_clear_ctrl_paths, in fact, it even calls for a race
> because the scan_work has the scan_lock taken.
Actually, I think that the design was to unblock the scan_work and that
is why nvme_mpath_clear_ctrl_paths was placed before (as the comment
say).
But looking at the implementation of nvme_mpath_clear_ctrl_paths, it's
completely unclear why it should take the scan_lock. It is just clearing
the paths..
I think that the correct patch would be to just not take the scan_lock
and only take the namespaces_rwsem.
So a more appropriate patch would be:
--
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 900b35d47ec7..83beffddbc0a 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -156,13 +156,11 @@ void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl
*ctrl)
{
struct nvme_ns *ns;
- mutex_lock(&ctrl->scan_lock);
down_read(&ctrl->namespaces_rwsem);
list_for_each_entry(ns, &ctrl->namespaces, list)
if (nvme_mpath_clear_current_path(ns))
kblockd_schedule_work(&ns->head->requeue_work);
up_read(&ctrl->namespaces_rwsem);
- mutex_unlock(&ctrl->scan_lock);
}
--
I'll also prepare a setup and run the test.
More information about the Linux-nvme
mailing list