[PATCH v3 2/2] nvme-core: fix deadlock in disconnect during scan_work and/or ana_work

Sagi Grimberg sagi at grimberg.me
Thu Jul 23 21:03:30 EDT 2020


> Actually, I think that the design was to unblock the scan_work and that
> is why nvme_mpath_clear_ctrl_paths was placed before (as the comment
> says).
> 
> But looking at the implementation of nvme_mpath_clear_ctrl_paths, it's
> completely unclear why it should take the scan_lock. It is just clearing
> the paths..
> 
> I think that the correct patch would be to just not take the scan_lock
> and only take the namespaces_rwsem.

OK, I was able to reproduce this on my setup.

What was needed was for fabrics to allow I/O to pass in
NVME_CTRL_DELETING, which required this add-on:
--
     nvme-fabrics: don't fast fail on ctrl state DELETING

     This is now a state that allows I/O to be sent to the
     device; once the controller transitions to
     NVME_CTRL_DELETING_NOIO we shall fail the I/O.

     Note that this is fine because the transport itself has
     a queue state to protect against queue access.

     Signed-off-by: Sagi Grimberg <sagi at grimberg.me>

diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index a0ec40ab62ee..a9c1e3b4585e 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -182,7 +182,8 @@ bool nvmf_ip_options_match(struct nvme_ctrl *ctrl,
  static inline bool nvmf_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
                 bool queue_live)
  {
-       if (likely(ctrl->state == NVME_CTRL_LIVE))
+       if (likely(ctrl->state == NVME_CTRL_LIVE ||
+                  ctrl->state == NVME_CTRL_DELETING))
                 return true;
         return __nvmf_check_ready(ctrl, rq, queue_live);
  }
--
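
For anyone who wants to see the effect of the fast-path change without
a kernel tree, it can be modelled in userspace. This is only a sketch:
the enum and both functions below are hypothetical stand-ins, not the
kernel definitions, and the slow path is reduced to "reject", whereas
the real __nvmf_check_ready is also passed the request and queue_live.

```c
#include <stdbool.h>

/* Userspace stand-ins for illustration only; not the kernel's types. */
enum model_ctrl_state {
	MODEL_CTRL_LIVE,
	MODEL_CTRL_RESETTING,
	MODEL_CTRL_DELETING,		/* I/O is still allowed to pass */
	MODEL_CTRL_DELETING_NOIO,	/* I/O must be failed */
};

/*
 * Hypothetical placeholder for the slow path. It always rejects here;
 * this is an assumption made to keep the model small.
 */
static bool model_slow_check(enum model_ctrl_state state, bool queue_live)
{
	(void)state;
	(void)queue_live;
	return false;
}

/* Mirrors the shape of the patched check: LIVE or DELETING passes at once. */
static bool model_check_ready(enum model_ctrl_state state, bool queue_live)
{
	if (state == MODEL_CTRL_LIVE || state == MODEL_CTRL_DELETING)
		return true;
	return model_slow_check(state, queue_live);
}
```

With this model, model_check_ready(MODEL_CTRL_DELETING, true) passes
just like LIVE does, while MODEL_CTRL_DELETING_NOIO falls through to
the slow path and is rejected.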

Logan,

Can you verify that it works for you?

BTW, I'm still seriously suspicious about why nvme_mpath_clear_ctrl_paths
is taking the scan_lock. It appears that it shouldn't. I'm tempted to
remove it and see if anyone complains...


