[PATCH v3 2/2] nvme-core: fix deadlock in disconnect during scan_work and/or ana_work

Logan Gunthorpe logang at deltatee.com
Fri Jul 24 12:17:50 EDT 2020



On 2020-07-23 7:03 p.m., Sagi Grimberg wrote:
> 
>> Actually, I think that the design was to unblock the scan_work and that
>> is why nvme_mpath_clear_ctrl_paths was placed before (as the comment
>> say).
>>
>> But looking at the implementation of nvme_mpath_clear_ctrl_paths, it's
>> completely unclear why it should take the scan_lock. It is just clearing
>> the paths..
>>
>> I think that the correct patch would be to just not take the scan_lock
>> and only take the namespaces_rwsem.
> 
> OK, I was able to reproduce this on my setup.
> 
> What was needed is that fabrics will allow I/O to pass in
> NVME_CTRL_DELETING, which needed this add-on:
> -- 
>     nvme-fabrics: don't fast fail on ctrl state DELETING
> 
>     This is now an state that allows for I/O to be sent to the
>     device, and when the device shall transition into
>     NVME_CTRL_DELETING_NOIO we shall fail the I/O.
> 
>     Note that this is fine because the transport itself has
>     a queue state to protect against queue access.
> 
>     Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
> 
> diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
> index a0ec40ab62ee..a9c1e3b4585e 100644
> --- a/drivers/nvme/host/fabrics.h
> +++ b/drivers/nvme/host/fabrics.h
> @@ -182,7 +182,8 @@ bool nvmf_ip_options_match(struct nvme_ctrl *ctrl,
>  static inline bool nvmf_check_ready(struct nvme_ctrl *ctrl, struct
> request *rq,
>                 bool queue_live)
>  {
> -       if (likely(ctrl->state == NVME_CTRL_LIVE))
> +       if (likely(ctrl->state == NVME_CTRL_LIVE ||
> +                  ctrl->state == NVME_CTRL_DELETING))
>                 return true;
>         return __nvmf_check_ready(ctrl, rq, queue_live);
>  }
> -- 
> 
> Logan,
> 
> Can you verify that it works for you?

Yes, thanks, this fixes the issue for me.

> BTW, I'm still seriously suspicious on why nvme_mpath_clear_ctrl_paths
> is taking the scan_lock. It appears that it shouldn't. I'm tempted to
> remove it and see if anyone complains...

Not really sure myself, but I did a cursory look and don't see any
obvious reason why scan_lock needs to be taken there.

Logan



More information about the Linux-nvme mailing list