[PATCH v3 2/2] nvme-core: fix deadlock in disconnect during scan_work and/or ana_work
Logan Gunthorpe
logang at deltatee.com
Fri Jul 24 12:17:50 EDT 2020
On 2020-07-23 7:03 p.m., Sagi Grimberg wrote:
>
>> Actually, I think that the design was to unblock the scan_work and that
>> is why nvme_mpath_clear_ctrl_paths was placed before (as the comment
>> say).
>>
>> But looking at the implementation of nvme_mpath_clear_ctrl_paths, it's
>> completely unclear why it should take the scan_lock. It is just clearing
>> the paths..
>>
>> I think that the correct patch would be to just not take the scan_lock
>> and only take the namespaces_rwsem.
>
> OK, I was able to reproduce this on my setup.
>
> What was needed is that fabrics will allow I/O to pass in
> NVME_CTRL_DELETING, which needed this add-on:
> --
> nvme-fabrics: don't fast fail on ctrl state DELETING
>
> This is now an state that allows for I/O to be sent to the
> device, and when the device shall transition into
> NVME_CTRL_DELETING_NOIO we shall fail the I/O.
>
> Note that this is fine because the transport itself has
> a queue state to protect against queue access.
>
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
>
> diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
> index a0ec40ab62ee..a9c1e3b4585e 100644
> --- a/drivers/nvme/host/fabrics.h
> +++ b/drivers/nvme/host/fabrics.h
> @@ -182,7 +182,8 @@ bool nvmf_ip_options_match(struct nvme_ctrl *ctrl,
> static inline bool nvmf_check_ready(struct nvme_ctrl *ctrl, struct
> request *rq,
> bool queue_live)
> {
> - if (likely(ctrl->state == NVME_CTRL_LIVE))
> + if (likely(ctrl->state == NVME_CTRL_LIVE ||
> + ctrl->state == NVME_CTRL_DELETING))
> return true;
> return __nvmf_check_ready(ctrl, rq, queue_live);
> }
> --
>
> Logan,
>
> Can you verify that it works for you?
Yes, thanks, this fixes the issue for me.
> BTW, I'm still seriously suspicious on why nvme_mpath_clear_ctrl_paths
> is taking the scan_lock. It appears that it shouldn't. I'm tempted to
> remove it and see if anyone complains...
Not really sure myself, but I did a cursory look and don't see any
obvious reason why scan_lock needs to be taken there.
Logan
More information about the Linux-nvme
mailing list