[PATCH RFC for-5.8-rc] nvme-core: fix deadlock in disconnect during scan_work and/or ana_work
Christoph Hellwig
hch at lst.de
Mon Jun 29 02:49:35 EDT 2020
> + /*
> + * Controller deletion started, we may issue I/O, block and prevent
> + * the controller deletion process from completing
> + */
> + if (ctrl->state == NVME_CTRL_DELETE_START)
> + return;
> +
> /* No tagset on a live ctrl means IO queues could not created */
> if (ctrl->state != NVME_CTRL_LIVE || !ctrl->tagset)
Can we merge the checks into a single one?
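A minimal user-space sketch of what a merged check could look like, assuming the second check also just returns. The enum, struct, and the helper name nvme_scan_should_skip() are stand-ins invented for illustration, not the kernel definitions. Under that assumption, NVME_CTRL_DELETE_START is already covered by the "not LIVE" test:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types for illustration only; not the kernel definitions. */
enum nvme_ctrl_state {
	NVME_CTRL_LIVE,
	NVME_CTRL_DELETE_START,	/* state name proposed by the patch */
	NVME_CTRL_DEAD,
};

struct nvme_ctrl {
	enum nvme_ctrl_state state;
	void *tagset;
};

/* Hypothetical helper: returns true when the caller should bail out. */
static bool nvme_scan_should_skip(const struct nvme_ctrl *ctrl)
{
	assert(ctrl != NULL);
	/*
	 * Any state other than LIVE, including DELETE_START, skips the
	 * work, and so does a live controller without a tagset.
	 */
	return ctrl->state != NVME_CTRL_LIVE || !ctrl->tagset;
}
```

If the two branches need different handling, the single condition would not be enough and the checks would have to stay separate.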
> @@ -3913,6 +3932,9 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
> if (ctrl->state == NVME_CTRL_DEAD)
> nvme_kill_queues(ctrl);
>
> + /* prevent mpath I/O before removing namespaces */
> + nvme_change_ctrl_state(ctrl, NVME_CTRL_DELETING);
With the DEAD state check above, isn't this going to cause problems?
Shouldn't this be:
	if (ctrl->state == NVME_CTRL_DEAD)
		nvme_kill_queues(ctrl);
	else
		nvme_change_ctrl_state(ctrl, NVME_CTRL_DELETING);
But even with that I'm not sure it does the right thing for the direct
call from the PCIe code.
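The suggested if/else can be modelled in user space as below. This is only a sketch: the types are simplified stand-ins, nvme_change_ctrl_state() here just assigns the state (the real function validates transitions), and remove_namespaces_prologue() is a hypothetical name for the start of nvme_remove_namespaces(). It shows that a DEAD controller keeps its state while any other controller moves to DELETING:

```c
#include <stdbool.h>

/* Stand-in types for illustration only; not the kernel definitions. */
enum nvme_ctrl_state {
	NVME_CTRL_LIVE,
	NVME_CTRL_DELETE_START,	/* state name proposed by the patch */
	NVME_CTRL_DELETING,
	NVME_CTRL_DEAD,
};

struct nvme_ctrl {
	enum nvme_ctrl_state state;
	bool queues_killed;
};

/* Stub: only records that the queues were killed. */
static void nvme_kill_queues(struct nvme_ctrl *ctrl)
{
	ctrl->queues_killed = true;
}

/* Stub: assigns unconditionally; the real function checks transitions. */
static bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
				   enum nvme_ctrl_state new_state)
{
	ctrl->state = new_state;
	return true;
}

/* Hypothetical name for the start of nvme_remove_namespaces(). */
static void remove_namespaces_prologue(struct nvme_ctrl *ctrl)
{
	if (ctrl->state == NVME_CTRL_DEAD)
		nvme_kill_queues(ctrl);
	else
		nvme_change_ctrl_state(ctrl, NVME_CTRL_DELETING);
}
```

The sketch says nothing about the direct call from the PCIe code, which remains an open question.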
Also, I wonder about the state naming. Shouldn't NVME_CTRL_DELETE_START
stay as NVME_CTRL_DELETING, with the new state named
NVME_CTRL_NS_REMOVAL or NVME_CTRL_DELETED? Whatever the name, we'll
need to document the difference between the two removal states.
> + /*
> + * We don't treat NVME_CTRL_DELETE_START as a disabled path
> + * as we I/O should still be able to complete assuming that
> + * the controller is connected, otherwize it'll fail
> + * immediately and return to the requeue list.
> + */
This needs to run through a spell and grammar checker :)