[PATCH RFC for-5.8-rc] nvme-core: fix deadlock in disconnect during scan_work and/or ana_work

Christoph Hellwig hch at lst.de
Mon Jun 29 02:49:35 EDT 2020


> +	/*
> +	 * Controller deletion started, we may issue I/O, block and prevent
> +	 * the controller deletion process from completing
> +	 */
> +	if (ctrl->state == NVME_CTRL_DELETE_START)
> +		return;
> +
>  	/* No tagset on a live ctrl means IO queues could not created */
>  	if (ctrl->state != NVME_CTRL_LIVE || !ctrl->tagset)

Can we merge the checks into a single one?
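
Something along these lines, as a rough standalone sketch rather than a
tested kernel change. It assumes the existing check also bails out of the
function: NVME_CTRL_DELETE_START is not NVME_CTRL_LIVE, so the existing
condition would already reject it, and merging mostly means folding the
new comment into the old one. The enum and struct are simplified stand-ins
for the kernel definitions, and scan_allowed() is a hypothetical helper
used only to make the condition testable:

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
enum nvme_ctrl_state {
	NVME_CTRL_LIVE,
	NVME_CTRL_DELETE_START,	/* state name proposed by the patch */
	NVME_CTRL_DELETING,
	NVME_CTRL_DEAD,
};

struct nvme_ctrl {
	enum nvme_ctrl_state state;
	void *tagset;
};

/*
 * Hypothetical helper expressing the merged check.
 *
 * Scanning proceeds only on a live controller that has a tagset:
 *  - no tagset on a live ctrl means the I/O queues could not be created
 *  - any non-live state, including a started deletion, is rejected so
 *    that we do not issue I/O that could block the deletion
 */
static bool scan_allowed(const struct nvme_ctrl *ctrl)
{
	return ctrl->state == NVME_CTRL_LIVE && ctrl->tagset != NULL;
}
```

The caller would then return early when scan_allowed() is false, which
matches the shape of the existing single check.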

> @@ -3913,6 +3932,9 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
>  	if (ctrl->state == NVME_CTRL_DEAD)
>  		nvme_kill_queues(ctrl);
>  
> +	/* prevent mpath I/O before removing namespaces */
> +	nvme_change_ctrl_state(ctrl, NVME_CTRL_DELETING);

With the DEAD state check above, isn't this going to cause problems?
Shouldn't this be:

	if (ctrl->state == NVME_CTRL_DEAD)
		nvme_kill_queues(ctrl);
	else
		nvme_change_ctrl_state(ctrl, NVME_CTRL_DELETING);

But even with that I'm not sure it does the right thing for the direct
call from the PCIe code.

Also, I wonder about the state naming.  Shouldn't NVME_CTRL_DELETE_START
stay as NVME_CTRL_DELETING, with the new state named
NVME_CTRL_NS_REMOVAL or NVME_CTRL_DELETED?  Whatever the name, we'll
need to document the difference between the two removal states.
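
To illustrate the kind of documentation I have in mind, here is a sketch
that uses one of the naming options above. The semantics in the comments
are only my reading of the comments in this patch, the enum is trimmed
down to the states discussed here, and path_usable() is a hypothetical
helper, not an existing function:

```c
#include <stdbool.h>

/* Illustrative only: trimmed-down enum using the naming suggested above. */
enum nvme_ctrl_state {
	NVME_CTRL_LIVE,
	/*
	 * Controller deletion has started.  I/O on this path should still
	 * be able to complete as long as the controller is connected, so
	 * multipath does not treat the path as disabled.
	 */
	NVME_CTRL_DELETING,
	/*
	 * Namespaces are being removed.  Multipath must not send I/O to
	 * this controller any more.
	 */
	NVME_CTRL_NS_REMOVAL,
	NVME_CTRL_DEAD,
};

/* Hypothetical helper: may multipath still pick this controller's path? */
static bool path_usable(enum nvme_ctrl_state state)
{
	switch (state) {
	case NVME_CTRL_LIVE:
	case NVME_CTRL_DELETING:
		return true;
	default:
		return false;
	}
}
```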

> +	/*
> +	 * We don't treat NVME_CTRL_DELETE_START as a disabled path
> +	 * as we I/O should still be able to complete assuming that
> +	 * the controller is connected, otherwize it'll fail
> +	 * immediately and return to the requeue list.
> +	 */

This needs to run through a spell and grammar checker :)


