[PATCH V2] nvme: mark ctrl as DEAD if removing from error recovery
Christoph Hellwig
hch at lst.de
Thu Jun 29 00:33:05 PDT 2023
On Thu, Jun 29, 2023 at 02:48:18PM +0800, Ming Lei wrote:
> @@ -4054,8 +4055,14 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
> * disconnected. In that case, we won't be able to flush any data while
> * removing the namespaces' disks; fail all the queues now to avoid
> * potentially having to clean up the failed sync later.
> + *
> + * If this removal happens during error recovering, resetting part
> + * may not be started, or controller isn't be recovered completely,
> + * so we have to treat controller as DEAD for avoiding IO hang since
> + * queues can be left as frozen and quiesced.
> */
> - if (ctrl->state == NVME_CTRL_DEAD) {
> + if (ctrl->state == NVME_CTRL_DEAD ||
> + ctrl->old_state != NVME_CTRL_LIVE) {
> nvme_mark_namespaces_dead(ctrl);
> nvme_unquiesce_io_queues(ctrl);
Thanks for the comment and style fixes, but I really still think doing
the state check was wrong to start with, and adding a check on the
old state makes things significantly worse. Can we try to brainstorm
on how to do this properly?
I think we need to first figure out how to balance the quiesce/unquiesce
calls; the placement of the nvme_mark_namespaces_dead call should
be the simple part.
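
To make "balanced" concrete, here is a rough userspace sketch of one way
the pairing could be tracked. It is a toy model under stated assumptions,
not kernel code and not a proposed fix: struct toy_ctrl, toy_quiesce(),
toy_unquiesce() and toy_remove_during_recovery() are all hypothetical
names, and the counter only imitates a nesting quiesce depth. The idea is
that a per-controller flag records whether we currently hold a quiesce, so
either path can call in any order without a state check.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name here is hypothetical, none is a kernel API. */
struct toy_ctrl {
	int quiesce_depth;	/* imitates a nesting quiesce counter */
	bool stopped;		/* true while this controller holds one quiesce */
};

/* Quiesce at most once, no matter how many paths ask for it. */
static void toy_quiesce(struct toy_ctrl *c)
{
	if (!c->stopped) {
		c->stopped = true;
		c->quiesce_depth++;
	}
}

/* Undo the quiesce only if we actually hold one. */
static void toy_unquiesce(struct toy_ctrl *c)
{
	if (c->stopped) {
		c->stopped = false;
		assert(c->quiesce_depth > 0);
		c->quiesce_depth--;
	}
}

/* Removal racing with error recovery: returns the final depth. */
static int toy_remove_during_recovery(void)
{
	struct toy_ctrl c = { 0, false };

	toy_quiesce(&c);	/* error handler stops I/O */
	toy_quiesce(&c);	/* second path stops again, not counted twice */
	toy_unquiesce(&c);	/* removal path releases the queues */
	toy_unquiesce(&c);	/* late recovery path, harmless no-op */
	return c.quiesce_depth;
}
```

In this model the depth ends at zero whichever path runs last, so removal
would not need to look at ctrl->state or the old state to decide whether
to unquiesce.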