[PATCH V2] nvme: mark ctrl as DEAD if removing from error recovery

Ming Lei ming.lei at redhat.com
Thu Jun 29 01:01:16 PDT 2023


On Thu, Jun 29, 2023 at 09:33:05AM +0200, Christoph Hellwig wrote:
> On Thu, Jun 29, 2023 at 02:48:18PM +0800, Ming Lei wrote:
> > @@ -4054,8 +4055,14 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
> >  	 * disconnected. In that case, we won't be able to flush any data while
> >  	 * removing the namespaces' disks; fail all the queues now to avoid
> >  	 * potentially having to clean up the failed sync later.
> > +	 *
> > +	 * If this removal happens during error recovering, resetting part
> > +	 * may not be started, or controller isn't be recovered completely,
> > +	 * so we have to treat controller as DEAD for avoiding IO hang since
> > +	 * queues can be left as frozen and quiesced.
> >  	 */
> > -	if (ctrl->state == NVME_CTRL_DEAD) {
> > +	if (ctrl->state == NVME_CTRL_DEAD ||
> > +	    ctrl->old_state != NVME_CTRL_LIVE) {
> >  		nvme_mark_namespaces_dead(ctrl);
> >  		nvme_unquiesce_io_queues(ctrl);
> 
> Thanks for the comment and style, but I really still think doing
> the state check was wrong to start with, and adding a check on the
> old state makes things significantly worse.  Can we try to brainstorm
> on how do this properly?

Removal comes during error recovery, and the old state is just for figuring
out this situation, and I think it is documented clearly, and same with
the change itself.

We need to fix it in one simple way for -stable and downstream, so I'd
suggest to apply the simple fix first.

> 
> I think we need to first figure out how to balance the quiesce/unquiesce
> calls, the placement of the nvme_mark_namespaces_dead call should
> be the simple part.

The root cause is that nvme driver takes 2 stage error recovery(teardown and
reset), both two are run from different contexts. The teardown part freezes
and quiesces queues, and reset stage does the unfreeze and unquiesce part.
Device removal can come any time, maybe before starting the reset, and maybe
during resetting, so it is hard to balance the two pair APIs if not calling
them in same context. And it has been one long-term issue.

I am working to move freeze to reset handler[1], which can balance
freeze API.

For quiesce API, I think it still need to be done unconditionally
when removing NSs.

But this kind of work can't be fix candidate for -stable or downstream cause
the change is bigger & more complicated.

[1] https://lore.kernel.org/linux-block/5bddeeb5-39d2-7cec-70ac-e3c623a8fca6@grimberg.me/T/#mfc96266b63eec3e4154f6843be72e5186a4055dc


Thanks, 
Ming




More information about the Linux-nvme mailing list