[PATCH 4/4] nvme: redirect commands on dying queue

Tue Aug 18 02:32:39 EDT 2020

On Mon, Aug 17, 2020 at 11:23:22AM -0400, Mike Snitzer wrote:
> On Mon, Aug 17 2020 at  4:15am -0400,
> Christoph Hellwig <hch at lst.de> wrote:
> 
> > From: Chao Leng <lengchao at huawei.com>
> > 
> > If a command sent through nvme-multipath fails on a dying queue, resend it
> > on another path.
> > 
> > Signed-off-by: Chao Leng <lengchao at huawei.com>
> > [hch: rebased on top of the completion refactoring]
> > Signed-off-by: Christoph Hellwig <hch at lst.de>
> > Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
> 
> Did we ever learn from Chao what the original issue was?  Deciding to
> fail over on completion because of blk_queue_dying(), without any other
> insight, is definitely new to me.

Yes.  Basically the controller is going away right after returning a
retryable error, so retrying on the same queue is pointless.

> But this looks fine; in general, though, such blk_queue_dying() checks are
> pretty racy, right?  It feels like this might paper over something else, but
> without knowing more:

But I guess the race doesn't matter: if we lose it, ->queue_rq will fail
and the submission path will pick another path as well.
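A rough user-space model of why losing the race is harmless, assuming (as
stated above) that ->queue_rq refuses work once the queue is dying and that
the submission side then moves on to another path.  `struct sim_path`,
`sim_queue_rq` and `sim_resubmit` are hypothetical names; this is a sketch of
the argument, not the multipath path-selection code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical path model (illustrative only). */
struct sim_path {
	const char *name;
	bool dying; /* may become true right after a completion-time check */
};

/* Models ->queue_rq refusing new work on a dying queue. */
static bool sim_queue_rq(const struct sim_path *p)
{
	return !p->dying;
}

/*
 * Resubmit a request that completion decided to retry on 'preferred'.
 * If we lost the race and that queue started dying after the check,
 * queue_rq fails and we fall back to any other path that accepts it.
 * Returns the index of the path used, or -1 if no path accepts.
 */
static int sim_resubmit(const struct sim_path *paths, size_t n,
			size_t preferred)
{
	assert(preferred < n);
	if (sim_queue_rq(&paths[preferred]))
		return (int)preferred;

	for (size_t i = 0; i < n; i++) {
		if (i != preferred && sim_queue_rq(&paths[i]))
			return (int)i;
	}
	return -1;
}
```

Either way the request ends up on a live path: the completion-time check
only avoids one wasted submission attempt.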