[PATCH 1/2] nvme-tcp: avoid race between nvme scan and reset

Thu Jun 12 21:56:40 PDT 2025

On Wed, Jun 04, 2025 at 11:17:05AM +0000, Shinichiro Kawasaki wrote:
> Cc+: Ming,
> 
> Hi Sagi, thanks for the background explanation and the suggestion.
> 
> On Jun 04, 2025 / 10:10, Sagi Grimberg wrote:
> ...
> > > This is a problem. We are flushing a work that is IO bound, which can
> > > take a long time to complete.
> > > Up until now, we have deliberately avoided introducing dependency
> > > between reset forward progress
> > > and scan work IO to completely finish.
> > > 
> > > I would like to keep it this way.
> > > 
> > > BTW, this is not TCP specific.
> 
> I see. The blktests test case nvme/063 is dedicated to tcp transport, so that's
> why it was reported for the TCP transport.
> 
> > 
> > blk_mq_unquiesce_queue is still very much safe to call as many times as we
> > want.
> > The only thing that comes in the way is this pesky WARN. How about we make
> > it go away and have
> > a debug print instead?
> > 
> > My preference would be to allow nvme to unquiesce queues that were not
> > previously quiesced (just
> > like it historically was) instead of having to block a controller reset
> > until the scan_work is completed (which
> > is admin I/O dependent, and may get stuck until admin timeout, which can be
> > changed by the user for 60
> > minutes or something arbitrarily long btw).
> > 
> > How about something like this patch instead:
> > --
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index c2697db59109..74f3ad16e812 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -327,8 +327,10 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
> >         bool run_queue = false;
> > 
> >         spin_lock_irqsave(&q->queue_lock, flags);
> > -       if (WARN_ON_ONCE(q->quiesce_depth <= 0)) {
> > -               ;
> > +       if (q->quiesce_depth <= 0) {
> > +               printk(KERN_DEBUG
> > +                       "dev %s: unquiescing a non-quiesced queue,
> > expected?\n",
> > +                       q->disk ? q->disk->disk_name : "?", );
> >         } else if (!--q->quiesce_depth) {
> >                 blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
> >                 run_queue = true;
> > --
> 
> The WARN was introduced with the commit e70feb8b3e68 ("blk-mq: support
> concurrent queue quiesce/unquiesce") that Ming authored. Ming, may I
> ask your comment on the suggestion by Sagi?

I think it is bad to use one standard block layer API in this unbalanced way,
that is why WARN_ON_ONCE() is added. We shouldn't encourage driver to use
APIs in this way.

Question is why nvme have to unquiesce one non-quiesced queue?

Thanks,
Ming