[PATCH 1/2] nvme-tcp: avoid race between nvme scan and reset

Shinichiro Kawasaki shinichiro.kawasaki at wdc.com
Thu Jun 12 21:10:37 PDT 2025


On Jun 04, 2025 / 11:17, Shinichiro Kawasaki wrote:
...
> On Jun 04, 2025 / 10:10, Sagi Grimberg wrote:
> ...
> > My preference would be to allow nvme to unquiesce queues that were not
> > previously quiesced (just like it historically was) instead of having to
> > block a controller reset until the scan_work is completed (which is admin
> > I/O dependent, and may get stuck until admin timeout, which can be changed
> > by the user for 60 minutes or something arbitrarily long btw).
> > 
> > How about something like this patch instead:
> > --
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index c2697db59109..74f3ad16e812 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -327,8 +327,10 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
> >         bool run_queue = false;
> > 
> >         spin_lock_irqsave(&q->queue_lock, flags);
> > -       if (WARN_ON_ONCE(q->quiesce_depth <= 0)) {
> > -               ;
> > +       if (q->quiesce_depth <= 0) {
> > +               printk(KERN_DEBUG
> > +                       "dev %s: unquiescing a non-quiesced queue, expected?\n",
> > +                       q->disk ? q->disk->disk_name : "?");
> >         } else if (!--q->quiesce_depth) {
> >                 blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
> >                 run_queue = true;
> > --
> 
> The WARN was introduced with the commit e70feb8b3e68 ("blk-mq: support
> concurrent queue quiesce/unquiesce") that Ming authored. Ming, may I
> ask for your comment on the suggestion by Sagi?
> 
> If the WARN is left as it is, blktests can ignore it by adding the
> line below to the test case:
> 
>   DMESG_FILTER="grep --invert-match blk_mq_unquiesce_queue"
> 
> That said, I think Sagi's solution will be cleaner.

FYI, I tried to recreate the WARN in blk_mq_unquiesce_queue() using kernel
v6.16-rc1, but could not reproduce it. AFAIK, the kernel changes between v6.15
and v6.16-rc1 do not address the WARN, so I'm guessing the WARN just disappeared
because of timing changes. Anyway, I suggest giving this problem low priority.
Sagi, Hannes, thanks for your actions.
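
For reference, the counting behavior that Sagi's patch proposes can be modeled
in user space roughly as below. This is only a sketch under stated assumptions,
not kernel code: struct model_queue, model_quiesce() and model_unquiesce() are
hypothetical names made up for illustration, locking is left out, and the
boolean field merely stands in for QUEUE_FLAG_QUIESCED. It shows that an
unbalanced unquiesce is logged and otherwise ignored, while balanced calls
still clear the quiesced state only when the depth drops to zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space stand-in for struct request_queue; not kernel code. */
struct model_queue {
	const char *name;   /* stands in for q->disk->disk_name */
	int quiesce_depth;  /* number of outstanding quiesce calls */
	bool quiesced;      /* stands in for QUEUE_FLAG_QUIESCED */
};

/* Nested quiesce: each call bumps the depth and marks the queue quiesced. */
static void model_quiesce(struct model_queue *q)
{
	/* Invariant of this model: the depth never goes negative. */
	assert(q->quiesce_depth >= 0);
	q->quiesce_depth++;
	q->quiesced = true;
}

/*
 * Unquiesce following the proposed patch: a call on a non-quiesced queue
 * is logged and ignored instead of triggering a WARN.
 * Returns true when the caller should run the queue again.
 */
static bool model_unquiesce(struct model_queue *q)
{
	if (q->quiesce_depth <= 0) {
		fprintf(stderr,
			"dev %s: unquiescing a non-quiesced queue, expected?\n",
			q->name ? q->name : "?");
		return false;
	}
	if (--q->quiesce_depth == 0) {
		q->quiesced = false;
		return true;
	}
	return false;
}
```

In this model, calling model_unquiesce() on a fresh queue leaves the depth at
zero and only prints the debug message, which corresponds to the reset path
unquiescing queues that were never quiesced.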


