[bug report] nvme-tcp poll queue causes busy loop and 100% CPU in nvme_tcp_poll() on the latest linux-block/for-next
Ming Lei
ming.lei at redhat.com
Wed Jan 21 19:48:06 PST 2026
On Wed, Jan 21, 2026 at 10:15:56PM +0800, Yi Zhang wrote:
> On Wed, Jan 21, 2026 at 5:03 PM Yi Zhang <yi.zhang at redhat.com> wrote:
> >
> > On Wed, Jan 21, 2026 at 4:53 PM Ming Lei <ming.lei at redhat.com> wrote:
> > >
> > > On Wed, Jan 21, 2026 at 10:50:00AM +0800, Yi Zhang wrote:
> > > > Hi
> > > >
> > > > This issue was observed on the latest linux-block/for-next during CKI
> > > > test, where enabling the poll queues leads to a tight busy polling
> > > > loop and 100% CPU usage during "nvme connect".
> > > > It seems to have been introduced in v6.19-rc1 and cannot be reproduced on v6.18.
> > > > I will try to bisect it.
> > >
> > > It may be related to f22ecf9c14c1 ("blk-mq: delete task running check
> > > in blk_hctx_poll()").
> >
> > This commit was merged in v6.19-rc1; I will revert it and retest.
> >
>
> Hi Ming/Christoph
>
> Confirmed the issue was introduced by this commit.
>
> f22ecf9c14c1 blk-mq: delete task running check in blk_hctx_poll()

I guess the need_resched() check in blk_hctx_poll() is still not enough, and
it may burn CPU too aggressively, especially for this special in-kernel sync
polling of a passthrough request in nvmf_connect_io_queue().

Please try the following patch:

diff --git a/block/blk-mq.c b/block/blk-mq.c
index a29d8ac9d3e3..968699277c3d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1480,7 +1480,7 @@ EXPORT_SYMBOL_GPL(blk_rq_is_poll);
 static void blk_rq_poll_completion(struct request *rq, struct completion *wait)
 {
 	do {
-		blk_hctx_poll(rq->q, rq->mq_hctx, NULL, 0);
+		blk_hctx_poll(rq->q, rq->mq_hctx, NULL, BLK_POLL_ONESHOT);
 		cond_resched();
 	} while (!completion_done(wait));
 }
--
2.47.1
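
To illustrate what the flag changes, here is a minimal user-space sketch, not
kernel code. It models an inner poll loop that either keeps spinning until some
limit is hit (flags == 0) or returns after a single attempt (one-shot), and an
outer loop that yields after every inner call, as blk_rq_poll_completion() does
with cond_resched(). All names (fake_queue, FAKE_POLL_ONESHOT, spin_limit) are
hypothetical, and spin_limit is only an assumed stand-in for need_resched()
becoming true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of BLK_POLL_ONESHOT; the value is illustrative only. */
#define FAKE_POLL_ONESHOT 0x1u

/* Hypothetical polled queue: the completion shows up on poll number
 * `ready_after` (must be >= 1). */
struct fake_queue {
	unsigned int polls;       /* total poll attempts so far */
	unsigned int ready_after; /* attempt on which the completion appears */
	bool done;                /* set once the completion has been seen */
};

/* One pass over the queue; returns the number of completions found. */
static int fake_queue_poll_once(struct fake_queue *q)
{
	assert(q != NULL);
	q->polls++;
	if (q->polls >= q->ready_after)
		q->done = true;
	return q->done ? 1 : 0;
}

/* Inner loop: spin until a completion is found, the one-shot flag is
 * set, or spin_limit attempts were made (assumed stand-in for
 * need_resched()). */
static int fake_hctx_poll(struct fake_queue *q, unsigned int flags,
			  unsigned int spin_limit)
{
	unsigned int spins = 0;

	do {
		int ret = fake_queue_poll_once(q);

		if (ret > 0)
			return ret;
		if (flags & FAKE_POLL_ONESHOT)
			break;
		spins++;
	} while (spins < spin_limit);

	return 0;
}

/* Outer loop: poll, then yield, until the request completes.
 * Returns how many times the caller reached the yield point. */
static unsigned int fake_wait_for_completion(struct fake_queue *q,
					     unsigned int flags,
					     unsigned int spin_limit)
{
	unsigned int yields = 0;

	do {
		fake_hctx_poll(q, flags, spin_limit);
		yields++; /* models cond_resched() */
	} while (!q->done);

	return yields;
}
```

In this model, with one-shot polling the caller reaches the yield point after
every single poll attempt, while with flags == 0 it gets there only after the
inner loop gives up. How much that helps on a real system depends on the
scheduler and on the driver's ->poll() implementation, which this sketch does
not capture.
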
Thanks,
Ming