[bug report] nvme-tcp poll queue causes busy loop and 100% CPU in nvme_tcp_poll() on the latest linux-block/for-next

Wed Jan 21 19:48:06 PST 2026

On Wed, Jan 21, 2026 at 10:15:56PM +0800, Yi Zhang wrote:
> On Wed, Jan 21, 2026 at 5:03 PM Yi Zhang <yi.zhang at redhat.com> wrote:
> >
> > On Wed, Jan 21, 2026 at 4:53 PM Ming Lei <ming.lei at redhat.com> wrote:
> > >
> > > On Wed, Jan 21, 2026 at 10:50:00AM +0800, Yi Zhang wrote:
> > > > Hi
> > > >
> > > > This issue was observed on the latest linux-block/for-next during
> > > > CKI testing, where enabling the poll queues leads to a tight busy
> > > > polling loop and 100% CPU usage during "nvme connect".
> > > > It seems to have been introduced in v6.19-rc1 and cannot be
> > > > reproduced on v6.18.
> > > > I will try to bisect it.
> > >
> > > It may be related to f22ecf9c14c1 ("blk-mq: delete task running check
> > > in blk_hctx_poll()").
> >
> > This commit was merged in v6.19-rc1; I will revert it and retest.
> >
> 
> Hi Ming/Christoph
> 
> Confirmed the issue was introduced by this commit.
> 
> f22ecf9c14c1 blk-mq: delete task running check in blk_hctx_poll()

I guess the need_resched() check in blk_hctx_poll() is still not enough, and
the loop may burn CPU too aggressively, especially for this special in-kernel
synchronous polling of a passthrough request in nvmf_connect_io_queue().

Please try the following patch:

diff --git a/block/blk-mq.c b/block/blk-mq.c
index a29d8ac9d3e3..968699277c3d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1480,7 +1480,7 @@ EXPORT_SYMBOL_GPL(blk_rq_is_poll);
 static void blk_rq_poll_completion(struct request *rq, struct completion *wait)
 {
 	do {
-		blk_hctx_poll(rq->q, rq->mq_hctx, NULL, 0);
+		blk_hctx_poll(rq->q, rq->mq_hctx, NULL, BLK_POLL_ONESHOT);
 		cond_resched();
 	} while (!completion_done(wait));
 }
-- 
2.47.1
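
To illustrate the intent of the change, here is a rough userspace model, not
kernel code. Every sim_* name, the SIM_POLL_ONESHOT value and the
"need_resched() fires every N polls" rule are assumptions made up for the
illustration; the real blk_hctx_poll() has more exit conditions. The sketch
only counts how often the outer loop gets to its cond_resched() stand-in
with and without the one-shot flag.

```c
#include <stdbool.h>

/* Hypothetical stand-in for BLK_POLL_ONESHOT; the value is illustrative. */
#define SIM_POLL_ONESHOT 1u

struct sim_queue {
	unsigned long polls;         /* calls made to the modelled ->poll() */
	unsigned long complete_at;   /* poll call on which the request completes */
	unsigned long resched_every; /* need_resched() is true every N polls (N > 0) */
	unsigned long yields;        /* times the outer loop reached its yield point */
};

/* Model of the driver ->poll() hook: returns 1 once the request completes. */
static int sim_driver_poll(struct sim_queue *q)
{
	q->polls++;
	return q->polls >= q->complete_at ? 1 : 0;
}

/* Model of need_resched(): true on every resched_every-th poll. */
static bool sim_need_resched(const struct sim_queue *q)
{
	return q->polls % q->resched_every == 0;
}

/* Simplified inner loop: spin until a completion, one-shot, or resched. */
static int sim_hctx_poll(struct sim_queue *q, unsigned int flags)
{
	do {
		int ret = sim_driver_poll(q);

		if (ret > 0)
			return ret;
		if (flags & SIM_POLL_ONESHOT)
			break;
	} while (!sim_need_resched(q));

	return 0;
}

/* Model of the outer loop; returns how many times it yielded. */
static unsigned long sim_poll_completion(struct sim_queue *q, unsigned int flags)
{
	bool done;

	do {
		done = sim_hctx_poll(q, flags) > 0;
		q->yields++;	/* stands in for cond_resched() */
	} while (!done);

	return q->yields;
}
```

With complete_at = 1000 and resched_every = 100, the model makes the same
1000 poll calls either way, but reaches the yield point 10 times with
flags = 0 and 1000 times with SIM_POLL_ONESHOT. Whether that is enough to
fix the reported busy loop still needs the test above.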



Thanks,
Ming