[PATCH v4 3/5] io_uring: count CQEs in io_iopoll_check()

Ming Lei ming.lei at redhat.com
Sat Feb 28 01:45:56 PST 2026


On Fri, Feb 27, 2026 at 03:35:01PM -0700, Caleb Sander Mateos wrote:
> A subsequent commit will allow uring_cmds that don't use iopoll on
> IORING_SETUP_IOPOLL io_urings. As a result, CQEs can be posted without
> setting the iopoll_completed flag for a request in iopoll_list or going
> through task work. For example, a UBLK_U_IO_FETCH_IO_CMDS command could
> call io_uring_mshot_cmd_post_cqe() to directly post a CQE. The
> io_iopoll_check() loop currently only counts completions posted in
> io_do_iopoll() when determining whether the min_events threshold has
> been met. It also exits early if there are any existing CQEs before
> polling, or if any CQEs are posted while running task work. CQEs posted
> via io_uring_mshot_cmd_post_cqe() or other mechanisms won't be counted
> against min_events.
> 
> Explicitly check the available CQEs in each io_iopoll_check() loop
> iteration to account for CQEs posted in any fashion.
> 
> Signed-off-by: Caleb Sander Mateos <csander at purestorage.com>
> ---
>  io_uring/io_uring.c | 18 +++---------------
>  1 file changed, 3 insertions(+), 15 deletions(-)
> 
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 46f39831d27c..5f694052f501 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -1184,11 +1184,10 @@ __cold void io_iopoll_try_reap_events(struct io_ring_ctx *ctx)
>  		io_move_task_work_from_local(ctx);
>  }
>  
>  static int io_iopoll_check(struct io_ring_ctx *ctx, unsigned int min_events)
>  {
> -	unsigned int nr_events = 0;
>  	unsigned long check_cq;
>  
>  	min_events = min(min_events, ctx->cq_entries);
>  
>  	lockdep_assert_held(&ctx->uring_lock);
> @@ -1205,19 +1204,12 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, unsigned int min_events)
>  		 * dropped CQE.
>  		 */
>  		if (check_cq & BIT(IO_CHECK_CQ_DROPPED_BIT))
>  			return -EBADR;
>  	}
> -	/*
> -	 * Don't enter poll loop if we already have events pending.
> -	 * If we do, we can potentially be spinning for commands that
> -	 * already triggered a CQE (eg in error).
> -	 */
> -	if (io_cqring_events(ctx))
> -		return 0;
>  
> -	do {
> +	while (io_cqring_events(ctx) < min_events) {

This may not handle a zero `min_events` correctly: the old `do`/`while` loop always ran at least one polling pass, but the new `while` loop exits immediately when `min_events` is 0 and no CQEs are pending. Please see the AI review result:

https://netdev-ai.bots.linux.dev/ai-review.html?id=6977b6d6-04e4-4990-a96f-b7580fc5acc4

Thanks,
Ming

