[PATCH v5 3/5] io_uring: count CQEs in io_iopoll_check()

Ming Lei ming.lei at redhat.com
Wed Mar 4 02:32:45 PST 2026


On Mon, Mar 02, 2026 at 10:29:12AM -0700, Caleb Sander Mateos wrote:
> A subsequent commit will allow uring_cmds that don't use iopoll on
> IORING_SETUP_IOPOLL io_urings. As a result, CQEs can be posted without
> setting the iopoll_completed flag for a request in iopoll_list or going
> through task work. For example, a UBLK_U_IO_FETCH_IO_CMDS command could
> call io_uring_mshot_cmd_post_cqe() to directly post a CQE. The
> io_iopoll_check() loop currently only counts completions posted in
> io_do_iopoll() when determining whether the min_events threshold has
> been met. It also exits early if there are any existing CQEs before
> polling, or if any CQEs are posted while running task work. CQEs posted
> via io_uring_mshot_cmd_post_cqe() or other mechanisms won't be counted
> against min_events.
> 
> Explicitly check the available CQEs in each io_iopoll_check() loop
> iteration to account for CQEs posted in any fashion.
> 
> Signed-off-by: Caleb Sander Mateos <csander at purestorage.com>
> ---
>  io_uring/io_uring.c | 9 ++-------
>  1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 46f39831d27c..b4625695bb3a 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -1184,11 +1184,10 @@ __cold void io_iopoll_try_reap_events(struct io_ring_ctx *ctx)
>  		io_move_task_work_from_local(ctx);
>  }
>  
>  static int io_iopoll_check(struct io_ring_ctx *ctx, unsigned int min_events)
>  {
> -	unsigned int nr_events = 0;
>  	unsigned long check_cq;
>  
>  	min_events = min(min_events, ctx->cq_entries);
>  
>  	lockdep_assert_held(&ctx->uring_lock);
> @@ -1227,34 +1226,30 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, unsigned int min_events)
>  		 * the poll to the issued list. Otherwise we can spin here
>  		 * forever, while the workqueue is stuck trying to acquire the
>  		 * very same mutex.
>  		 */
>  		if (list_empty(&ctx->iopoll_list) || io_task_work_pending(ctx)) {
> -			u32 tail = ctx->cached_cq_tail;
> -
>  			(void) io_run_local_work_locked(ctx, min_events);
>  
>  			if (task_work_pending(current) || list_empty(&ctx->iopoll_list)) {
>  				mutex_unlock(&ctx->uring_lock);
>  				io_run_task_work();
>  				mutex_lock(&ctx->uring_lock);
>  			}
>  			/* some requests don't go through iopoll_list */
> -			if (tail != ctx->cached_cq_tail || list_empty(&ctx->iopoll_list))
> +			if (list_empty(&ctx->iopoll_list))
>  				break;
>  		}
>  		ret = io_do_iopoll(ctx, !min_events);
>  		if (unlikely(ret < 0))
>  			return ret;
>  
>  		if (task_sigpending(current))
>  			return -EINTR;
>  		if (need_resched())
>  			break;
> -
> -		nr_events += ret;
> -	} while (nr_events < min_events);
> +	} while (io_cqring_events(ctx) < min_events);

Before entering the loop, if io_cqring_events() finds any queued CQE,
io_iopoll_check() returns immediately without polling.

If the queued CQE originated from a non-iopoll uring_cmd, the iopoll
requests will not be polled. Could this be an issue?


Thanks,
Ming



