nvme-tcp: question on poll return value

Sagi Grimberg sagi at grimberg.me
Sun Jan 28 03:05:10 PST 2024



On 1/24/24 17:31, Daniel Wagner wrote:
> Hi Sagi,
>
> I stumbled across nr_cqe while debugging something else and I am not
> sure if the poll function is doing the right thing with it. IIUC, the
> blk_mq_ops poll callback is supposed to return a value > 0 when it
> processed at least one element. If this is the case, then I think we
> would need to first copy the current value of nr_cqe before calling
> nvme_tcp_try_recv. The function will reset queue->nr_cqe to 0, thus
> nvme_tcp_poll will always return 0.
>
> Does this make any sense?

Not sure I understand your logic. nr_cqe is cleared when try_recv starts
and incremented on every completion, and nvme_tcp_poll returns this
value because it is the number of cqes collected during the try_recv
triggered by nvme_tcp_poll.
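
To make the counter semantics concrete, here is a minimal userspace
sketch. It is only a model, not the driver code: struct model_queue,
model_try_recv() and model_poll() are hypothetical names, and pending
completions are reduced to a plain integer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's struct nvme_tcp_queue. */
struct model_queue {
	int pending;	/* completions waiting to be received (just a count here) */
	int nr_cqe;	/* completions consumed by the most recent try_recv */
};

/* Models the described behavior: clear the counter, then count each completion. */
static void model_try_recv(struct model_queue *q)
{
	assert(q != NULL);
	q->nr_cqe = 0;
	while (q->pending > 0) {
		q->pending--;
		q->nr_cqe++;
	}
}

/* Models the poll path: run try_recv, then report what it counted. */
static int model_poll(struct model_queue *q)
{
	model_try_recv(q);
	return q->nr_cqe;
}
```

In this model the return value is nonzero whenever the try_recv started
by the poll consumed at least one completion.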

It is true that if another try_recv, called from the RX io_work path,
races with the try_recv triggered by nvme_tcp_poll, nr_cqe may be
cleared again. That is something we can address by passing nr_cqe by
reference as an in+out parameter to try_recv. But I don't see how
nr_cqe is going to always be 0.
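
A rough sketch of what the in+out parameter could look like, again as a
userspace model rather than a patch. All names (struct inout_queue,
inout_try_recv(), inout_poll(), inout_io_work()) are hypothetical, and
letting the io_work caller pass NULL is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: the queue no longer carries a shared counter. */
struct inout_queue {
	int pending;	/* completions waiting to be received */
};

/* The counter is owned by the caller; NULL means "caller does not care". */
static void inout_try_recv(struct inout_queue *q, int *nr_cqe)
{
	assert(q != NULL);
	while (q->pending > 0) {
		q->pending--;
		if (nr_cqe)
			(*nr_cqe)++;
	}
}

/* Poll path: a local counter cannot be cleared by another caller. */
static int inout_poll(struct inout_queue *q)
{
	int nr_cqe = 0;

	inout_try_recv(q, &nr_cqe);
	return nr_cqe;
}

/* io_work path: receives completions without touching the poll's count. */
static void inout_io_work(struct inout_queue *q)
{
	inout_try_recv(q, NULL);
}
```

With the counter on the poller's stack, a try_recv from the other path
has nothing shared to reset.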

>
> Thanks,
> Daniel
>
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index d79811cfa0ce..db9a105bd986 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -2573,6 +2573,7 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
>   {
>   	struct nvme_tcp_queue *queue = hctx->driver_data;
>   	struct sock *sk = queue->sock->sk;
> +	int nr_cqe;
>
>   	if (!test_bit(NVME_TCP_Q_LIVE, &queue->flags))
>   		return 0;
> @@ -2580,9 +2581,10 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
>   	set_bit(NVME_TCP_Q_POLLING, &queue->flags);
>   	if (sk_can_busy_loop(sk) && skb_queue_empty_lockless(&sk->sk_receive_queue))
>   		sk_busy_loop(sk, true);
> +	nr_cqe = queue->nr_cqe;
>   	nvme_tcp_try_recv(queue);
>   	clear_bit(NVME_TCP_Q_POLLING, &queue->flags);
> -	return queue->nr_cqe;
> +	return nr_cqe;
>   }
>
>   static int nvme_tcp_get_address(struct nvme_ctrl *ctrl, char *buf, int size)

This is wrong. In this case, nr_cqe would represent what happened
_before_ calling nvme_tcp_try_recv, not how many completions it
consumed.
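
The effect of taking the snapshot first can be seen in a small
userspace model. This is a sketch under the same simplifications as
before; struct snap_queue, snap_try_recv() and snap_poll_before() are
hypothetical names that only mimic the shape of the proposed change.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a queue with the shared counter. */
struct snap_queue {
	int pending;	/* completions waiting to be received */
	int nr_cqe;	/* completions consumed by the most recent try_recv */
};

/* Clears the counter at the start, then counts each completion. */
static void snap_try_recv(struct snap_queue *q)
{
	assert(q != NULL);
	q->nr_cqe = 0;
	while (q->pending > 0) {
		q->pending--;
		q->nr_cqe++;
	}
}

/* Mimics the proposed change: the value is copied before try_recv runs. */
static int snap_poll_before(struct snap_queue *q)
{
	int nr_cqe = q->nr_cqe;	/* left over from an earlier try_recv */

	snap_try_recv(q);
	return nr_cqe;
}
```

In the model, a poll that consumes three completions reports 0, and
the next poll, which consumes nothing, reports 3.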


