[PATCH 3/3] nvme-tcp: do not queue io_work on a full socket
Sagi Grimberg
sagi at grimberg.me
Thu May 29 05:38:26 PDT 2025
On 28/05/2025 9:45, Hannes Reinecke wrote:
> There really is no point in scheduling io_work from ->queue_rq()
> if the socket is full; we'll be notified by the ->write_space()
> callback once space on the socket becomes available.
> Consequently we also need to remove the check for 'sk_stream_is_writeable()'
> in the ->write_space() callback as we need to schedule io_work
> to receive packets even if the sending side is blocked.
>
> Signed-off-by: Hannes Reinecke <hare at kernel.org>
> ---
> drivers/nvme/host/tcp.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 0e178115dc04..e4dd1620dc28 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -411,6 +411,9 @@ static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
> empty = llist_add(&req->lentry, &queue->req_list) &&
> list_empty(&queue->send_list) && !queue->request;
>
> + if (!sk_stream_is_writeable(queue->sock->sk))
> + empty = false;
This is wrong. the fact that the socket it not writeable does not mean
that the queue
is not empty (meaning we have rqeuests that we have yet to write to the
socket).
> +
> /*
> * if we're the first on the send_list and we can try to send
> * directly, otherwise queue io_work. Also, only do that if we
> @@ -422,7 +425,8 @@ static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
> mutex_unlock(&queue->send_mutex);
> }
>
> - if (last && nvme_tcp_queue_has_pending(queue))
> + if (last && nvme_tcp_queue_has_pending(queue) &&
> + sk_stream_is_writeable(queue->sock->sk))
I think its fine, because either this or the callback will effectively
not queue the work
as the work can only be queued once.
> queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
> }
>
> @@ -1074,7 +1078,7 @@ static void nvme_tcp_write_space(struct sock *sk)
>
> read_lock_bh(&sk->sk_callback_lock);
> queue = sk->sk_user_data;
> - if (likely(queue && sk_stream_is_writeable(sk))) {
> + if (likely(queue)) {
> clear_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
> queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
> }
I'd keep only this change. Unless you are convinced that the other
stream checks
are correct?
More information about the Linux-nvme
mailing list