[PATCH] nvme-tcp: Fix possible race of io_work and direct send

Potnuri Bharat Teja bharat at chelsio.com
Mon Dec 21 09:13:03 EST 2020


On Monday, December 21, 2020 at 13:33:39 +0530, Sagi Grimberg wrote:
> We may send a request (with or without its data) from two
> paths:
> 
> 1. From our I/O context nvme_tcp_io_work, which is triggered by:
> - queue_rq
> - r2t reception
> - the socket data_ready and write_space callbacks
> 
> 2. Directly from queue_rq if the send_list is empty (because
> we want to save the context switch associated with scheduling
> our io_work); both paths are sketched below.
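
For orientation, here is a condensed sketch of how nvme_tcp_queue_request
chose between the two paths before this patch. It is reconstructed from the
hunk context further down, simplified and with details elided, so treat it
as illustrative rather than verbatim driver code:

static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
		bool sync, bool last)
{
	struct nvme_tcp_queue *queue = req->queue;
	bool empty;

	empty = llist_add(&req->lentry, &queue->req_list) &&
		list_empty(&queue->send_list) && !queue->request;

	/* Path 2: direct send, only on the io_cpu and only if the
	 * send_mutex can be taken without contention */
	if (queue->io_cpu == smp_processor_id() &&
	    sync && empty && mutex_trylock(&queue->send_mutex)) {
		queue->more_requests = !last;
		nvme_tcp_try_send(queue);	/* sends a single request */
		queue->more_requests = false;
		mutex_unlock(&queue->send_mutex);
	} else if (last) {
		/* Path 1: defer to the io_work context */
		queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
	}
}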
> 
> However, given that we now have the send_mutex, we may run into
> a race condition where neither context ends up sending the pending
> payload to the controller. Both the io_work send path and the queue_rq
> send path opportunistically attempt to acquire the send_mutex; however,
> queue_rq only attempts to send a single request, and if the io_work
> context fails to acquire the send_mutex it will complete without
> rescheduling itself.
> 
> The race can trigger with the following sequence (sketched in code below):
> 1. queue_rq sends a request (no in-capsule data) and blocks in the send
> 2. RX path receives the r2t - prepares the data PDU to send, adds the
>    h2cdata PDU to the send_list and schedules io_work
> 3. io_work triggers and cannot acquire the send_mutex - because of (1),
>    it ends without rescheduling itself
> 4. queue_rq finishes the send, releases the send_mutex and returns
> ==> no context is left to send the h2cdata - timeout.
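
To make step (3) concrete, the relevant shape of nvme_tcp_io_work at the
time looked roughly like this (paraphrased from the driver, not verbatim;
error handling and the receive side are elided):

static void nvme_tcp_io_work(struct work_struct *w)
{
	struct nvme_tcp_queue *queue =
		container_of(w, struct nvme_tcp_queue, io_work);
	unsigned long deadline = jiffies + msecs_to_jiffies(1);

	do {
		bool pending = false;

		/* step (3): queue_rq still holds send_mutex, so this
		 * trylock fails and no send progress is recorded */
		if (mutex_trylock(&queue->send_mutex)) {
			if (nvme_tcp_try_send(queue) > 0)
				pending = true;
			mutex_unlock(&queue->send_mutex);
		}

		/* (receive side elided for brevity) */

		/* with nothing pending, io_work returns here without
		 * rescheduling itself - the h2cdata PDU is stranded */
		if (!pending)
			return;
	} while (!time_after(jiffies, deadline));

	queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
}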
> 
> Fix this by having queue_rq send as much as it can from the send_list,
> so that if anything is left unsent, it is because the socket buffer is
> full and the socket write_space callback will trigger, thus guaranteeing
> that a context will be scheduled to send the h2cdata PDU.
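
The "write_space callback will trigger" guarantee refers to the socket hook
the driver installs; roughly (again paraphrased, not taken verbatim from
this patch):

static void nvme_tcp_write_space(struct sock *sk)
{
	struct nvme_tcp_queue *queue;

	read_lock_bh(&sk->sk_callback_lock);
	queue = sk->sk_user_data;
	if (likely(queue && sk_stream_is_writeable(sk))) {
		clear_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
		/* the socket has room again: kick io_work, which will
		 * drain the send_list (including the stranded h2cdata) */
		queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
	}
	read_unlock_bh(&sk->sk_callback_lock);
}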
> 
> Fixes: db5ad6b7f8cd ("nvme-tcp: try to send request in queue_rq context")
> Reported-by: Potnuri Bharat Teja <bharat at chelsio.com>
> Reported-by: Samuel Jones <sjones at kalrayinc.com>
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>

Tested-by: Potnuri Bharat Teja <bharat at chelsio.com>

> ---
>  drivers/nvme/host/tcp.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 1ba659927442..979ee31b8dd1 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -262,6 +262,16 @@ static inline void nvme_tcp_advance_req(struct nvme_tcp_request *req,
>  	}
>  }
>  
> +static inline void nvme_tcp_send_all(struct nvme_tcp_queue *queue)
> +{
> +	int ret;
> +
> +	/* drain the send queue as much as we can... */
> +	do {
> +		ret = nvme_tcp_try_send(queue);
> +	} while (ret > 0);
> +}
> +
>  static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
>  		bool sync, bool last)
>  {
> @@ -279,7 +289,7 @@ static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
>  	if (queue->io_cpu == smp_processor_id() &&
>  	    sync && empty && mutex_trylock(&queue->send_mutex)) {
>  		queue->more_requests = !last;
> -		nvme_tcp_try_send(queue);
> +		nvme_tcp_send_all(queue);
>  		queue->more_requests = false;
>  		mutex_unlock(&queue->send_mutex);
>  	} else if (last) {
> -- 
> 2.25.1
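
For readability, this is how the direct-send branch reads with the hunk
applied, together with the nvme_tcp_try_send return convention the new
loop relies on (my reading of the driver; worth double-checking against
the tree):

	/*
	 * nvme_tcp_try_send() returns >0 when it made progress, 0 when
	 * nothing more could be sent (send_list empty or socket buffer
	 * full), and <0 on error - so nvme_tcp_send_all() loops only
	 * while progress is being made.
	 */
	if (queue->io_cpu == smp_processor_id() &&
	    sync && empty && mutex_trylock(&queue->send_mutex)) {
		queue->more_requests = !last;
		nvme_tcp_send_all(queue);	/* drain, not just one request */
		queue->more_requests = false;
		mutex_unlock(&queue->send_mutex);
	} else if (last) {
		queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
	}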