[PATCH] nvme-tcp: Fix possible race of io_work and direct send
Sagi Grimberg
sagi at grimberg.me
Mon Dec 21 03:03:39 EST 2020
We may send a request (with or without its data) from two
paths:
1. From our I/O context nvme_tcp_io_work which
is triggered from:
- queue_rq
- r2t reception
- socket data_ready and write_space callbacks
2. Directly from queue_rq if the send_list is empty (because
we want to save the context switch associated with scheduling
our io_work).
However, given that now we have the send_mutex, we may run into
a race condition where none of these contexts will send the pending
payload to the controller. Both io_work send path and queue_rq send
path opportunistically attempt to acquire the send_mutex however
queue_rq only attempts to send a single request, and if io_work
context fails to acquire the send_mutex it will complete without
rescheduling itself.
The race can trigger with the following sequence:
1. queue_rq sends request (no incapsule data) and blocks
2. RX path receives r2t - prepares data PDU to send, adds h2cdata PDU to the
send_list and schedules io_work
3. io_work triggers and cannot acquire the send_mutex - because of (1), ends
without self rescheduling
4. queue_rq completes the send, and completes
==> no context will send the h2cdata - timeout.
Fix this by having queue_rq sending as much as it can from the send_list
such that if it still has any left, its because the socket buffer is
full and the socket write_space callback will trigger, thus guaranteeing
that a context will be scheduled to send the h2cdata PDU.
Fixes: db5ad6b7f8cd ("nvme-tcp: try to send request in queue_rq context")
Reported-by: Potnuri Bharat Teja <bharat at chelsio.com>
Reported-by: Samuel Jones <sjones at kalrayinc.com>
Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
drivers/nvme/host/tcp.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 1ba659927442..979ee31b8dd1 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -262,6 +262,16 @@ static inline void nvme_tcp_advance_req(struct nvme_tcp_request *req,
}
}
+static inline void nvme_tcp_send_all(struct nvme_tcp_queue *queue)
+{
+ int ret;
+
+ /* drain the send queue as much as we can... */
+ do {
+ ret = nvme_tcp_try_send(queue);
+ } while (ret > 0);
+}
+
static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
bool sync, bool last)
{
@@ -279,7 +289,7 @@ static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
if (queue->io_cpu == smp_processor_id() &&
sync && empty && mutex_trylock(&queue->send_mutex)) {
queue->more_requests = !last;
- nvme_tcp_try_send(queue);
+ nvme_tcp_send_all(queue);
queue->more_requests = false;
mutex_unlock(&queue->send_mutex);
} else if (last) {
--
2.25.1
More information about the Linux-nvme
mailing list