Request timeout seen with NVMEoF TCP

Sagi Grimberg sagi at grimberg.me
Mon Dec 14 20:53:44 EST 2020


> Hey Potnuri,
> 
> Have you observed this further?
> 
> I'd think that if the io_work reschedule itself when it races
> with the direct send path this should not happen, but we may be
> seeing a different race going on here, adding Samuel who saw
> a similar phenomenon.

I think we still have a race here with the following:
1. queue_rq sends h2cdata PDU (no data)
2. host receives r2t - prepares data PDU to send and schedules io_work
3. queue_rq sends another h2cdata PDU - ends up sending (2) because it 
was queued before it
4. io_work starts, loops but is never able to acquire the send_mutex - 
eventually just ends (doesn't requeue)
5. (3) completes, now nothing will send (2)

We could schedule io_work from the direct send path, but it is more
efficient to just try to drain the send queue in the direct send path
itself; if not everything was sent, the write_space callback will
trigger it.

Potnuri, does this patch solve what you are seeing?
--
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 1ba659927442..1b4e25624ba4 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -262,6 +262,16 @@ static inline void nvme_tcp_advance_req(struct nvme_tcp_request *req,
         }
  }

+static inline void nvme_tcp_send_all(struct nvme_tcp_queue *queue)
+{
+       int ret;
+
+       /* drain the send queue as much as we can... */
+       do {
+               ret = nvme_tcp_try_send(queue);
+       } while (ret > 0);
+}
+
  static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
                 bool sync, bool last)
  {
@@ -279,7 +289,7 @@ static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
         if (queue->io_cpu == smp_processor_id() &&
             sync && empty && mutex_trylock(&queue->send_mutex)) {
                 queue->more_requests = !last;
-               nvme_tcp_try_send(queue);
+               nvme_tcp_send_all(queue);
                 queue->more_requests = false;
                 mutex_unlock(&queue->send_mutex);
         } else if (last) {
@@ -1122,6 +1132,14 @@ static void nvme_tcp_io_work(struct work_struct *w)
                                 pending = true;
                         else if (unlikely(result < 0))
                                 break;
+               } else {
+                       /*
+                        * submission path is sending, we need to
+                        * continue or resched because the submission
+                        * path direct send is not concerned with
+                        * rescheduling...
+                        */
+                       pending = true;
                 }

                 result = nvme_tcp_try_recv(queue);
--
