[PATCH 3/3] nvme-tcp: fix I/O stalls on congested sockets

Mon Apr 21 12:11:48 PDT 2025

Sagi,

>However the log prints are WARN messages that does not exist in any of
>the patches.
Just found the issue with my patch integration. I had picked up Hannes' patch in email 
titled "[PATCH 2/3] nvme-tcp:  sanitize request list handling" and dated Fri Match 7th. 
This had the 3 WARN_ON msgs.

I just found another patch-set with almost the same title dated March 27th. 
I'll integrate this one "PATCH 2/5] nvme-tcp: sanitize request list handling".
This replaced those WARN_ON msgs w/ dev_err() after if-checks for the same 
conditions.

>> The patches I applied are these:
>> 1: nvme-tcp: partial write tcp-q-wake-sender untested by Sagi
>> 2: nvme-tcp: recv-side EAGAIN untested fix by Sagi
>> 3: nvme-tcp: fix I/O stalls on congested sockets
>
>The patch I suggested was a replacement for patch (3). Didn't you
>get conflicts?
No. But I reviewed it again and have undone this part of patch-3 in my setup.
-               else if (unlikely(result < 0))
+               else if (unlikely(result < 0) && result != -EAGAIN)

>> 4: nvme-tcp: sanitize request list handling
>> 5: nvme-tcp: avoid inline sending when handling R2T PDUs

>
>> [4]         nvme-tcp: sanitize request list handling
>>
>>      Validate the request in nvme_tcp_handle_r2t() to ensure it's not
>>      part of any list, otherwise a malicious R2T PDU might inject a
>>      loop in request list processing.
>>
>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>> index 40dce5dc3ff5..4647d5d547f8 100644
>> --- a/drivers/nvme/host/tcp.c
>> +++ b/drivers/nvme/host/tcp.c
>> @@ -455,6 +455,7 @@ nvme_tcp_fetch_request(struct nvme_tcp_queue *queue)
>>          }
>>
>>          list_del(&req->entry);
>> +       init_llist_node(&req->lentry);
>>          return req;
>>   }
>>
>> @@ -562,6 +563,8 @@ static int nvme_tcp_init_request(struct blk_mq_tag_set *set,
>>          req->queue = queue;
>>          nvme_req(rq)->ctrl = &ctrl->ctrl;
>>          nvme_req(rq)->cmd = &pdu->cmd;
>> +       init_llist_node(&req->lentry);
>> +       INIT_LIST_HEAD(&req->entry);
>>
>>          return 0;
>>   }
>> @@ -773,8 +776,12 @@ static int nvme_tcp_handle_r2t(struct nvme_tcp_queue *queue,
>>
>>          nvme_tcp_setup_h2c_data_pdu(req);
>>
>> +       WARN_ON(queue->request == req);
>> +       WARN_ON(llist_on_list(&req->lentry));
>> +       WARN_ON(!list_empty(&req->entry));
>
>These are not included in Hannes patch afaict.
They were, as I noted on above.

>I'm assuming that these are the stack traces that are firing...
>
>Can you please share the output of gdb:
>l *nvme_tcp_recv_skb+0x115e
It points to the 3rd WARN_ON msg. I've replaced those warnings with the patch you suggested below.

(gdb) l *(nvme_tcp_recv_skb+0x115e)
0x628e is in nvme_tcp_recv_skb (tcp.c:782).
777
778             nvme_tcp_setup_h2c_data_pdu(req);
779
780             WARN_ON(queue->request == req);
781             WARN_ON(llist_on_list(&req->lentry));
782             WARN_ON(!list_empty(&req->entry));
783             llist_add(&req->lentry, &queue->req_list);
784             if (list_empty(&queue->send_list))
785                     queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);

>
>My assumption is that in your case, queue->request is still referenced
>with the current request when the r2t for it arrives, which is very much
>possible...
>
>So this means that the patch "nvme-tcp: sanitize request list handling"
>is wrong, queue->request may still be referenced when its r2t arrives.
>
>I think this patch needs the following:
>--
>diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>index f2c829f1aff6..c2d4893ecb94 100644
>--- a/drivers/nvme/host/tcp.c
>+++ b/drivers/nvme/host/tcp.c
>@@ -767,8 +767,7 @@ static int nvme_tcp_handle_r2t(struct nvme_tcp_queue
>*queue,
>                 return -EPROTO;
>         }
>
>-       if (queue->request == req ||
>-           llist_on_list(&req->lentry) ||
>+       if (llist_on_list(&req->lentry) ||
>             !list_empty(&req->entry)) {
>                 dev_err(queue->ctrl->ctrl.device,
>                         "req %d unexpected r2t while processing request\n",
>--
>
>Or in your case - remove the above:
>--
>        WARN_ON(queue->request == req);
>--
Made that change. Tested and seeing this. -71 is -EPROTO that patch-2 returns . 
Trying to narrow it down further. If you have any suggestion, let me know.

[Mon Apr 21 11:18:58 2025] nvme nvme0: req 7 unexpected r2t while processing request
[Mon Apr 21 11:18:58 2025] nvme nvme0: receive failed:  -71
[Mon Apr 21 11:18:58 2025] nvme nvme0: starting error recovery
[Mon Apr 21 11:18:58 2025] nvme nvme0: req 11 unexpected r2t while processing request
[Mon Apr 21 11:18:58 2025] block nvme0n1: no usable path - requeuing I/O
[Mon Apr 21 11:18:58 2025] nvme nvme0: receive failed:  -71

>> [2025-04-16 22:56:24.758] [Wed Apr 16 22:56:26 2025] WARNING: CPU: 22 PID: 134878 at tcp.c:782 nvme_tcp_recv_skb+0x115e/0x11c0 [nvme_tcp]
>
>My assumption is that this is a result of
>WARN_ON(queue->request == req);
>From the gdb decode it points to WARN_ON(!list_empty(&req->entry))