[PATCH 3/3] nvme-tcp: fix I/O stalls on congested sockets

Sagi Grimberg sagi at grimberg.me
Tue Apr 22 04:43:28 PDT 2025



On 21/04/2025 22:11, Kamaljit Singh wrote:
> Sagi,
>
>> However, the log prints are WARN messages that do not exist in any of
>> the patches.
> Just found the issue with my patch integration. I had picked up Hannes' patch from the email
> titled "[PATCH 2/3] nvme-tcp: sanitize request list handling", dated Fri March 7th.
> That version had the 3 WARN_ON messages.
>
> I just found another patch set with an almost identical title, dated March 27th.
> I'll integrate that one, "[PATCH 2/5] nvme-tcp: sanitize request list handling".
> It replaced those WARN_ON messages with dev_err() after if-checks for the same
> conditions.
>
>>> The patches I applied are these:
>>> 1: nvme-tcp: partial write tcp-q-wake-sender untested by Sagi
>>> 2: nvme-tcp: recv-side EAGAIN untested fix by Sagi
>>> 3: nvme-tcp: fix I/O stalls on congested sockets
>> The patch I suggested was a replacement for patch (3). Didn't you
>> get conflicts?
> No. But I reviewed it again and have undone this part of patch-3 in my setup.
> -               else if (unlikely(result < 0))
> +               else if (unlikely(result < 0) && result != -EAGAIN)
>
>
>>> 4: nvme-tcp: sanitize request list handling
>>> 5: nvme-tcp: avoid inline sending when handling R2T PDUs
>>> [4]         nvme-tcp: sanitize request list handling
>>>
>>>       Validate the request in nvme_tcp_handle_r2t() to ensure it's not
>>>       part of any list, otherwise a malicious R2T PDU might inject a
>>>       loop in request list processing.
>>>
>>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>>> index 40dce5dc3ff5..4647d5d547f8 100644
>>> --- a/drivers/nvme/host/tcp.c
>>> +++ b/drivers/nvme/host/tcp.c
>>> @@ -455,6 +455,7 @@ nvme_tcp_fetch_request(struct nvme_tcp_queue *queue)
>>>           }
>>>
>>>           list_del(&req->entry);
>>> +       init_llist_node(&req->lentry);
>>>           return req;
>>>    }
>>>
>>> @@ -562,6 +563,8 @@ static int nvme_tcp_init_request(struct blk_mq_tag_set *set,
>>>           req->queue = queue;
>>>           nvme_req(rq)->ctrl = &ctrl->ctrl;
>>>           nvme_req(rq)->cmd = &pdu->cmd;
>>> +       init_llist_node(&req->lentry);
>>> +       INIT_LIST_HEAD(&req->entry);
>>>
>>>           return 0;
>>>    }
>>> @@ -773,8 +776,12 @@ static int nvme_tcp_handle_r2t(struct nvme_tcp_queue *queue,
>>>
>>>           nvme_tcp_setup_h2c_data_pdu(req);
>>>
>>> +       WARN_ON(queue->request == req);
>>> +       WARN_ON(llist_on_list(&req->lentry));
>>> +       WARN_ON(!list_empty(&req->entry));
>> These are not included in Hannes patch afaict.
> They were, as I noted above.
>
>
>> I'm assuming that these are the stack traces that are firing...
>>
>> Can you please share the output of gdb:
>> l *nvme_tcp_recv_skb+0x115e
> It points to the 3rd WARN_ON. I've replaced those warnings with the patch you suggested below.
>
> (gdb) l *(nvme_tcp_recv_skb+0x115e)
> 0x628e is in nvme_tcp_recv_skb (tcp.c:782).
> 777
> 778             nvme_tcp_setup_h2c_data_pdu(req);
> 779
> 780             WARN_ON(queue->request == req);
> 781             WARN_ON(llist_on_list(&req->lentry));
> 782             WARN_ON(!list_empty(&req->entry));
> 783             llist_add(&req->lentry, &queue->req_list);
> 784             if (list_empty(&queue->send_list))
> 785                     queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);

That is strange - it should not happen afaict. You did not change the code
and recompile before checking this, correct?

>> My assumption is that in your case, queue->request is still referenced
>> with the current request when the r2t for it arrives, which is very much
>> possible...
>>
>> So this means that the patch "nvme-tcp: sanitize request list handling"
>> is wrong, queue->request may still be referenced when its r2t arrives.
>>
>> I think this patch needs the following:
>> --
>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>> index f2c829f1aff6..c2d4893ecb94 100644
>> --- a/drivers/nvme/host/tcp.c
>> +++ b/drivers/nvme/host/tcp.c
>> @@ -767,8 +767,7 @@ static int nvme_tcp_handle_r2t(struct nvme_tcp_queue
>> *queue,
>>                  return -EPROTO;
>>          }
>>
>> -       if (queue->request == req ||
>> -           llist_on_list(&req->lentry) ||
>> +       if (llist_on_list(&req->lentry) ||
>>              !list_empty(&req->entry)) {
>>                  dev_err(queue->ctrl->ctrl.device,
>>                          "req %d unexpected r2t while processing request\n",
>> --
>>
>> Or in your case - remove the above:
>> --
>>         WARN_ON(queue->request == req);
>> --
> Made that change. Tested, and I'm seeing the output below. -71 is the -EPROTO that patch 2 returns.
> Trying to narrow it down further. If you have any suggestions, let me know.
>
> [Mon Apr 21 11:18:58 2025] nvme nvme0: req 7 unexpected r2t while processing request
> [Mon Apr 21 11:18:58 2025] nvme nvme0: receive failed:  -71
> [Mon Apr 21 11:18:58 2025] nvme nvme0: starting error recovery
> [Mon Apr 21 11:18:58 2025] nvme nvme0: req 11 unexpected r2t while processing request
> [Mon Apr 21 11:18:58 2025] block nvme0n1: no usable path - requeuing I/O
> [Mon Apr 21 11:18:58 2025] nvme nvme0: receive failed:  -71
>
>
>>> [2025-04-16 22:56:24.758] [Wed Apr 16 22:56:26 2025] WARNING: CPU: 22 PID: 134878 at tcp.c:782 nvme_tcp_recv_skb+0x115e/0x11c0 [nvme_tcp]
>> My assumption is that this is a result of
>> WARN_ON(queue->request == req);
> From the gdb decode, it points to WARN_ON(!list_empty(&req->entry)).

Can you please apply this on top:
--
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 9453cf453d03..334154a58bdb 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -453,7 +453,7 @@ nvme_tcp_fetch_request(struct nvme_tcp_queue *queue)
                         return NULL;
         }

-       list_del(&req->entry);
+       list_del_init(&req->entry);
         init_llist_node(&req->lentry);
         return req;
  }
--



More information about the Linux-nvme mailing list