[PATCH 3/3] nvme-tcp: fix I/O stalls on congested sockets

Sagi Grimberg sagi at grimberg.me
Sun May 11 02:12:11 PDT 2025



On 09/05/2025 9:52, Hannes Reinecke wrote:
> On 5/8/25 23:23, Kamaljit Singh wrote:
>> Hi Hannes, Sagi,
>>
>> On 5/7/25 15:26, Kamaljit Singh wrote:
>>>> Can you please retest with the patchset '[PATCHv5 0/2] nvme-tcp: fixup
>>>> I/O stall on congested sockets' _only_ ?
>>>> (on top of nvme-6.16 latest, of course).
>>>> I think I _should_ have included all the suggestions floating here,
>>>> but we need to have confirmation.
>>>
>>> I've built the kernel against the latest nvme-6.16 branch along with
>>> these patches. It's in test now. I'll let you know when we get some
>>> results back.
>>>
>>> 0. [PATCHv5 0/2] nvme-tcp: fixup I/O stall on congested sockets - Apr 29, 01:18
>>>      Hannes Reinecke (2):
>>>        nvme-tcp: sanitize request list handling
>>>        nvme-tcp: fix I/O stalls on congested sockets
>>>
>>> 1. [PATCH 1/2] nvme-tcp: sanitize request list handling - Apr 29, 02:31
>>> 2. [PATCH 2/2] nvme-tcp: fix I/O stalls on congested sockets - Apr 29, 01:18
>> I/O timeouts are still occurring with writes. The only read that timed
>> out was most likely due to the path error. It takes ~4.5 hours to fail.
>>
>> However, this test does not fail if either ECN is off or digests are
>> disabled; those passing combinations ran for 16+ hours without any
>> issues. Both ECN and Header+Data Digests need to be enabled for it to
>> fail.
>>
>> Do you have a failing test as well? If so, does it reproduce the
>> failure more quickly? Would you mind sharing any details?
>>
>>    [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 2 (f002) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
>>    [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 1 (2001) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
>>    [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 4 (c004) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
>>    [2025-05-07 19:57:13.295] nvme nvme1: starting error recovery
>>    [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 15 (000f) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
>>    [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 6 (5006) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
>>    [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 3 (2003) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
>>    [2025-05-07 19:57:13.295] block nvme1n3: no usable path - requeuing I/O
>>    [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 8 (0008) type 4 opcode 0x2 (I/O Cmd) QID 4 timeout
>>    [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 14 (400e) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
>>    [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 13 (100d) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
>>    [2025-05-07 19:57:13.295] block nvme1n4: no usable path - requeuing I/O
>>    [2025-05-07 19:57:13.295] block nvme1n4: no usable path - requeuing I/O
>>    [2025-05-07 19:57:13.295] block nvme1n4: no usable path - requeuing I/O
>>    [2025-05-07 19:57:13.295] block nvme1n2: no usable path - requeuing I/O
>>    [2025-05-07 19:57:13.295] block nvme1n4: no usable path - requeuing I/O
>>    [2025-05-07 19:57:13.295] block nvme1n2: no usable path - requeuing I/O
>>    [2025-05-07 19:57:13.295] block nvme1n2: no usable path - requeuing I/O
>>    [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 5 (5005) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
>>    [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 7 (0007) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
>>    [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 11 (a00b) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
>>    [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 12 (f00c) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
>>    [2025-05-07 19:57:13.295] block nvme1n1: no usable path - requeuing I/O
>>    [2025-05-07 19:57:13.295] block nvme1n1: no usable path - requeuing I/O
>>    [2025-05-07 19:57:13.295] nvme nvme1: Reconnecting in 10 seconds...
>>
>> In the current build I had these patches on top of the "nvme-6.16" branch:
>>    41b2c90a51bd nvme-tcp: sanitize request list handling
>>    9260acd6c230 nvme-tcp: fix I/O stalls on congested sockets
>>
> Extremely wild guess: Can you try with this patch on top?
>
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 0e178115dc04..cdb8ea4eb467 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1277,10 +1277,7 @@ static int nvme_tcp_try_send_ddgst(struct nvme_tcp_request *req)
>                 .iov_len = NVME_TCP_DIGEST_LENGTH - req->offset
>         };
>
> -       if (nvme_tcp_queue_more(queue))
> -               msg.msg_flags |= MSG_MORE;
> -       else
> -               msg.msg_flags |= MSG_EOR;
> +       msg.msg_flags |= MSG_EOR;
>
>         ret = kernel_sendmsg(queue->sock, &msg, &iov, 1, iov.iov_len);
>         if (unlikely(ret <= 0))
>
> It _could_ be that we're waiting in sendmsg() due to MSG_MORE, causing
> these I/O timeouts as processing doesn't continue.
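For reference, here is a minimal userspace sketch of the MSG_MORE behaviour in
question. This is not the driver code; error handling is omitted and the
payload strings are made up purely for illustration. It shows that a small
send() flagged MSG_MORE may be held back by the TCP stack until a later send
without the flag pushes it out:

/*
 * Minimal userspace sketch (not nvme-tcp driver code) of the MSG_MORE
 * effect: a small send() flagged MSG_MORE hints the TCP stack that more
 * data follows, so the partial segment may be held back until a later
 * send without the flag pushes it out.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	struct sockaddr_in addr = {
		.sin_family = AF_INET,
		.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
		.sin_port = 0,			/* let the kernel pick a free port */
	};
	socklen_t alen = sizeof(addr);
	int lfd = socket(AF_INET, SOCK_STREAM, 0);

	bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
	getsockname(lfd, (struct sockaddr *)&addr, &alen);
	listen(lfd, 1);

	if (fork() == 0) {
		/* sender: mimics a PDU payload followed by a digest trailer */
		int cfd = socket(AF_INET, SOCK_STREAM, 0);

		connect(cfd, (struct sockaddr *)&addr, sizeof(addr));
		/* MSG_MORE: "more data follows", stack may delay pushing this */
		send(cfd, "payload", 7, MSG_MORE);
		sleep(1);
		/* no MSG_MORE: pending data is pushed onto the wire */
		send(cfd, "dgst", 4, 0);
		close(cfd);
		_exit(0);
	}

	int afd = accept(lfd, NULL, NULL);
	char buf[64];
	ssize_t n;

	/* observe when and in what chunks the two pieces arrive */
	while ((n = recv(afd, buf, sizeof(buf), 0)) > 0)
		printf("received %zd bytes\n", n);

	close(afd);
	close(lfd);
	wait(NULL);
	return 0;
}

Whether the first piece arrives on its own, after a cork timeout, or only
coalesced with the second piece depends on the stack's corking behaviour; the
point is only that MSG_MORE defers transmission rather than guaranteeing an
immediate push.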

This change doesn't make sense to me...


