[PATCH 3/3] nvme-tcp: fix I/O stalls on congested sockets

Sagi Grimberg sagi at grimberg.me
Sat May 17 03:01:32 PDT 2025



On 14/05/2025 9:35, Hannes Reinecke wrote:
> On 5/13/25 21:24, Kamaljit Singh wrote:
>> Hi Sagi, Hannes,
>>
>> On 09/11/2025 02:11, Sagi Grimberg wrote:
>>>> IO timeouts are still occurring with Writes. The only Read that timed
>>>> out was most likely due to the path error. It takes ~4.5 hours to 
>>>> fail.
>>>>
>>>> However, this test does not fail if either ECN is off or if digests
>>>> are not enabled. These passing combinations were run for 16+ hours
>>>> without any issues. Both ECN and Header+Data Digests need to be turned
>>>> on for it to fail.
>>>>
>>>> Do you have a failing test as well? If so, is it quicker to cause the
>>>> failure? Would you mind sharing any details?
>>>>
>>>>      [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 2 (f002) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
>>>>      [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 1 (2001) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
>>>>      [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 4 (c004) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
>>>>      [2025-05-07 19:57:13.295] nvme nvme1: starting error recovery
>>>>      [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 15 (000f) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
>>>>      [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 6 (5006) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
>>>>      [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 3 (2003) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
>>>>      [2025-05-07 19:57:13.295] block nvme1n3: no usable path - requeuing I/O
>>>>      [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 8 (0008) type 4 opcode 0x2 (I/O Cmd) QID 4 timeout
>>>>      [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 14 (400e) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
>>>>      [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 13 (100d) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
>>>>      [2025-05-07 19:57:13.295] block nvme1n4: no usable path - requeuing I/O
>>>>      [2025-05-07 19:57:13.295] block nvme1n4: no usable path - requeuing I/O
>>>>      [2025-05-07 19:57:13.295] block nvme1n4: no usable path - requeuing I/O
>>>>      [2025-05-07 19:57:13.295] block nvme1n2: no usable path - requeuing I/O
>>>>      [2025-05-07 19:57:13.295] block nvme1n4: no usable path - requeuing I/O
>>>>      [2025-05-07 19:57:13.295] block nvme1n2: no usable path - requeuing I/O
>>>>      [2025-05-07 19:57:13.295] block nvme1n2: no usable path - requeuing I/O
>>>>      [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 5 (5005) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
>>>>      [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 7 (0007) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
>>>>      [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 11 (a00b) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
>>>>      [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 12 (f00c) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
>>>>      [2025-05-07 19:57:13.295] block nvme1n1: no usable path - requeuing I/O
>>>>      [2025-05-07 19:57:13.295] block nvme1n1: no usable path - requeuing I/O
>>>>      [2025-05-07 19:57:13.295] nvme nvme1: Reconnecting in 10 seconds...
>>>>
>>>> In the current build I had these patches on top of the "nvme-6.16" 
>>>> branch:
>>>>      41b2c90a51bd nvme-tcp: sanitize request list handling
>>>>      9260acd6c230 nvme-tcp: fix I/O stalls on congested sockets
>>>
>>> Kamaljit, with the prior version of the patchset (the proposal with the
>>> wake_sender flag) did this not reproduce regardless of ECN?
>>   With the last patchset, when ECN=off, we did not see any IO timeouts
>> even with a weekend-long test. This was true for both cases, i.e. with
>> Inband Auth and with SecureConcat.
>>
>> With ECN=on & HD+DD=on, the IO timeouts still persist for both Inband
>> Auth & SC. I'm currently debugging a possible target-side issue with
>> ECN. I'll let you know once I have some resolution.
>>
>> I don't have any clear indication from the original kernel issue that
>> would let me differentiate it from the current target-side issue. So,
>> if you want to go ahead and merge those two patchsets, that may be
>> fine for now.
>>
> Thanks a lot for your confirmation.
> We continue to see issues under high load or on oversubscribed fabrics.
> But this patchset addresses the problem of I/O timeouts during
> _connect_, which I would argue is a different story.

We still need to hunt these down. I'm still puzzled as to why adding the
WAKE_SENDER flag made this issue disappear; I'll have another look at
this patch.
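
For reference, this is roughly how I read the request queueing path in
drivers/nvme/host/tcp.c (paraphrased and trimmed here, not the literal
upstream code; the NVME_TCP_Q_WAKE_SENDER flag is only my guess at what
the earlier wake_sender proposal did, shown to frame where a wakeup
could go missing):

/*
 * Paraphrased sketch of the nvme-tcp request queueing path (trimmed,
 * from memory - not the literal upstream code). NVME_TCP_Q_WAKE_SENDER
 * is only a guess at what the earlier wake_sender proposal added.
 */
static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
		bool sync, bool last)
{
	struct nvme_tcp_queue *queue = req->queue;
	bool empty;

	/* true only if this request is the sole entry pending for send */
	empty = llist_add(&req->lentry, &queue->req_list) &&
		list_empty(&queue->send_list) && !queue->request;

	/*
	 * Send inline if we are first in line, on the queue's io_cpu,
	 * and can take send_mutex without contention ...
	 */
	if (queue->io_cpu == raw_smp_processor_id() &&
	    sync && empty && mutex_trylock(&queue->send_mutex)) {
		nvme_tcp_send_all(queue);
		mutex_unlock(&queue->send_mutex);
	}

	/* ... otherwise rely on io_work to drain req_list/send_list */
	if (last && nvme_tcp_queue_more(queue)) {
		/*
		 * Hypothetical wake_sender variant: note that a wakeup is
		 * owed, so an io_work run racing with us (or backing off
		 * on a full socket) reschedules itself instead of going
		 * idle with entries still queued.
		 */
		set_bit(NVME_TCP_Q_WAKE_SENDER, &queue->flags);
		queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
	}
}

The part I want to re-check is whether io_work can back off on a
congested socket, with the inline send path skipped and nothing left to
requeue the work - an explicit wakeup flag like the above would hide
exactly that kind of lost wakeup rather than explain it.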

For now, I think we can go with this patchset, and then incrementally
fix what remains.


