[PATCH 3/3] nvme-tcp: fix I/O stalls on congested sockets

Kamaljit Singh Kamaljit.Singh1 at wdc.com
Thu May 8 14:23:56 PDT 2025


Hi Hannes, Sagi,

On 5/7/25 15:26, Kamaljit Singh wrote:
>>Can you please retest with the patchset '[PATCHv5 0/2] nvme-tcp: fixup
>>I/O stall on congested sockets' _only_ ?
>>(on top of nvme-6.16 latest, of course).
>>I think I _should_ have included all the suggestions floating here,
>>but we need to have confirmation.
>
>I've built the kernel against the latest of nvme-6.16 branch along with
>these patches. It's with test now. Will let you know when we get some
>results back.
>
>0. [PATCHv5 0/2] nvme-tcp: fixup I/O stall on congested sockets - Apr 29, 01:18
>    Hannes Reinecke (2):
>      nvme-tcp: sanitize request list handling
>      nvme-tcp: fix I/O stalls on congested sockets
>
>1. [PATCH 1/2] nvme-tcp: sanitize request list handling - Apr 29, 02:31
>2. [PATCH 2/2] nvme-tcp: fix I/O stalls on congested sockets - Apr 29, 01:18

I/O timeouts are still occurring with Writes. The only Read that timed
out was most likely due to the path error. The test takes ~4.5 hours to
fail.

However, this test does not fail if either ECN is off or digests are
not enabled. These passing combinations were run for 16+ hours without
any issues. Both ECN and Header+Data Digests need to be turned on for
it to fail.

Do you have a failing test as well? If so, is it quicker to cause the
failure? Would you mind sharing any details?

  [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 2 (f002) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
  [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 1 (2001) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
  [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 4 (c004) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
  [2025-05-07 19:57:13.295] nvme nvme1: starting error recovery
  [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 15 (000f) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
  [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 6 (5006) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
  [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 3 (2003) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
  [2025-05-07 19:57:13.295] block nvme1n3: no usable path - requeuing I/O
  [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 8 (0008) type 4 opcode 0x2 (I/O Cmd) QID 4 timeout
  [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 14 (400e) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
  [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 13 (100d) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
  [2025-05-07 19:57:13.295] block nvme1n4: no usable path - requeuing I/O
  [2025-05-07 19:57:13.295] block nvme1n4: no usable path - requeuing I/O
  [2025-05-07 19:57:13.295] block nvme1n4: no usable path - requeuing I/O
  [2025-05-07 19:57:13.295] block nvme1n2: no usable path - requeuing I/O
  [2025-05-07 19:57:13.295] block nvme1n4: no usable path - requeuing I/O
  [2025-05-07 19:57:13.295] block nvme1n2: no usable path - requeuing I/O
  [2025-05-07 19:57:13.295] block nvme1n2: no usable path - requeuing I/O
  [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 5 (5005) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
  [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 7 (0007) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
  [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 11 (a00b) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
  [2025-05-07 19:57:13.295] nvme nvme1: I/O tag 12 (f00c) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
  [2025-05-07 19:57:13.295] block nvme1n1: no usable path - requeuing I/O
  [2025-05-07 19:57:13.295] block nvme1n1: no usable path - requeuing I/O
  [2025-05-07 19:57:13.295] nvme nvme1: Reconnecting in 10 seconds...
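
For anyone comparing against their own logs, here is a minimal sketch of
how the timeout lines above can be tallied per queue and opcode. It
assumes the message format shown in this log; the names TIMEOUT_RE and
tally_timeouts are hypothetical helpers, not part of any existing tool.
Opcode 0x1 is Write and 0x2 is Read in the NVM command set.

```python
import re
from collections import Counter

# NVM command set I/O opcodes seen in the log above.
OPCODE_NAMES = {0x1: "Write", 0x2: "Read"}

# Assumed to match lines such as:
#   nvme nvme1: I/O tag 2 (f002) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
TIMEOUT_RE = re.compile(
    r"I/O tag (?P<tag>\d+) \((?P<cid>[0-9a-f]+)\) type \d+ "
    r"opcode (?P<opcode>0x[0-9a-f]+) \(I/O Cmd\) QID (?P<qid>\d+) timeout"
)


def tally_timeouts(log_text):
    """Count timed-out commands keyed by (queue id, opcode name)."""
    counts = Counter()
    for match in TIMEOUT_RE.finditer(log_text):
        opcode = int(match.group("opcode"), 16)
        name = OPCODE_NAMES.get(opcode, hex(opcode))
        counts[(int(match.group("qid")), name)] += 1
    return counts


if __name__ == "__main__":
    # Three lines copied from the log above; other lines are ignored.
    sample = """\
nvme nvme1: I/O tag 2 (f002) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
nvme nvme1: starting error recovery
nvme nvme1: I/O tag 8 (0008) type 4 opcode 0x2 (I/O Cmd) QID 4 timeout
nvme nvme1: I/O tag 14 (400e) type 4 opcode 0x1 (I/O Cmd) QID 4 timeout
"""
    for (qid, name), count in sorted(tally_timeouts(sample).items()):
        print(f"QID {qid} {name}: {count}")
```

Feeding it a full dmesg capture instead of the sample gives a quick view
of whether the timeouts are confined to a single queue and to Writes.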

In the current build I had these patches on top of the "nvme-6.16" branch:
  41b2c90a51bd nvme-tcp: sanitize request list handling
  9260acd6c230 nvme-tcp: fix I/O stalls on congested sockets

Thanks,
Kamaljit Singh

