nvme-tls and TCP window full

Hannes Reinecke hare at suse.de
Tue Jul 11 03:31:16 PDT 2023


On 7/11/23 11:28, Sagi Grimberg wrote:
> Hey Hannes,
> 
> Any progress on this one?
> 
Oh well; slow going.
After some lengthy debugging I found that the read_sock() implementation 
wasn't exactly up to scratch (it would read each byte individually, 
instead of each skb individually ... D'oh).
But even with that fixed I'm still seeing I/O stalls due to window full:

[85639.669120] nvme nvme0: queue 2: C2H data cid 0x4d offset 0 len 4096
[85639.669885] nvme nvme0: queue 2: PDU Data cid 0x4d offset 29 len 4120 rem 4096
[85639.670809] nvme nvme0: queue 2: RSP cid 0x4d status 0
[85639.671449] nvme nvme0: queue 2: pdu 24 data 0 consumed 4144 result 0
[85639.674928] nvme nvme0: queue 2: skb 00000000bee9b76d len 8262 offset 5 len 8240
[85639.676474] nvme nvme0: queue 2: C2H data cid 0x4e offset 0 len 8192
[85639.678013] nvme nvme0: queue 2: PDU Data cid 0x4e offset 29 len 8216 rem 8192
[85639.679810] nvme nvme0: queue 2: RSP cid 0x4e status 0
[85639.681247] nvme nvme0: queue 2: pdu 24 data 0 consumed 8240 result 0
[85639.690985] nvme nvme0: queue 2: skb 00000000bee9b76d len 16406 offset 5 len 16384
[85639.692321] nvme nvme0: queue 2: C2H data cid 0x4f offset 0 len 126976
[85639.694012] nvme nvme0: queue 2: PDU Data cid 0x4f offset 29 len 16360 rem 126976
[85639.695882] nvme nvme0: queue 2: pdu 0 data 110616 consumed 16384 result 0
[85639.697667] nvme nvme0: queue 2: skb 00000000bee9b76d len 16406 offset 5 len 16384
[85639.699339] nvme nvme0: queue 2: PDU Data cid 0x4f offset 5 len 16384 rem 110616
[85639.701122] nvme nvme0: queue 2: pdu 0 data 94232 consumed 16384 result 0
[85639.702831] nvme nvme0: queue 2: skb 00000000bee9b76d len 16406 offset 5 len 16384
[85639.704918] nvme nvme0: queue 2: PDU Data cid 0x4f offset 5 len 16384 rem 94232
[85639.706759] nvme nvme0: queue 2: pdu 0 data 77848 consumed 16384 result 0
[85639.708367] nvme nvme0: queue 2: skb 00000000bee9b76d len 16406 offset 5 len 16384
[85639.711219] nvme nvme0: queue 2: PDU Data cid 0x4f offset 5 len 16384 rem 77848
[85639.712108] nvme nvme0: queue 2: pdu 0 data 61464 consumed 16384 result 0
[85639.714182] nvme nvme0: queue 2: skb 00000000bee9b76d len 16406 offset 5 len 16384
[85639.715107] nvme nvme0: queue 2: PDU Data cid 0x4f offset 5 len 16384 rem 61464
[85639.715989] nvme nvme0: queue 2: pdu 0 data 45080 consumed 16384 result 0
[85671.510572] nvme nvme0: queue 2: timeout cid 0x4f type 4 opcode 0x2 (Read)

(These are just client-side logs).
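To me this looks as if we're waiting for the server to continue sending
PDU data frames, but they never arrive: the log stops with 45080 bytes
of cid 0x4f still outstanding, and the command times out roughly 32
seconds later.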
So really I wonder whether it's not rather an issue on the server side. 
Maybe the server doesn't retire skbs (or not all of them), causing the 
TCP window to shrink.
That, of course, is wild guessing, as I have no idea if and how calls to 
'consume_skb' reflect back to the TCP window size.
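
As a sanity check on the client-side accounting, the "data" values logged
for cid 0x4f can be recomputed from the C2H length and the per-skb data
sizes. This is only a sketch in Python using numbers copied from the log
above; the function name and the reading of the fields (first skb carries
16360 data bytes after the PDU header, later ones carry 16384) are my
interpretation of the debug output, not something taken from the driver.

```python
# Values copied from the client log for cid 0x4f on queue 2.
C2H_DATA_LEN = 126976   # "C2H data cid 0x4f offset 0 len 126976"
FIRST_CHUNK = 16360     # "PDU Data ... offset 29 len 16360" (first skb)
FULL_CHUNK = 16384      # "PDU Data ... offset 5 len 16384" (later skbs)

# "data N" values from the "pdu 0 data N consumed 16384" lines.
LOGGED_REMAINING = [110616, 94232, 77848, 61464, 45080]


def remaining_after_each_skb(total, first, full, count):
    """Hypothetical helper: data still outstanding after each of `count` skbs.

    Assumes the first skb delivers `first` data bytes and every later
    skb delivers `full` data bytes.
    """
    remaining = []
    left = total
    for i in range(count):
        left -= first if i == 0 else full
        remaining.append(left)
    return remaining


computed = remaining_after_each_skb(
    C2H_DATA_LEN, FIRST_CHUNK, FULL_CHUNK, len(LOGGED_REMAINING)
)

# Compare the recomputed values with what the client logged.
print("matches log:", computed == LOGGED_REMAINING)
print("outstanding at timeout:", computed[-1])
```

If the recomputed values agree with the logged ones, the client has
consumed every byte it was handed, and the bytes still outstanding at the
time of the timeout were never delivered to it.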

Investigation continues.

Cheers,

Hannes



