nvme tcp receive errors

Sagi Grimberg sagi at grimberg.me
Fri Apr 9 22:38:38 BST 2021


>>>>> This was observed on the recent 5.12-rc4, so it has all the latest tcp
>>>>> fixes. I'll check with reverting 0dc9edaf80ea and see if that makes a
>>>>> difference. It is currently reproducible, though it can take over an
>>>>> hour right now.
>>>>
>>>> After reverting 0dc9edaf80ea, we are observing a kernel panic (below).
>>>
>>> Ah, that's probably because WRITE_ZEROES requests are not set with RQF_SPECIAL..
>>> This patch is actually needed.
>>>
>>>
>>>> We'll try adding it back, plus adding your debug patch.
>>>
>>> Yes, that would give us more info about what state the
>>> request is in when getting these errors.
>>
>> We have recreated with your debug patch:
>>
>>    nvme nvme4: queue 6 no space in request 0x1 no space cmd_state 3
>>
>> State 3 corresponds to "NVME_TCP_CMD_DATA_DONE".
>>
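(For reference, that message comes from the C2HData receive path, when the
controller sends more data than the host has bio space left for in the
request. Roughly, paraphrasing nvme_tcp_recv_data() from memory, so not
verbatim:

	if (!iov_iter_count(&req->iter)) {
		req->curr_bio = req->curr_bio->bi_next;
		/*
		 * No more bios means the controller sent more data
		 * than we requested, hence the error.
		 */
		if (!req->curr_bio) {
			dev_err(queue->ctrl->ctrl.device,
				"queue %d no space in request %#x",
				nvme_tcp_queue_id(queue), rq->tag);
			nvme_tcp_init_recv_ctx(queue);
			return -EIO;
		}
		nvme_tcp_init_iter(req, READ);
	}

With cmd_state 3, DATA_DONE, that would suggest data is still arriving for a
command the host already considers data-complete.)
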
>> The summary from the test that I received:
>>
>>    We have an Ethernet trace for this failure. I filtered the trace for the
>>    connection that maps to "queue 6 of nvme4" and tracked the state of the IO
>>    command with Command ID 0x1 ("Tag 0x1"). The sequence for this command per
>>    the Ethernet trace is:
>>
>>     1. The target receives this Command in an Ethernet frame that has 9 Command
>>        capsules and a partial H2CDATA PDU. The Command with ID 0x1 is a Read
>>        operation for a 16K IO size.
>>     2. The target sends 11 frames of C2HDATA PDUs, each with 1416 bytes, and one
>>        C2HDATA PDU with 832 bytes to complete the 16K transfer. The LAST flag is
>>        set in the last PDU.
>>     3. The target sends a Response for this Command.
>>     4. About 1.3 ms later, the Host logs this message and closes the connection.
>>
>> Please let us know if you need any additional information.
> 
> I'm not sure if this is just a different symptom of the same problem,
> but with the debug patch, we're occasionally hitting messages like:
> 
>    nvme nvme5: req 8 r2t len 16384 exceeded data len 16384 (8192 sent) cmd_state 2

According to this message, the host got an r2t for 16384 bytes after it
had already sent 8192 (which can only happen if it previously got one or
more r2t pdus that together solicited those 8192 bytes). Can you share,
for each r2t pdu in this sequence:
r2t_length
r2t_offset
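
For reference, the check that fires here is roughly the following
(paraphrased from nvme_tcp_setup_h2c_data_pdu() in drivers/nvme/host/tcp.c
from memory, so not verbatim):

	req->pdu_len = le32_to_cpu(pdu->r2t_length);
	req->pdu_sent = 0;

	/* an r2t must not solicit bytes beyond the command data length */
	if (unlikely(req->data_sent + req->pdu_len > req->data_len)) {
		dev_err(queue->ctrl->ctrl.device,
			"req %d r2t len %u exceeded data len %u (%zu sent)\n",
			rq->tag, req->pdu_len, req->data_len, req->data_sent);
		return -EPROTO;
	}

	/* nor may its offset fall before data we already sent */
	if (unlikely(le32_to_cpu(pdu->r2t_offset) < req->data_sent)) {
		dev_err(queue->ctrl->ctrl.device,
			"req %d unexpected r2t offset %u (expected %zu)\n",
			rq->tag, le32_to_cpu(pdu->r2t_offset), req->data_sent);
		return -EPROTO;
	}

The host accounts data_sent per request, so the lengths and offsets of the
earlier r2t pdus for this command should tell us whether the target
over-solicited or the host mis-accounted.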


