nvme tcp receive errors
Keith Busch
kbusch at kernel.org
Mon Apr 5 15:37:02 BST 2021
On Fri, Apr 02, 2021 at 10:27:11AM -0700, Sagi Grimberg wrote:
>
> > > Thanks for the reply.
> > >
> > > This was observed on the recent 5.12-rc4, so it has all the latest tcp
> > > fixes. I'll check with reverting 0dc9edaf80ea and see if that makes a
> > > difference. It is currently reproducible, though it can take over an
> > > hour right now.
> >
> > After reverting 0dc9edaf80ea, we are observing a kernel panic (below).
>
> Ah, that's probably because WRITE_ZEROES requests are not set with RQF_SPECIAL...
> This patch is actually needed.
>
>
> > We'll try adding it back, plus adding your debug patch.
>
> Yes, that would give us more info about what state the
> request is in when getting these errors.
We have recreated with your debug patch:
nvme nvme4: queue 6 no space in request 0x1 no space cmd_state 3
State 3 corresponds to "NVME_TCP_CMD_DATA_DONE".
The summary from the test that I received:
We have an Ethernet trace for this failure. I filtered the trace for the
connection that maps to "queue 6 of nvme4" and tracked the state of the IO
command with Command ID 0x1 ("Tag 0x1"). The sequence for this command per
the Ethernet trace is:
1. The target receives this Command in an Ethernet frame that has 9 Command
   capsules and a partial H2CDATA PDU. The Command with ID 0x1 is a Read
   operation for a 16K IO size.
2. The target sends 11 frames of C2HDATA PDUs, each with 1416 bytes, and one
   C2HDATA PDU with 832 bytes to complete the 16K transfer. The LAST_PDU flag
   is set in the last PDU.
3. The target sends a Response for this Command.
4. About 1.3 ms later, the host logs this message and closes the connection.
Please let us know if you need any additional information.