nvme tcp receive errors

Keith Busch kbusch at kernel.org
Wed Apr 14 01:29:46 BST 2021


On Fri, Apr 09, 2021 at 11:04:43AM -0700, Sagi Grimberg wrote:
> 
> > > > > Thanks for the reply.
> > > > > 
> > > > > This was observed on the recent 5.12-rc4, so it has all the latest tcp
> > > > > fixes. I'll check with reverting 0dc9edaf80ea and see if that makes a
> > > > > difference. It is currently reproducible, though it can take over an
> > > > > hour right now.
> > > > 
> > > > After reverting 0dc9edaf80ea, we are observing a kernel panic (below).
> > > 
> > > Ah, that's probably because WRITE_ZEROS are not set with RQF_SPECIAL..
> > > This patch is actually needed.
> > > 
> > > 
> > > > We'll try adding it back, plus adding your debug patch.
> > > 
> > > Yes, that would give us more info about the state the request is
> > > in when these errors occur.
> > 
> > We have recreated with your debug patch:
> > 
> >    nvme nvme4: queue 6 no space in request 0x1 no space cmd_state 3
> > 
> > State 3 corresponds to "NVME_TCP_CMD_DATA_DONE".
> > 
> > The summary from the test that I received:
> > 
> >    We have an Ethernet trace for this failure. I filtered the trace for the
> >    connection that maps to "queue 6 of nvme4" and tracked the state of the IO
> >    command with Command ID 0x1 ("Tag 0x1"). The sequence for this command per
> >    the Ethernet trace is:
> > 
> >     1. The target receives this Command in an Ethernet frame that has 9 Command
> >        capsules and a partial H2CDATA PDU. The Command with ID 0x1 is a Read
> >        operation with a 16K IO size.
> >     2. The target sends 11 frames of C2HDATA PDUs, each with 1416 bytes, and one
> >        C2HDATA PDU with 832 bytes to complete the 16K transfer. The LAST flag is
> >        set in the last PDU.
> 
> Do the c2hdata pdus have a data_length of 1416, and the last one a
> data_length of 832?
> 
> 1416 * 11 + 832 = 16408 > 16384

Sorry, this was a mistake in the reporting. The last one's data length
was only 808; 832 was the packet length.

> Can you share for each of the c2hdata PDUs what is:
> - hlen

24 for all of them

> - plen

11 transfers at 1440, 832 for the last one

> - data_length

11 transfers at 1416, 808 for the last one

> - data_offset

0, 1416, 2832, 4248, 5564, 7080, 8496, 9912, 11328, 12744, 14160, 15567
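
The fields above can be cross-checked with a short script. This is only a
sanity-check sketch, not driver code: the variable and function names are
made up for illustration, and it assumes no header or data digests and no
padding, so that plen = hlen + data_length, and that the PDUs carry the data
back to back. The values are copied verbatim from the answers above.

```python
# Sanity check of the reported C2HDATA PDU fields (values copied from this thread).
# Assumption: no digests and no padding, so plen == hlen + data_length.
HLEN = 24
IO_SIZE = 16 * 1024  # the 16K read

data_lengths = [1416] * 11 + [808]
plens = [1440] * 11 + [832]
reported_offsets = [0, 1416, 2832, 4248, 5564, 7080, 8496, 9912,
                    11328, 12744, 14160, 15567]


def expected_offsets(lengths):
    """Offsets for back-to-back PDUs: each one starts where the previous ended."""
    offsets, pos = [], 0
    for n in lengths:
        offsets.append(pos)
        pos += n
    return offsets


def mismatches(reported, expected):
    """Return (index, reported, expected) for every offset that differs."""
    return [(i, r, e) for i, (r, e) in enumerate(zip(reported, expected)) if r != e]


total = sum(data_lengths)
plen_ok = all(p == HLEN + d for p, d in zip(plens, data_lengths))
diffs = mismatches(reported_offsets, expected_offsets(data_lengths))

print("total data bytes:", total, "(matches 16K)" if total == IO_SIZE else "(MISMATCH)")
print("plen == hlen + data_length for every PDU:", plen_ok)
print("offsets that are not contiguous:", diffs)
```

With the numbers as listed, the data lengths add up to exactly 16384 and every
plen equals hlen + data_length, but the fifth and the last data_offset values
do not match a contiguous layout. The script cannot tell whether that is a
transcription slip in this mail or what was actually on the wire.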


