nvme tcp receive errors

Keith Busch kbusch at kernel.org
Tue May 4 15:36:33 BST 2021


On Mon, May 03, 2021 at 01:00:05PM -0700, Sagi Grimberg wrote:
> > > > > > Hey Keith,
> > > > > > 
> > > > > > Did this resolve the issues?
> > > > > 
> > > > > We're unfortunately still observing data digest issues even with this.
> > > > > Most of the testing has shifted to the r2t error, so I don't have any
> > > > > additional details on the data digest problem.
> > > > 
> > > > I've looked again at the code, and I'm not convinced that the patch
> > > > is needed at all anymore, I'm now surprised that it actually changed
> > > > anything (disregarding data digest).
> > > > 
> > > > The driver does not track the received bytes by definition, it relies
> > > > on the controller to send it a completion, or set the success flag in
> > > > the _last_ c2hdata pdu. Does your target set
> > > > NVME_TCP_F_DATA_SUCCESS on any of the c2hdata pdus?
> > > 
> > > Perhaps you can also run this patch instead?
> > 
> > Thanks, will give this a shot.
> 
> Still would be beneficial to look at the traces and check if
> the success flag happens to be set. If this flag is set, the
> driver _will_ complete the request without checking the bytes
> received thus far (similar to how pci and rdma don't and can't
> check dma byte count).

I realized this patch is the same as one you'd sent earlier. We hit the
BUG_ON(), and then proceeded to use your follow-up patch, which appeared
to fix the data receive problem, but introduced data digest problems.

So, are you saying that hitting this BUG_ON() means the driver has
observed the completion out of order relative to the expected data?
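
For reference, the behavior described in the quoted text above can be
modeled in a few lines of user-space C. This is only a sketch of that
description, not the driver's actual receive path: struct c2h_pdu,
struct model_req and both functions are hypothetical names made up for
the illustration, and the two flag values are assumed to match
include/linux/nvme-tcp.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values assumed to mirror include/linux/nvme-tcp.h. */
#define NVME_TCP_F_DATA_LAST    (1 << 2)
#define NVME_TCP_F_DATA_SUCCESS (1 << 3)

/* Hypothetical, simplified stand-in for a received c2hdata PDU. */
struct c2h_pdu {
	uint8_t  flags;
	uint32_t data_length;	/* payload bytes carried by this PDU */
};

/* Hypothetical per-request state, used by this model only. */
struct model_req {
	uint32_t expected_len;	/* bytes the host asked for */
	uint32_t received_len;	/* bytes seen so far (bookkeeping only) */
	bool     completed;
};

/*
 * Model of the behavior described above: the request completes as soon
 * as a c2hdata PDU carries both LAST and SUCCESS, and received_len is
 * never compared against expected_len before completing.
 */
static void model_recv_c2h(struct model_req *req, const struct c2h_pdu *pdu)
{
	assert(req != NULL && pdu != NULL);

	req->received_len += pdu->data_length;

	if ((pdu->flags & NVME_TCP_F_DATA_LAST) &&
	    (pdu->flags & NVME_TCP_F_DATA_SUCCESS))
		req->completed = true;
}

/* True if the request completed with fewer bytes than expected. */
static bool model_short_completion(const struct model_req *req)
{
	return req->completed && req->received_len < req->expected_len;
}
```

In this model, a target that sets LAST and SUCCESS on a PDU carrying
4096 bytes of an 8192-byte read leaves the request completed with
model_short_completion() returning true, which is the case worth
looking for in the traces.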


