nvme tcp receive errors

Keith Busch kbusch at kernel.org
Mon May 17 21:48:55 BST 2021


On Thu, May 13, 2021 at 12:53:54PM -0700, Sagi Grimberg wrote:
> On 5/13/21 8:48 AM, Keith Busch wrote:
> > On Tue, May 11, 2021 at 10:17:09AM -0700, Sagi Grimberg wrote:
> > > 
> > > > > I may have a theory to this issue. I think that the problem is in
> > > > > cases where we send commands with data to the controller and then in
> > > > > nvme_tcp_send_data between the last successful kernel_sendpage
> > > > > and before nvme_tcp_advance_req, the controller sends back a successful
> > > > > completion.
> > > > > 
> > > > > If that is the case, then the completion path could be triggered,
> > > > > the tag would be reused, triggering a new .queue_rq, setting again
> > > > > the req.iter with the new bio params (all is not taken by the
> > > > > send_mutex) and then the send context would call nvme_tcp_advance_req
> > > > > progressing the req.iter with the former sent bytes... And given that
> > > > > the req.iter is used for reads/writes, it is possible that it can
> > > > > explain both issues.
> > > > > 
> > > > > While this is not easy to trigger, there is nothing I think that
> > > > > can prevent that. The driver used to have a single context that
> > > > > would do both send and recv so this could not have happened, but
> > > > > now that we added the .queue_rq send context, I guess this can
> > > > > indeed confuse the driver.
> > > > 
> > > > Awesome, this is exactly the type of sequence I've been trying to
> > > > capture, but couldn't quite get there. Now that you've described it,
> > > > that flow can certainly explain the observations, including the
> > > > corrupted debug trace event I was trying to add.
> > > > 
> > > > The sequence looks unlikely to happen, which agrees with the difficulty
> > > > in reproducing it. I am betting right now that you've got it, but I'm a
> > > > little surprised no one else has reported a similar problem yet.
> > > 
> > > We had at least one report from Potnuri that I think may have been
> > > triggered by this; it ended up fixed (or rather worked around)
> > > with 5c11f7d9f843.
> > > 
> > > > Your option "1" looks like the best one, IMO. I've requested dropping
> > > > all debug and test patches and using just this one on the current nvme
> > > > baseline for the next test cycle.
> > > 
> > > Cool, waiting to hear back...
> > 
> > This patch has been tested successfully on the initial workloads. There
> > are several more that need to be validated, but each one runs for many
> > hours, so it may be a couple more days before they complete. Just wanted
> > to let you know: so far, so good.
> 
> Encouraging... I'll send a patch for that as soon as you give me the
> final verdict. I'm assuming Narayan would be the reporter and the
> tester?

The tests completed successfully. One timeout issue was observed across the
testing, but it does not appear related to the problems reported here or to
your fix, so I will start a new thread on that if I can get more
information on it.
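
For reference, the window Sagi described sits between the last successful
kernel_sendpage() and the nvme_tcp_advance_req() that follows it.
kernel_sendpage() and nvme_tcp_advance_req() are the real driver calls, but
the loop condition and everything else below are simplified placeholders to
illustrate the ordering, not the actual nvme_tcp_try_send_data() code:

	/* Send context (simplified sketch, not the actual driver code) */
	while (req_has_data_left(req)) {	/* placeholder condition */
		ret = kernel_sendpage(queue->sock, page, offset, len, flags);
		if (ret <= 0)
			return ret;

		/*
		 * <-- window: if that was the last fragment, the controller
		 * may already return a successful completion here.  The
		 * completion path ends the request, blk-mq can reuse the
		 * tag, and a new .queue_rq() re-initializes req->iter for a
		 * different bio, none of which is serialized by send_mutex.
		 */

		nvme_tcp_advance_req(req, ret);	/* now advances the NEW iter */
	}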

You may use the following tags for the commit log:

Reported-by: Narayan Ayalasomayajula <narayan.ayalasomayajula at wdc.com>
Tested-by: Anil Mishra <anil.mishra at wdc.com>
Reviewed-by: Keith Busch <kbusch at kernel.org>


