nvme tcp receive errors

Keith Busch kbusch at kernel.org
Tue May 11 04:00:51 BST 2021


On Mon, May 10, 2021 at 02:07:47PM -0700, Sagi Grimberg wrote:
> I may have a theory about this issue. I think the problem is in
> cases where we send commands with data to the controller and, in
> nvme_tcp_send_data, the controller sends back a successful
> completion in the window between the last successful kernel_sendpage
> and the call to nvme_tcp_advance_req.
>
> If that is the case, then the completion path could be triggered,
> the tag would be reused, triggering a new .queue_rq that sets up
> req.iter again with the new bio params (none of this is protected
> by the send_mutex), and then the original send context would call
> nvme_tcp_advance_req, advancing req.iter by the bytes sent for the
> former request... And given that req.iter is used for both reads
> and writes, this could explain both issues.
> 
> While this is not easy to trigger, I don't think anything prevents
> it. The driver used to have a single context that did both send and
> recv, so this could not have happened, but now that we've added the
> .queue_rq send context, I guess this can indeed confuse the driver.

Awesome, this is exactly the type of sequence I've been trying to
capture but couldn't quite pin down. Now that you've described it,
that flow can certainly explain the observations, including the
corrupted debug trace event I was trying to add.
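
Just to make sure I'm reading the race the same way you are, here is
the interleaving written out as a toy user-space model. The struct and
function names below are made up for illustration; this is not the
driver code, just the ordering that corrupts the iterator:

#include <stdio.h>

/* Toy stand-in for the per-request iterator state (not the real
 * nvme_tcp_request). */
struct toy_req {
	int tag;          /* which command this request currently represents */
	size_t offset;    /* how far into the current bio we think we've sent */
	size_t length;    /* total payload length of the current bio */
};

/* Models .queue_rq re-initializing req.iter for a freshly reused tag. */
static void toy_queue_rq(struct toy_req *req, int tag, size_t len)
{
	req->tag = tag;
	req->offset = 0;
	req->length = len;
}

/* Models nvme_tcp_advance_req progressing the iterator after a send. */
static void toy_advance_req(struct toy_req *req, size_t sent)
{
	req->offset += sent;
}

int main(void)
{
	struct toy_req req;
	size_t sent = 8192;

	/* Original write command: 8k payload on tag 7. */
	toy_queue_rq(&req, 7, 8192);

	/* Send context: the last kernel_sendpage for the 8k write succeeds
	 * (sent == 8192), but before the send context advances the iterator,
	 * the controller sees the full write and completes the command.  The
	 * completion path frees the tag, the tag is reused, and .queue_rq
	 * resets the iterator for a brand new 4k command on the same tag. */
	toy_queue_rq(&req, 7, 4096);

	/* The original send context finally advances the iterator for the
	 * *old* bytes, but it lands on the *new* request. */
	toy_advance_req(&req, sent);

	printf("new request starts at offset %zu of %zu\n",
	       req.offset, req.length);
	return 0;
}

In the toy model that just prints a bogus offset; in the driver the
same stale advance lands on whatever bio the reused tag now points at,
which would scramble data placement for reads and writes alike.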

The sequence looks unlikely to happen, which agrees with how difficult
this has been to reproduce. I'm betting you've got it, but I'm a little
surprised no one else has reported a similar problem yet.

Your option "1" looks like the best one, IMO. I've requested dropping
all debug and test patches and using just this one on the current nvme
baseline for the next test cycle.


