nvme tcp receive errors

Sagi Grimberg sagi at grimberg.me
Tue May 11 18:17:09 BST 2021


>> I may have a theory about this issue. I think the problem is in
>> cases where we send commands with data to the controller and, in
>> nvme_tcp_send_data, the controller sends back a successful
>> completion between the last successful kernel_sendpage and before
>> nvme_tcp_advance_req runs.
>>
>> If that is the case, then the completion path could be triggered,
>> the tag would be reused, triggering a new .queue_rq that sets up
>> req.iter again with the new bio params (none of which is protected
>> by the send_mutex), and then the send context would call
>> nvme_tcp_advance_req, advancing req.iter by the bytes sent for the
>> former request... And given that req.iter is used for both reads
>> and writes, this could explain both issues.
>>
>> While this is not easy to trigger, I don't think there is anything
>> that prevents it. The driver used to have a single context that
>> did both send and recv, so this could not have happened, but now
>> that we added the .queue_rq send context, I guess this can indeed
>> confuse the driver.
> 
> Awesome, this is exactly the type of sequence I've been trying to
> capture, but couldn't quite get there. Now that you've described it,
> that flow can certainly explain the observations, including the
> corrupted debug trace event I was trying to add.
> 
> The sequence looks unlikely to happen, which agrees with how hard it
> has been to reproduce. I am betting right now that you got it, but
> I'm a little surprised no one else has reported a similar problem yet.

We had at least one report from Potnuri that I think may have been
triggered by this; it ended up being fixed (or rather worked around)
with 5c11f7d9f843.
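
To make that window concrete, below is a minimal user-space sketch of
the interleaving (the struct, names and pthread plumbing are simplified
stand-ins, not the actual nvme-tcp structures or locking, and the race
is forced deterministically just to show the effect):

/*
 * Sketch only: a "send path" holds a send_mutex across the last send
 * and the iterator advance, while a "completion path" recycles the tag
 * and re-initializes the iterator without taking that mutex.
 */
#include <pthread.h>
#include <stdio.h>

struct fake_req {
	pthread_mutex_t send_mutex;	/* stands in for the queue send_mutex */
	long iter_offset;		/* stands in for req.iter progress */
	int generation;			/* bumped on every (fake) tag reuse */
};

static struct fake_req req = {
	.send_mutex = PTHREAD_MUTEX_INITIALIZER,
};

/* completion path: controller answered early, the tag is recycled and
 * a new .queue_rq re-initializes the iterator for a new bio -- note
 * that it does not take send_mutex to do so */
static void *completion_path(void *arg)
{
	(void)arg;
	req.generation++;
	req.iter_offset = 0;	/* fresh iterator for the new command */
	return NULL;
}

int main(void)
{
	pthread_t t;
	int sent_gen;

	pthread_mutex_lock(&req.send_mutex);
	sent_gen = req.generation;
	/* ... last successful kernel_sendpage() would happen here ... */

	/* force the completion to sneak in before the iterator advance */
	pthread_create(&t, NULL, completion_path, NULL);
	pthread_join(t, NULL);

	/* nvme_tcp_advance_req() equivalent: advances an iterator that
	 * now belongs to the *next* command using this tag */
	req.iter_offset += 4096;
	pthread_mutex_unlock(&req.send_mutex);

	if (req.generation != sent_gen)
		printf("stale send path advanced generation %d iter to %ld\n",
		       req.generation, req.iter_offset);
	return 0;
}

The point being that taking the send_mutex on the send side does not
help when the new iterator setup runs outside of it.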

> Your option "1" looks like the best one, IMO. I've requested dropping
> all debug and test patches and using just this one on the current nvme
> baseline for the next test cycle.

Cool, waiting to hear back...


