[PATCH 1/2] nvme-rdma: Fix race between queue timeout and error recovery

Sun Apr 8 07:07:49 PDT 2018

On 4/8/2018 2:04 PM, Sagi Grimberg wrote:
>
>> Please send an introduction cover letter explaining what issue you've
>> triggered and your overall design.
>
> The commit log is actually wrong... We don't complete the request in two
> places; the issue is that we need to make sure to unmap the user buffer

There are two bugs here, and these patches fix both.
What you describe is one of them;
the second is the one I described here.
I will include the call traces I got in V2.

> before completing the request in case of a timeout. I sent this patch
> in reply to a bug report on the list, and this is what it is designed to do.
>
> Given that we already simply schedule error recovery, we will fail the
> request there, after we drain the queue pair, so the choice is to reset
> its timer in the timeout callout.
>
> We could alternatively invalidate the rkey in the timeout callout, but
> that won't work with the unsafe rkey mode.
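A minimal userspace model of the ordering described above, assuming one request on one queue. Every name here (`model_req`, `model_queue`, `naive_timeout`, `reset_timer_timeout`, `error_recovery`, `EH_RESET_TIMER`, `EH_HANDLED`) is a hypothetical stand-in, not the nvme-rdma or blk-mq API; `EH_RESET_TIMER` only loosely mirrors the idea behind blk-mq's `BLK_EH_RESET_TIMER`. It is a sketch of the invariant "drain the QP, unmap, then complete", not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical timeout verdicts, loosely mirroring blk-mq's
 * "reset the timer" vs. "request handled" outcomes. */
enum eh_ret { EH_RESET_TIMER, EH_HANDLED };

struct model_req {
	bool buffer_mapped;          /* user buffer still registered for RDMA */
	bool completed;              /* request handed back to the block layer */
	bool completed_while_mapped; /* the bad ordering we want to rule out */
};

struct model_queue {
	bool recovery_scheduled;     /* error recovery work has been queued */
	bool qp_drained;             /* no further RDMA work can touch buffers */
};

static void complete_req(struct model_req *rq)
{
	if (rq->buffer_mapped)
		rq->completed_while_mapped = true;
	rq->completed = true;
}

/* Naive callout: completes the request immediately, while its buffer
 * is still mapped and the peer may still access it. */
static enum eh_ret naive_timeout(struct model_queue *q, struct model_req *rq)
{
	q->recovery_scheduled = true;
	complete_req(rq);
	return EH_HANDLED;
}

/* Approach from the thread: only schedule recovery and re-arm the timer,
 * leaving the request itself untouched. */
static enum eh_ret reset_timer_timeout(struct model_queue *q, struct model_req *rq)
{
	(void)rq;
	q->recovery_scheduled = true;
	return EH_RESET_TIMER;
}

static void drain_qp(struct model_queue *q)
{
	q->qp_drained = true;
}

static void unmap_req(struct model_queue *q, struct model_req *rq)
{
	assert(q->qp_drained); /* unmapping before the drain would be unsafe */
	rq->buffer_mapped = false;
}

/* Error recovery: drain the queue pair, unmap, and only then fail the
 * request (skipping requests that were already completed). */
static void error_recovery(struct model_queue *q, struct model_req *rq)
{
	if (!q->recovery_scheduled)
		return;
	drain_qp(q);
	unmap_req(q, rq);
	if (!rq->completed)
		complete_req(rq);
}
```

In this model the naive callout sets `completed_while_mapped`, while the reset-timer callout leaves the request alone until error recovery has drained the QP and unmapped the buffer. The rkey-invalidation alternative is deliberately not modeled, since the thread notes it would not work in unsafe rkey mode.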