[PATCH 1/2 V2] nvme-rdma: Fix race between queue timeout and error recovery

Sun Apr 8 14:02:30 PDT 2018

On 04/08/2018 07:16 PM, Israel Rukshin wrote:
> On 4/8/2018 6:26 PM, Sagi Grimberg wrote:
>>
>>> When returning BLK_EH_HANDLED from nvme_rdma_timeout() the block layer
>>> completes the request.
>>> Error recovery may also complete the request when aborting the requests.
>>>
>>
>> This is still not a sufficient change log.
>>
>> You need to describe why this is being done vs. invalidating the rkey
>> in the timeout handler. And what does "may also" mean?
>>
>> Second, isn't the double completion protected by the request gstate?
> 
> It is protected if you use only blk_mq_complete_request() and not 
> __blk_mq_complete_request().
> You can see that blk_mq_rq_timed_out() calls
> __blk_mq_complete_request() directly if the timeout
> function returns BLK_EH_HANDLED.

But we first update aborted_gstate (with interrupts disabled), sync srcu
and only then terminate expired requests. So I still don't understand
how we can end up completing a request twice.
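To make the disagreement concrete, here is a minimal single-threaded C11 model. It assumes the gstate scheme behaves as discussed above: the timeout path records the request's current generation in aborted_gstate, blk_mq_complete_request() skips a request whose generation has been claimed that way, and __blk_mq_complete_request() performs no such check. The struct model_req type and the model_* and run_scenario names are hypothetical. This is not the blk-mq implementation and it does not reproduce a real race. It only shows why the protection depends on which completion variant each path uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of a blk-mq request (not kernel code). */
struct model_req {
	atomic_ulong gstate;         /* generation of the current issue of the request */
	atomic_ulong aborted_gstate; /* generation the timeout path has claimed */
	atomic_int completions;      /* how many times completion actually ran */
};

/* Unchecked completion, analogous in spirit to __blk_mq_complete_request(). */
static void model_complete_unchecked(struct model_req *rq)
{
	atomic_fetch_add(&rq->completions, 1);
}

/* Checked completion, analogous in spirit to blk_mq_complete_request():
 * it does nothing if the timeout path already claimed this generation. */
static bool model_complete_checked(struct model_req *rq)
{
	if (atomic_load(&rq->aborted_gstate) == atomic_load(&rq->gstate))
		return false;
	model_complete_unchecked(rq);
	return true;
}

/* Timeout path returning BLK_EH_HANDLED: claim the current generation,
 * then complete directly. The real code also disables interrupts and
 * synchronizes SRCU between these steps; this model omits that. */
static void model_timeout_handled(struct model_req *rq)
{
	atomic_store(&rq->aborted_gstate, atomic_load(&rq->gstate));
	model_complete_unchecked(rq);
}

/* Timeout fires first, then error recovery completes the same request.
 * Returns the total number of completions observed. */
int run_scenario(bool recovery_uses_checked)
{
	struct model_req rq;

	atomic_init(&rq.gstate, 1);
	atomic_init(&rq.aborted_gstate, 0); /* nothing claimed yet */
	atomic_init(&rq.completions, 0);

	model_timeout_handled(&rq);
	assert(atomic_load(&rq.completions) == 1); /* timeout path completed once */

	if (recovery_uses_checked)
		(void)model_complete_checked(&rq);
	else
		model_complete_unchecked(&rq);

	return atomic_load(&rq.completions);
}
```

In this model, run_scenario(true) reports one completion and run_scenario(false) reports two. The aborted_gstate check only prevents a double completion if every path other than the timeout handler goes through the checked variant, and whether that holds for error recovery is exactly the question being argued here.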