blk_mq_reinit_tagset during NVMEoF port toggling

Max Gurtovoy maxg at mellanox.com
Mon Aug 28 08:49:37 PDT 2017



On 8/28/2017 4:30 PM, Sagi Grimberg wrote:
> 
>>>> Hi guys,
>>>
>>> Hi Max, CCing linux-nvme.
>> Hi Sagi,
>>>
>>>> we have encountered a bug during our port toggling test with MP 
>>>> using NVMEoF over RDMA (1 IO queue reproduces it quickly).
>>>> We have been receiving local protection error dumps after failing 
>>>> back to the port that became active again (it's not the 
>>>> retransmission issue we fixed in the past). After debugging it we 
>>>> saw that the requests have been doing a reinit process 
>>>> (dereg_mr/alloc_mr).
>>>> But somehow the req->mr->need_inval is still true in the beginning 
>>>> of nvme_rdma_queue_rq function. This shouldn't happen since we 
>>>> should have performed the dereg_mr/alloc_mr in the reinit func and set 
>>>> it to false.
>>>> We don't see this issue in kernels older than 4.11 so before bisecting:
>>>
>>> Which code base is this, Max?
>> The code base is kernel 4.13.0-rc3.
>>>
>>> is commit 842594c8775b585c58459e044708c0335b6aa6b7 applied?
>> Yes.
>>>
>>> if so, maybe it is possible that not all requests are being 
>>> reinitialized.
>>> Can you reproduce with the following applied:
>> We reproduced this issue with similar prints and we didn't see them.
>> blk_mq_reinit_tagset() went over all the static requests.
> 
> If mr->need_inval is true in queue_rq it means that reinit was not
> called on it, did you see a request that performed reinit but
> still had need_inval == true?

No. The requests that are "reinited" are the static_rqs.
From what I saw in the code, the multipath (MP) layer uses a cloned 
request queue, and those cloned requests are not "reinited" (only the 
original requests are).
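
To make the suspicion concrete, here is a toy user-space model of it. This
is not kernel code and it does not prove the theory: every toy_* name is
made up for illustration, and the "clone" is simply a request that lives
outside the static_rqs array, which is my assumption about how the MP path
behaves. It only shows why a walk over static_rqs cannot clear need_inval
on a request it never visits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the per-request driver state (not kernel types). */
struct toy_mr {
	bool need_inval;
};

struct toy_request {
	struct toy_mr mr;
};

#define TOY_DEPTH 4

/* Toy tag set: owns only the statically allocated requests. */
struct toy_tagset {
	struct toy_request static_rqs[TOY_DEPTH];
};

/* Models the dereg_mr/alloc_mr step: a fresh MR has nothing to invalidate. */
static void toy_reinit_request(struct toy_request *rq)
{
	rq->mr.need_inval = false;
}

/* Models the reinit walk: it visits static_rqs and nothing else. */
static int toy_reinit_tagset(struct toy_tagset *set)
{
	int visited = 0;

	for (size_t i = 0; i < TOY_DEPTH; i++) {
		toy_reinit_request(&set->static_rqs[i]);
		visited++;
	}
	return visited;
}

/*
 * Marks every request as needing invalidation, runs the reinit walk, and
 * reports whether the request outside static_rqs still carries the flag.
 */
static bool toy_clone_stays_stale(void)
{
	struct toy_tagset set;
	struct toy_request clone; /* assumed to live outside the tag set */

	for (size_t i = 0; i < TOY_DEPTH; i++)
		set.static_rqs[i].mr.need_inval = true;
	clone.mr.need_inval = true;

	toy_reinit_tagset(&set);

	/* Every static request was reinitialized... */
	for (size_t i = 0; i < TOY_DEPTH; i++)
		assert(!set.static_rqs[i].mr.need_inval);

	/* ...but the walk never touched the clone. */
	return clone.mr.need_inval;
}
```

In this model toy_clone_stays_stale() returns true, which matches the
symptom of need_inval still being set on entry to nvme_rdma_queue_rq even
though the reinit walk covered all static requests.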


