blk_mq_reinit_tagset during NVMEoF port toggling

Sagi Grimberg sagi at grimberg.me
Mon Aug 28 06:30:57 PDT 2017


>>> Hi guys,
>>
>> Hi Max, CCing linux-nvme.
> Hi Sagi,
>>
>>> we have encountered a bug during our port toggling test with MP using
>>> NVMEoF over RDMA (1 IO queue reproduces it quickly).
>>> We have been receiving local protection error dumps after failing
>>> back to the port that became active again (it's not the
>>> retransmission issue we fixed in the past). After debugging it we saw
>>> that the requests have been through a reinit process (dereg_mr/alloc_mr).
>>> But somehow req->mr->need_inval is still true at the beginning of the
>>> nvme_rdma_queue_rq function. This shouldn't happen since we should
>>> have performed the dereg_mr/alloc_mr in the reinit func and set it to
>>> false.
>>> We don't see this issue in kernels older than 4.11, so before bisecting:
>>
>> Which code base is this, Max?
> The code base is kernel 4.13.0-rc3.
>>
>> Is commit 842594c8775b585c58459e044708c0335b6aa6b7 applied?
> Yes.
>>
>> If so, it is possible that not all requests are being
>> reinitialized.
>> Can you reproduce with the following applied:
> We reproduced this issue with similar prints and we didn't see them.
> blk_mq_reinit_tagset() went over all the static requests.

If mr->need_inval is true in queue_rq, it means that reinit was not
called on that request. Did you see a request that performed reinit but
still had need_inval == true?
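
To make the question concrete, here is a minimal userspace sketch of
the check being asked for. It is not kernel code: the model_* names,
the reinit_done marker, and the structures are hypothetical and exist
only for illustration. It assumes that reinit clears need_inval, as
described above, and counts requests that were visited by reinit but
still carry the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are not the kernel's structures. */
struct model_mr {
    bool need_inval;    /* stands in for req->mr->need_inval */
};

struct model_request {
    struct model_mr mr;
    bool reinit_done;   /* debugging marker: set when reinit visited this request */
};

/* Models the reinit step: the MR is replaced, so nothing is left to invalidate. */
static void model_reinit_request(struct model_request *req)
{
    assert(req != NULL);
    req->mr.need_inval = false;
    req->reinit_done = true;
}

/* Models walking every static request of a tag set. */
static void model_reinit_tagset(struct model_request *reqs, size_t nr)
{
    for (size_t i = 0; i < nr; i++)
        model_reinit_request(&reqs[i]);
}

/* Counts requests that went through reinit but still have need_inval set. */
static size_t model_count_stale_after_reinit(const struct model_request *reqs,
                                             size_t nr)
{
    size_t stale = 0;

    for (size_t i = 0; i < nr; i++)
        if (reqs[i].reinit_done && reqs[i].mr.need_inval)
            stale++;
    return stale;
}
```

In this model, a non-zero count right after model_reinit_tagset() would
mean the flag was set again after reinit. A request with need_inval set
and reinit_done clear would mean reinit never reached it.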


