[PATCH] nvme-rdma: avoid repeated request completion for concurrent nvme_rdma_timeout

Thu Jan 14 01:58:10 EST 2021


On 2021/1/14 9:15, Sagi Grimberg wrote:
> 
>> On 1/5/21 10:36 PM, Chao Leng wrote:
>>> A crash happens when request completion is delayed for a long time
>>> (nearly 30s) by fault injection. Each namespace has its own request
>>> queue, so when completion is delayed this long, multiple request queues
>>> may have timed-out requests at the same time, and nvme_rdma_timeout will
>>> run concurrently. Requests from different request queues may be queued
>>> on the same rdma queue, so multiple nvme_rdma_timeout calls may invoke
>>> nvme_rdma_stop_queue at the same time. The first nvme_rdma_timeout
>>> clears NVME_RDMA_Q_LIVE and goes on to stop the rdma queue (drain the
>>> qp), but the others see that NVME_RDMA_Q_LIVE is already cleared and
>>> complete their requests directly. At that point the rdma queue may not
>>> be stopped yet, and a request may already have been completed in the qp
>>> and be waiting to be processed, so the request is completed twice.
>>> Add a mutex to serialize nvme_rdma_stop_queue.
>>
>> This looks reasonable to me,
>>
>> Mind sending one for nvme-tcp as well?
> 
> BTW, I'm assuming you mean use-after-free or double completion when you
> are referring to repeated completions? I think it would be easier to
> understand if you just say that completing a request before the qp is
> fully drained may lead to a use-after-free condition.
OK, thanks for your suggestion.
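To illustrate the race and the fix described in the quoted patch description, here is a minimal user-space sketch using POSIX threads. It is not the kernel code: model_queue, model_stop_queue, model_timeout and model_run are hypothetical names, the atomic "live" flag stands in for NVME_RDMA_Q_LIVE, and a short sleep stands in for draining the qp. The point it shows is that with the stop path serialized by a mutex, a caller that finds the flag already cleared cannot return (and complete its request) until the first caller has finished draining.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical user-space model of one rdma queue shared by several
 * request queues. Not the kernel implementation. */
struct model_queue {
    atomic_bool live;           /* stands in for NVME_RDMA_Q_LIVE */
    atomic_bool drained;        /* set once the simulated qp drain is done */
    pthread_mutex_t stop_mutex; /* serializes model_stop_queue() */
};

static void model_queue_init(struct model_queue *q)
{
    atomic_init(&q->live, true);
    atomic_init(&q->drained, false);
    pthread_mutex_init(&q->stop_mutex, NULL);
}

/* Analogue of the stop path: only the first caller clears the flag and
 * drains, but every caller returns only after the drain has finished,
 * because later callers block on the mutex until the first one is done. */
static void model_stop_queue(struct model_queue *q)
{
    pthread_mutex_lock(&q->stop_mutex);
    if (atomic_exchange(&q->live, false)) {
        struct timespec ts = { 0, 10 * 1000 * 1000 }; /* 10 ms "drain" */
        nanosleep(&ts, NULL);
        atomic_store(&q->drained, true);
    }
    pthread_mutex_unlock(&q->stop_mutex);
}

/* Analogue of a timeout handler: stop the queue, then complete the
 * request. Returns non-NULL if completion happened after the drain. */
static void *model_timeout(void *arg)
{
    struct model_queue *q = arg;

    model_stop_queue(q);
    return atomic_load(&q->drained) ? (void *)1 : (void *)0;
}

/* Run n concurrent "timeouts" against one queue and return how many of
 * them would have completed their request before the drain finished. */
static int model_run(int n)
{
    struct model_queue q;
    pthread_t tids[16];
    int early = 0;

    assert(n > 0 && n <= 16);
    model_queue_init(&q);
    for (int i = 0; i < n; i++)
        pthread_create(&tids[i], NULL, model_timeout, &q);
    for (int i = 0; i < n; i++) {
        void *ok;

        pthread_join(tids[i], &ok);
        if (!ok)
            early++;
    }
    pthread_mutex_destroy(&q.stop_mutex);
    return early;
}
```

With the lock in place, model_run() returns 0 for any thread count, since every caller observes the drain as complete. If the lock and unlock calls are removed, a caller that loses the atomic_exchange() returns immediately while the first caller is still sleeping in the simulated drain, which corresponds to the premature completion described in the patch.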