[PATCH] nvme-rdma: avoid repeated request completion for concurrent nvme_rdma_timeout

Sagi Grimberg sagi at grimberg.me
Wed Jan 13 20:15:18 EST 2021


> On 1/5/21 10:36 PM, Chao Leng wrote:
>> A crash happens when request completion is delayed for a long time
>> (nearly 30s) by fault injection.
>> Each namespace has its own request queue, so when completions are
>> delayed this long, multiple request queues may have timed-out requests
>> at the same time and nvme_rdma_timeout will execute concurrently.
>> Requests from different request queues may be queued on the same rdma
>> queue, so multiple nvme_rdma_timeout instances may call
>> nvme_rdma_stop_queue at the same time.
>> The first nvme_rdma_timeout clears NVME_RDMA_Q_LIVE and continues
>> stopping the rdma queue (draining the qp), but the others see that
>> NVME_RDMA_Q_LIVE is already cleared and complete their requests
>> directly. At that point the rdma queue may not be stopped yet, and a
>> request may already have been completed in the qp and be waiting to be
>> processed, so the request is completed a second time.
>> Add a mutex to serialize nvme_rdma_stop_queue.
> 
> This looks reasonable to me,
> 
> Mind sending one for nvme-tcp as well?

BTW, I'm assuming you mean use-after-free or double completion when you
refer to repeated completions? I think it would be easier to
understand if you just said that completing a request before the qp is
fully drained may lead to a use-after-free condition.
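
For readers following the thread, the serialization idea can be sketched in
userspace C. This is only a model under stated assumptions, not the kernel
patch: struct model_queue, model_stop_queue, model_timeout and model_run are
hypothetical names, the "live" flag stands in for NVME_RDMA_Q_LIVE, and the
drain is reduced to setting a flag. The point it illustrates is that, with the
lock held around the test-and-clear plus drain, a second caller cannot return
from the stop path until the first caller has finished draining.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_MAX_THREADS 16

/* Hypothetical userspace stand-in for an rdma queue (not the kernel struct). */
struct model_queue {
    pthread_mutex_t stop_lock;    /* serializes model_stop_queue() */
    atomic_bool live;             /* stands in for NVME_RDMA_Q_LIVE */
    atomic_bool drained;          /* set once the (modelled) drain is done */
    atomic_int drain_runs;        /* how many callers actually drained */
    atomic_int early_completions; /* completions seen before the drain ended */
};

/* Only the caller that clears "live" drains; later callers wait on the lock. */
static void model_stop_queue(struct model_queue *q)
{
    pthread_mutex_lock(&q->stop_lock);
    if (atomic_exchange(&q->live, false)) {
        atomic_fetch_add(&q->drain_runs, 1);
        atomic_store(&q->drained, true); /* modelled drain of the qp */
    }
    pthread_mutex_unlock(&q->stop_lock);
}

/* Modelled timeout handler: stop the queue, then "complete" the request. */
static void *model_timeout(void *arg)
{
    struct model_queue *q = arg;

    model_stop_queue(q);
    /* Completing before the drain finished is the bug being modelled. */
    if (!atomic_load(&q->drained))
        atomic_fetch_add(&q->early_completions, 1);
    return NULL;
}

/* Runs nthreads concurrent timeouts; returns the early-completion count. */
int model_run(int nthreads, int *drain_runs)
{
    struct model_queue q;
    pthread_t tid[MODEL_MAX_THREADS];
    int i;

    assert(nthreads > 0 && nthreads <= MODEL_MAX_THREADS);
    pthread_mutex_init(&q.stop_lock, NULL);
    atomic_init(&q.live, true);
    atomic_init(&q.drained, false);
    atomic_init(&q.drain_runs, 0);
    atomic_init(&q.early_completions, 0);

    for (i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, model_timeout, &q);
    for (i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);

    pthread_mutex_destroy(&q.stop_lock);
    *drain_runs = atomic_load(&q.drain_runs);
    return atomic_load(&q.early_completions);
}
```

Calling model_run(8, &n) from a small main() should report no early
completions and a single drain. Without the lock, a second caller could see
the flag already cleared and go on to complete its request while the first
caller is still draining, which is the window described above.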


