[PATCH 2/2 V2] nvme-rdma: Fix race at error recovery

Sagi Grimberg sagi at grimberg.me
Sun Apr 8 08:31:26 PDT 2018


> We need to stop rdma queues before canceling the requests.
> With this approach we avoid the race between error recovery work
> and rdma completions.
> The commit fix a NULL deref of a request MR at nvme_rdma_process_nvme_rsp().

What race? nvme_rdma_stop_queue() has two goals:
1. prevent new posts on the rdma qp
2. drain all inflight posts

"With this approach we avoid the race between error recovery work
  and rdma completions" is not sufficient (to put it politely).

The original post of this patch was designed to prevent the
device from performing DMA to/from user buffers whose requests
might already have been completed by nvme_cancel_request().
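
To make the ordering argument concrete, here is a toy userspace model,
not kernel code. Every name in it (toy_queue, toy_request, toy_post,
toy_stop_queue, toy_cancel_requests, toy_device_complete) is hypothetical
and only stands in for the roles discussed above: stopping the queue
blocks new posts and takes inflight posts away from the device, and
cancelling completes the requests and drops their MR. It assumes a
single thread and a device that can only complete requests it still owns.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types are assumptions, not the driver's structs. */
struct toy_mr { int rkey; };

enum toy_state { TOY_IDLE, TOY_INFLIGHT, TOY_DONE, TOY_CANCELLED };

struct toy_request {
	enum toy_state state;
	struct toy_mr *mr;	/* set to NULL once the request is cancelled */
	bool hw_owned;		/* device may still DMA to/from or complete it */
};

struct toy_queue {
	bool live;		/* cleared by stop: no new posts accepted */
};

/* Goal 1 of stopping: a stopped queue refuses new posts. */
static bool toy_post(struct toy_queue *q, struct toy_request *req)
{
	if (!q->live)
		return false;
	req->state = TOY_INFLIGHT;
	req->hw_owned = true;
	return true;
}

/* Goal 2 of stopping: drain, so the device owns nothing afterwards. */
static void toy_stop_queue(struct toy_queue *q, struct toy_request *reqs,
			   size_t nr)
{
	size_t i;

	q->live = false;
	for (i = 0; i < nr; i++)
		reqs[i].hw_owned = false;
}

/* Complete every inflight request and release its MR. */
static void toy_cancel_requests(struct toy_request *reqs, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (reqs[i].state == TOY_INFLIGHT) {
			reqs[i].mr = NULL;
			reqs[i].state = TOY_CANCELLED;
		}
	}
}

/*
 * A late completion from the device.
 * Returns 0 if the device no longer owns the request (nothing happens),
 * 1 on a normal completion, and -1 where this model would have
 * dereferenced a NULL MR.
 */
static int toy_device_complete(struct toy_request *req)
{
	if (!req->hw_owned)
		return 0;
	req->hw_owned = false;
	if (req->mr == NULL)
		return -1;
	req->state = TOY_DONE;
	return 1;
}
```

In this model, calling toy_stop_queue() before toy_cancel_requests()
means a later toy_device_complete() finds nothing to do, while
cancelling first leaves a device-owned request with no MR and the
completion returns -1.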