slab-use-after-free in __ib_process_cq

Sagi Grimberg sagi at grimberg.me
Thu May 4 08:45:54 PDT 2023


>>> Hi,
>>>
>>> While testing Jens' for-next branch I encountered a use-after-free
>>> issue, triggered by test nvmeof-mp/002. This is not the first time I have
>>> seen this issue: I had already observed it several weeks ago but had not
>>> yet had the time to report it.
>>
>> That is surprising, because this area has not changed in quite a while.
>>
>> CCing linux-rdma as well. I'm assuming this is with rxe?
>> Does this happen with siw as well?
> 
> Hi Sagi,
> 
> This happened with the siw driver. I haven't tried the rxe driver for a 
> while.
> 
> The crash addresses correspond to the following source file and line:
> 
> (gdb) list *(__ib_process_cq+0x11c)
> 0x7f7c is in __ib_process_cq (drivers/infiniband/core/cq.c:110).
> 105                                             budget - completed), wcs)) > 0) {
> 106                     for (i = 0; i < n; i++) {
> 107                             struct ib_wc *wc = &wcs[i];
> 108
> 109                             if (wc->wr_cqe)
> 110                                     wc->wr_cqe->done(cq, wc);
> 111                             else
> 112                                     WARN_ON_ONCE(wc->status == IB_WC_SUCCESS);
> 113                     }
> 114
> 
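Line 110 calls through `wc->wr_cqe`, and that pointer refers to memory owned by the consumer rather than by the RDMA core. The following is a simplified userspace model of the dispatch loop, not kernel code. It assumes the consumer embeds its completion entry inside each receive element, and that the real `done(cq, wc)` callback can be reduced to `done(wc)`. The types `model_cqe`, `model_wc`, and `model_qe` are hypothetical stand-ins for `ib_cqe`, `ib_wc`, and `nvme_rdma_qe`. Under that model, a completion that arrives after the ring has been freed makes line 110 dereference freed memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; NOT the kernel's ib_cqe/ib_wc definitions. */
struct model_wc;

struct model_cqe {
	void (*done)(struct model_wc *wc);  /* kernel version also takes the CQ */
};

enum model_wc_status { MODEL_WC_SUCCESS, MODEL_WC_WR_FLUSH_ERR };

struct model_wc {
	struct model_cqe *wr_cqe;  /* points INTO consumer-owned memory */
	enum model_wc_status status;
};

/* Consumer receive element embedding its completion entry (assumed layout). */
struct model_qe {
	int id;
	struct model_cqe cqe;
};

#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static int last_done_id = -1;  /* records which element completed last */
static int warned;             /* stands in for WARN_ON_ONCE() */

static void model_recv_done(struct model_wc *wc)
{
	/* Recover the enclosing element from the embedded cqe pointer. */
	struct model_qe *qe = model_container_of(wc->wr_cqe, struct model_qe, cqe);
	last_done_id = qe->id;
}

/* Mirrors the loop at cq.c:106-113: hand each completion to its owner. */
static int model_process_cq(struct model_wc *wcs, int n)
{
	int dispatched = 0;

	for (int i = 0; i < n; i++) {
		struct model_wc *wc = &wcs[i];

		if (wc->wr_cqe) {
			/* If the ring holding wr_cqe was freed, this is the UAF. */
			wc->wr_cqe->done(wc);
			dispatched++;
		} else if (wc->status == MODEL_WC_SUCCESS) {
			warned = 1;
		}
	}
	return dispatched;
}
```

In this model, `free()`ing the ring and then calling `model_process_cq()` with a completion that still points into it is the same class of bug as the slab-use-after-free report.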
> (gdb) list *(nvme_rdma_create_queue_ib+0x1a7)
> 0x3d47 is in nvme_rdma_create_queue_ib (drivers/nvme/host/rdma.c:219).
> 214     {
> 215             struct nvme_rdma_qe *ring;
> 216             int i;
> 217
> 218             ring = kcalloc(ib_queue_size, sizeof(struct nvme_rdma_qe), GFP_KERNEL);
> 219             if (!ring)
> 220                     return NULL;
> 221
> 222             /*
> 223              * Bind the CQEs (post recv buffers) DMA mapping to the RDMA queue
> 
> (gdb) list *(nvme_rdma_destroy_queue_ib+0x1b8)
> 0x2388 is in nvme_rdma_destroy_queue_ib (drivers/nvme/host/rdma.c:358).
> 353             kfree(ndev);
> 354     }
> 355
> 356     static void nvme_rdma_dev_put(struct nvme_rdma_device *dev)
> 357     {
> 358             kref_put(&dev->ref, nvme_rdma_free_dev);
> 359     }
> 360
> 361     static int nvme_rdma_dev_get(struct nvme_rdma_device *dev)
> 362     {
> 
> Shouldn't ib_drain_qp() be called before nvme_rdma_destroy_queue_ib() 
> destroys the QP?

Yes, it absolutely should, and according to the code it is.
The only way this can happen is if something posts a WR after the
drain has started. I can't see how that would happen, though...
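The invariant behind this argument is that once the drain has started, no new WR may be posted, and the receive ring may be freed only after every outstanding WR has completed. The sketch below is a hypothetical userspace model of that invariant, not kernel code. The names `model_qp`, `model_post_recv()`, `model_drain_qp()`, and `model_safe_to_free_ring()` are invented for illustration. The model refuses a post once draining has begun. In the real stack, avoiding such a post is assumed to be the consumer's responsibility, and a WR that slips in then could complete after teardown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of QP teardown state; NOT kernel code. */
struct model_qp {
	bool draining;    /* set once the drain has started */
	int outstanding;  /* posted WRs whose completions are not yet reaped */
};

/* A post after the drain starts is the suspected race; the model refuses it. */
static int model_post_recv(struct model_qp *qp)
{
	if (qp->draining)
		return -1;
	qp->outstanding++;
	return 0;
}

/* Stand-in for the drain step: stop new posts, then reap everything
 * outstanding (in reality flushed completions are waited for). */
static void model_drain_qp(struct model_qp *qp)
{
	qp->draining = true;
	qp->outstanding = 0;
}

/* Freeing the ring is safe only when no completion can still reference it. */
static bool model_safe_to_free_ring(const struct model_qp *qp)
{
	return qp->draining && qp->outstanding == 0;
}
```

The model requires both conditions because draining without blocking new posts, or blocking posts without waiting for outstanding completions, each leaves a window where a completion can reference a freed ring element.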



More information about the Linux-nvme mailing list