PROBLEM: nvmet rxe: Kernel oops when running nvmf IO over rdma_rxe

> Hi All

Hey Stephen,

> So I thought I would try and run NVMe over Fabrics over
> Soft-RoCE. Both were added in 4.8, so what could possibly go wrong
> ;-).

Obviously... :)

> Problem
> -------
> Kernel panics when attempting to run NVMe over Fabrics I/O over
> soft-RoCE.
> Interestingly, nvme discover and connect seem to go well. In some cases
> I even seem to be able to issue some IO against the /dev/nvme0n1
> device on the host. However, pretty quickly I get a kernel oops on the
> target, as shown below.

Hmm, does this crash happen even if there is no IO? Probably
not, if discover works well.

> My testing of soft-roce itself using userspace tools like ib_write_bw
> seems to be passing. So I am thinking the kernel-space interfaces of
> RXE and NVMf are not playing well together.

That's a fair assumption...

> Oops Trace
> -----------
> I am including a couple of lines before the oops because I suspect
> they might be relevant. addr2line decodes the last address in the call
> trace as
> ida_simple_remove(&nvmet_rdma_queue_ida, queue->idx);

Hmm, how did you get to this line?
I got:
Anyway, this looks like a use-after-free condition. The strange thing
is that we don't see any queues being freed twice (we have a print

I suspect that either we have a problem with the draining logic in
rxe, or we uncovered a bug in nvmet-rdma that is triggered with rxe on
a VM (back when I tested this I didn't get this, so things must have

