nvmf/rdma host crash during heavy load and keep alive recovery

Steve Wise swise at opengridcomputing.com
Thu Sep 8 13:47:02 PDT 2016


> >> Does this happen if you change the reconnect delay to be something
> >> different than 10 seconds? (say 30?)
> >>
> >
> > Yes.  But I noticed something when performing this experiment that is
> > an important point, I think: if I just bring the network interface
> > down and leave it down, we don't crash.  During this state, I see the
> > host continually reconnecting after the reconnect delay time, timing
> > out trying to reconnect, and retrying after another reconnect_delay
> > period.  I see this for all 10 targets, of course.  The crash only
> > happens when I bring the interface back up and the targets begin to
> > reconnect.  So the process of successfully reconnecting the RDMA QPs
> > and restarting the nvme queues is somehow triggering running an nvme
> > request too soon (or perhaps on the wrong queue).
> 
> Interesting. Given this is easy to reproduce, can you record the:
> (request_tag, *queue, *qp) for each request submitted?
> 
> I'd like to see that the *queue stays the same for each tag
> but the *qp indeed changes.
> 

I tried this, and didn't hit either BUG_ON(), yet still hit the crash.  I
believe this verifies that *queue never changed for a given request...

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index c075ea5..a77729e 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -76,6 +76,7 @@ struct nvme_rdma_request {
        struct ib_reg_wr        reg_wr;
        struct ib_cqe           reg_cqe;
        struct nvme_rdma_queue  *queue;
+       struct nvme_rdma_queue  *save_queue;
        struct sg_table         sg_table;
        struct scatterlist      first_sgl[];
 };
@@ -354,6 +355,8 @@ static int __nvme_rdma_init_request(struct nvme_rdma_ctrl *ctrl,
        }

        req->queue = queue;
+       if (!req->save_queue)
+               req->save_queue = queue;

        return 0;

@@ -1434,6 +1436,9 @@ static int nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,

        WARN_ON_ONCE(rq->tag < 0);

+       BUG_ON(queue != req->queue);
+       BUG_ON(queue != req->save_queue);
+
        dev = queue->device->dev;
        ib_dma_sync_single_for_cpu(dev, sqe->dma,
                        sizeof(struct nvme_command), DMA_TO_DEVICE);
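
Sagi also asked for a trace of the (request_tag, *queue, *qp) tuple per
submitted request.  The patch above only asserts on *queue; a minimal sketch
of the extra logging could look like the fragment below.  This is purely
hypothetical instrumentation (the helper name nvme_rdma_log_req is mine, not
from the driver), and it assumes the QP is reachable through the queue's
rdma_cm id (queue->cm_id->qp, as set up by rdma_create_qp()):

/* Hypothetical debug helper for drivers/nvme/host/rdma.c, called from
 * nvme_rdma_queue_rq() before posting the command.  Logs the tuple
 * (tag, queue, qp) so the trace can show whether the qp changes across
 * a reconnect while the queue pointer stays fixed for each tag.
 */
static void nvme_rdma_log_req(struct request *rq,
			      struct nvme_rdma_queue *queue)
{
	pr_debug("nvme_rdma: tag=%d queue=%p qp=%p\n",
		 rq->tag, queue, queue->cm_id->qp);
}

With dynamic debug enabled for rdma.c, comparing the qp pointer for a given
(tag, queue) pair before and after the interface flap would confirm or rule
out the "right queue, stale/wrong qp" theory directly.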
