nvmf/rdma host crash during heavy load and keep alive recovery

Sagi Grimberg sagi at grimberg.me
Thu Sep 15 02:53:52 PDT 2016


> @@ -1408,6 +1412,8 @@ static int nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
>
>         WARN_ON_ONCE(rq->tag < 0);
>
> +       BUG_ON(hctx != queue->hctx);
> +       BUG_ON(test_bit(BLK_MQ_S_STOPPED, &hctx->state));
>         dev = queue->device->dev;
>         ib_dma_sync_single_for_cpu(dev, sqe->dma,
>                         sizeof(struct nvme_command), DMA_TO_DEVICE);
> ---
>
> When I reran the test forcing reconnects, I hit the BUG_ON(hctx != queue->hctx)
> in nvme_rdma_queue_rq() when doing the first reconnect (not when initially
> connecting the targets). Here is the back trace. Is my debug logic flawed?
> Or does this mean something is screwed up once we start reconnecting?

This is weird indeed.

The fact that you trigger this means that you successfully reconnected,
correct?

If the queue is corrupted, that would explain the bogus post on a freed or
non-existent QP...
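
For illustration only, here is a minimal userspace sketch of the invariant
that the two BUG_ON()s in the quoted patch test. It is not kernel code. All
of the mock_* names are hypothetical, and it assumes that the queue is looked
up from the hardware context's driver data and that queue->hctx is a back
pointer added by the debug patch. The checks return a status code instead of
crashing, so a stale mapping can be observed rather than hit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct mock_queue;

struct mock_hctx {
	struct mock_queue *driver_data;	/* queue this hardware context maps to */
	bool stopped;			/* models the BLK_MQ_S_STOPPED state bit */
};

struct mock_queue {
	struct mock_hctx *hctx;		/* back pointer (assumed debug-only field) */
};

enum mock_status {
	MOCK_OK = 0,
	MOCK_STALE_QUEUE = -1,		/* would be BUG_ON(hctx != queue->hctx) */
	MOCK_STOPPED = -2		/* would be BUG_ON(test_bit(BLK_MQ_S_STOPPED, ...)) */
};

/* Bind a hardware context and a queue to each other, as at connect time. */
void mock_bind(struct mock_hctx *hctx, struct mock_queue *queue)
{
	assert(hctx != NULL && queue != NULL);
	hctx->driver_data = queue;
	hctx->stopped = false;
	queue->hctx = hctx;
}

/* Evaluate the same two conditions as the debug patch, in the same order. */
enum mock_status mock_queue_rq_check(const struct mock_hctx *hctx)
{
	const struct mock_queue *queue = hctx->driver_data;

	if (queue == NULL || queue->hctx != hctx)
		return MOCK_STALE_QUEUE;
	if (hctx->stopped)
		return MOCK_STOPPED;
	return MOCK_OK;
}
```

In this model, rebinding a queue to a second context with mock_bind() while
the first context still points at it makes mock_queue_rq_check() on the first
context return MOCK_STALE_QUEUE. That is one way the two pointers could
disagree after a reconnect; whether that is what happens here is still an
open question.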


