nvmf/rdma host crash during heavy load and keep alive recovery
Gabriel Krisman Bertazi
krisman at linux.vnet.ibm.com
Thu Sep 15 07:00:26 PDT 2016
"Steve Wise" <swise at opengridcomputing.com> writes:
> @@ -622,6 +625,7 @@ static void nvme_rdma_stop_and_free_queue(struct nvme_rdma_queue *queue)
> {
> if (test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags))
> return;
> + BUG_ON(!test_bit(BLK_MQ_S_STOPPED, &queue->hctx->state));
> nvme_rdma_stop_queue(queue);
> nvme_rdma_free_queue(queue);
> }
> @@ -1408,6 +1412,8 @@ static int nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
>
> WARN_ON_ONCE(rq->tag < 0);
>
> + BUG_ON(hctx != queue->hctx);
> + BUG_ON(test_bit(BLK_MQ_S_STOPPED, &hctx->state));
> dev = queue->device->dev;
> ib_dma_sync_single_for_cpu(dev, sqe->dma,
> sizeof(struct nvme_command), DMA_TO_DEVICE);
>
This reminds me of the discussion I had with Jens a few weeks ago here:
http://lists.infradead.org/pipermail/linux-nvme/2016-August/005916.html
The BUG_ON I hit is similar to yours, but for NVMe over PCI. I suspect
the code that updates the queues reaches a similar remapping path, but I
haven't checked yet.
Can you check whether you are running with the patch he mentioned at:
http://lists.infradead.org/pipermail/linux-nvme/2016-August/005962.html
Thanks,
--
Gabriel Krisman Bertazi