nvmf/rdma host crash during heavy load and keep alive recovery

Steve Wise swise at opengridcomputing.com
Thu Sep 15 07:31:41 PDT 2016


> 
> This reminds me of the discussion I had with Jens a few weeks ago here:
> 
> http://lists.infradead.org/pipermail/linux-nvme/2016-August/005916.html
> 
> The BUG_ON I hit is similar to yours, but for nvme over PCI.  I think
> the update queues code will reach a similar path of remapping, but I
> didn't go out and check yet.
> 
> Can you check that you are running with the patch he mentioned at:
> 
> http://lists.infradead.org/pipermail/linux-nvme/2016-August/005962.html

I don't have this patch, and blk_mq_hctx_cpu_offline() in my tree is very
different from the code in that patch.

I'm using 4.8-rc5 plus these patches from nvmf-4.8-rc and linux-block.  This
matches Jens' for-linus tag at this point in time.  It will all hit -rc6.

cbee748 nvme-rdma: add back dependency on CONFIG_BLOCK
b8ce92e nvme-rdma: fix null pointer dereference on req->mr
c51b3c7 nvme-rdma: use ib_client API to detect device removal
2b24ee4 nvme-rdma: add DELETING queue flag
8c7d713 nvme-rdma: destroy nvme queue rdma resources on connect failure
4604d4e nvme_rdma: keep a ref on the ctrl during delete/flush
372a82d iw_cxgb4: block module unload until all ep resources are released
7053902 iw_cxgb4: call dev_put() on l2t allocation failure
82469c5 nvme: Don't suspend admin queue that wasn't created
c693593 Linux 4.8-rc5




