nvmf/rdma host crash during heavy load and keep alive recovery
Steve Wise
swise at opengridcomputing.com
Thu Sep 15 13:58:25 PDT 2016
> > And I see that 2 sets of blk_mq_hw_ctx structs get assigned to the same 32
> > queues. Here is the output for 1 target connect with 32 cores. So is it
> > expected that the 32 nvme_rdma IO queues get assigned to 2 sets of hw_ctx
> > structs? The 2nd set is getting initialized as part of namespace scanning...
>
>
> So here is the stack for the first time the nvme_rdma_queue structs are bound to an hctx:
>
> [ 2006.826941] [<ffffffffa066c452>] nvme_rdma_init_hctx+0x102/0x110 [nvme_rdma]
> [ 2006.835409] [<ffffffff8133a52e>] blk_mq_init_hctx+0x21e/0x2e0
> [ 2006.842530] [<ffffffff8133a6ea>] blk_mq_realloc_hw_ctxs+0xfa/0x240
> [ 2006.850097] [<ffffffff8133b342>] blk_mq_init_allocated_queue+0x92/0x410
> [ 2006.858107] [<ffffffff8132a969>] ? blk_alloc_queue_node+0x259/0x2c0
> [ 2006.865765] [<ffffffff8133b6ff>] blk_mq_init_queue+0x3f/0x70
> [ 2006.872829] [<ffffffffa066d9f9>] nvme_rdma_create_io_queues+0x189/0x210 [nvme_rdma]
> [ 2006.881917] [<ffffffffa066e813>] ? nvme_rdma_configure_admin_queue+0x1e3/0x290 [nvme_rdma]
> [ 2006.891611] [<ffffffffa066ec65>] nvme_rdma_create_ctrl+0x3a5/0x4c0 [nvme_rdma]
> [ 2006.900260] [<ffffffffa0654d33>] ? nvmf_create_ctrl+0x33/0x210 [nvme_fabrics]
> [ 2006.908799] [<ffffffffa0654e82>] nvmf_create_ctrl+0x182/0x210 [nvme_fabrics]
> [ 2006.917228] [<ffffffffa0654fbc>] nvmf_dev_write+0xac/0x110 [nvme_fabrics]
>
The above stack is creating hctx queues for the nvme_rdma_ctrl->ctrl.connect_q
request queue.
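For reference, the binding itself is just nvme_rdma_init_hctx() stashing the queue
pointer in the hctx. Roughly this, from my reading of the current driver (offsets
may differ in your tree):

static int nvme_rdma_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
		unsigned int hctx_idx)
{
	struct nvme_rdma_ctrl *ctrl = data;
	/* io queues start at index 1; queue 0 is the admin queue */
	struct nvme_rdma_queue *queue = &ctrl->queues[hctx_idx + 1];

	BUG_ON(hctx_idx >= ctrl->queue_count);

	hctx->driver_data = queue;
	return 0;
}

So every request queue built on the io tag set ends up with its hctxs pointing at
the same ctrl->queues[] array.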
> And here is the 2nd time the same nvme_rdma_queue is bound to a different hctx:
>
> [ 2007.263068] [<ffffffffa066c40c>] nvme_rdma_init_hctx+0xbc/0x110 [nvme_rdma]
> [ 2007.271656] [<ffffffff8133a52e>] blk_mq_init_hctx+0x21e/0x2e0
> [ 2007.279027] [<ffffffff8133a6ea>] blk_mq_realloc_hw_ctxs+0xfa/0x240
> [ 2007.286829] [<ffffffff8133b342>] blk_mq_init_allocated_queue+0x92/0x410
> [ 2007.295066] [<ffffffff8132a969>] ? blk_alloc_queue_node+0x259/0x2c0
> [ 2007.302962] [<ffffffff8135ce84>] ? ida_pre_get+0xb4/0xe0
> [ 2007.309894] [<ffffffff8133b6ff>] blk_mq_init_queue+0x3f/0x70
> [ 2007.317164] [<ffffffffa0272998>] nvme_alloc_ns+0x88/0x240 [nvme_core]
> [ 2007.325218] [<ffffffffa02728bc>] ? nvme_find_get_ns+0x5c/0xb0 [nvme_core]
> [ 2007.333612] [<ffffffffa0273059>] nvme_validate_ns+0x79/0x90 [nvme_core]
> [ 2007.341825] [<ffffffffa0273166>] nvme_scan_ns_list+0xf6/0x1f0 [nvme_core]
> [ 2007.350214] [<ffffffffa027338b>] nvme_scan_work+0x12b/0x140 [nvme_core]
> [ 2007.358427] [<ffffffff810a1613>] process_one_work+0x183/0x4d0
>
This stack is creating hctx queues for the namespace created for this target
device.
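Which I think is expected: both request queues are created from the same io tag
set, so each blk_mq_init_queue() call allocates its own set of hctxs and binds
them to the same 32 nvme_rdma_queues. If I'm reading it right, something like:

	/* nvme_rdma_create_io_queues(): the fabrics connect queue */
	ctrl->ctrl.connect_q = blk_mq_init_queue(&ctrl->tag_set);

	/* nvme_alloc_ns(): each namespace request queue, on the same tag set,
	 * so namespace scanning creates the second set of hctxs seen above
	 */
	ns->queue = blk_mq_init_queue(ctrl->tagset);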
Sagi,
Should nvme_rdma_error_recovery_work() be stopping the hctx queues for
ctrl->ctrl.connect_q too?
Something like:
@@ -781,6 +790,7 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
 	if (ctrl->queue_count > 1)
 		nvme_stop_queues(&ctrl->ctrl);
 	blk_mq_stop_hw_queues(ctrl->ctrl.admin_q);
+	blk_mq_stop_hw_queues(ctrl->ctrl.connect_q);
 
 	/* We must take care of fastfail/requeue all our inflight requests */
 	if (ctrl->queue_count > 1)
And then restart these after the nvme_rdma_queue rdma resources are reallocated?
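Hand-written against my tree (context only approximate), I was thinking the restart
would go in nvme_rdma_reconnect_ctrl_work(), before the io queues are reconnected,
since the fabrics connect commands are themselves submitted on connect_q:

@@ static void nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
 	if (ctrl->queue_count > 1) {
 		ret = nvme_rdma_init_io_queues(ctrl);
 		if (ret)
 			goto stop_admin_q;
 
+		/* connect_q hctxs were stopped in error recovery; restart
+		 * them before the connect commands are queued on connect_q
+		 */
+		blk_mq_start_stopped_hw_queues(ctrl->ctrl.connect_q, true);
+
 		ret = nvme_rdma_connect_io_queues(ctrl);
 		if (ret)
 			goto stop_admin_q;
 	}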