nvmf/rdma host crash during heavy load and keep alive recovery

Steve Wise swise at opengridcomputing.com
Thu Sep 8 12:15:52 PDT 2016


 
> > >> Now, given that you already verified that the queues are stopped with
> > >> BLK_MQ_S_STOPPED, I'm looking at blk-mq now.
> > >>
> > >> I see that blk_mq_run_hw_queue() and __blk_mq_run_hw_queue() indeed take
> > >> BLK_MQ_S_STOPPED into account. Theoretically, if we free the queue pairs
> > >> after we passed these checks while the rq_list is being processed, then
> > >> we can end up with this condition, but given that it takes essentially
> > >> forever (10 seconds), I tend to doubt this is the case.
> > >>
> > >> HCH, Jens, Keith, any useful pointers for us?
> > >>
> > >> To summarize we see a stray request being queued long after we set
> > >> BLK_MQ_S_STOPPED (and by long I mean 10 seconds).
> > >
> > > Does nvme-rdma need to call blk_mq_queue_reinit() after it reinits the tag
> > > set for that queue as part of reconnecting?
> >
> > I don't see how that'd help...
> >
> 
> I can't explain this, but the nvme_rdma_queue.flags field has a bit set
> that shouldn't be set:
> 
> crash> nvme_rdma_queue.flags -x ffff880e52b8e7e8
>   flags = 0x14
> 
> Bit 2 is set, NVME_RDMA_Q_DELETING, but bit 4 is also set and should never
> be...
> 
> enum nvme_rdma_queue_flags {
>         NVME_RDMA_Q_CONNECTED = (1 << 0),
>         NVME_RDMA_IB_QUEUE_ALLOCATED = (1 << 1),
>         NVME_RDMA_Q_DELETING = (1 << 2),
> };
> 
> The rest of the structure looks fine.  I've also seen crash dumps where
> bit 3 is set which is also not used.
> 
> /me confused...
> 
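
For reference, the flags value from the crash dump can be decoded against the
quoted enum with a small user-space program. This is only a sketch: the enum
is copied from the quote above, while unknown_flag_bits(), dump_flags(),
check_example() and KNOWN_FLAGS_MASK are hypothetical names made up for
illustration and are not part of the driver.

```c
#include <assert.h>
#include <stdio.h>

/* Flag masks copied from the enum quoted above. */
enum nvme_rdma_queue_flags {
	NVME_RDMA_Q_CONNECTED = (1 << 0),
	NVME_RDMA_IB_QUEUE_ALLOCATED = (1 << 1),
	NVME_RDMA_Q_DELETING = (1 << 2),
};

/* Union of every mask the enum defines (0x7). Hypothetical name. */
#define KNOWN_FLAGS_MASK (NVME_RDMA_Q_CONNECTED | \
			  NVME_RDMA_IB_QUEUE_ALLOCATED | \
			  NVME_RDMA_Q_DELETING)

/* Hypothetical helper: return the bits of 'flags' that no enum mask covers. */
static unsigned long unknown_flag_bits(unsigned long flags)
{
	return flags & ~(unsigned long)KNOWN_FLAGS_MASK;
}

/* Hypothetical helper: print each set bit and whether the enum names it. */
static void dump_flags(unsigned long flags)
{
	for (unsigned int bit = 0; bit < 8 * sizeof(flags); bit++) {
		unsigned long mask = 1UL << bit;

		if (!(flags & mask))
			continue;
		printf("bit %u (0x%lx): %s\n", bit, mask,
		       (mask & KNOWN_FLAGS_MASK) ? "defined by the enum"
						 : "not defined by the enum");
	}
}

/* Sanity check for the value seen in the crash dump (0x14). */
static void check_example(void)
{
	/* Bit 2 matches NVME_RDMA_Q_DELETING; bit 4 matches nothing. */
	assert((0x14UL & NVME_RDMA_Q_DELETING) != 0);
	assert(unknown_flag_bits(0x14UL) == 0x10UL);
}
```

Calling dump_flags(0x14) reports bits 2 and 4 as set, with only bit 2 covered
by a mask in the enum, which matches the observation above.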

While working on this with debug code to verify that we never create a qp, cq,
or cm_id where one already exists for an nvme_rdma_queue, I discovered a bug:
the Q_DELETING flag is never cleared, so a reconnect can leak qps and cm_ids.
The fix, I think, is this:

@@ -563,6 +572,7 @@ static int nvme_rdma_init_queue(struct nvme_rdma_ctrl *ctrl,
        int ret;

        queue = &ctrl->queues[idx];
+       queue->flags = 0;
        queue->ctrl = ctrl;
        init_completion(&queue->cm_done);

I think the clearing of the Q_DELETING flag may have been lost when we moved to
using the ib_client for device removal. I'll polish this up and submit a
patch. It should go in with the next 4.8-rc push, I think.
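
To illustrate how a stale Q_DELETING flag could turn into a leak across a
reconnect, here is a simplified user-space model. It is a sketch under stated
assumptions, not the driver code: it assumes teardown is skipped whenever the
deleting flag is already set, and struct model_queue, model_init_queue(),
model_free_queue() and model_reconnect_leak() are hypothetical names. A plain
counter stands in for the qp/cq/cm_id.

```c
#include <assert.h>
#include <stdbool.h>

#define Q_DELETING (1u << 2)	/* mask value from the quoted enum */

/* Hypothetical stand-in for struct nvme_rdma_queue. */
struct model_queue {
	unsigned int flags;
	int live_resources;	/* stands in for the qp/cq/cm_id */
};

/* Model of queue init; 'reset_flags' toggles the proposed one-line fix. */
static void model_init_queue(struct model_queue *q, bool reset_flags)
{
	if (reset_flags)
		q->flags = 0;
	q->live_resources++;
}

/* Model of teardown (assumption: it is skipped if DELETING is already set). */
static void model_free_queue(struct model_queue *q)
{
	if (q->flags & Q_DELETING)
		return;		/* already marked as deleting: nothing freed */
	q->flags |= Q_DELETING;
	q->live_resources--;
}

/* Connect, tear down, reconnect, tear down; return resources left over. */
static int model_reconnect_leak(bool reset_flags)
{
	struct model_queue q = { 0, 0 };

	model_init_queue(&q, reset_flags);
	model_free_queue(&q);
	model_init_queue(&q, reset_flags);	/* the reconnect */
	model_free_queue(&q);
	return q.live_resources;
}
```

In this model, model_reconnect_leak(false) leaves one resource behind because
the second teardown is skipped, while model_reconnect_leak(true), which clears
the flags at init time as in the diff above, leaves none.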

This doesn't resolve the original failure I'm chasing in this thread though :(

Steve.



