nvmf/rdma host crash during heavy load and keep alive recovery

Steve Wise swise at opengridcomputing.com
Thu Sep 8 10:19:07 PDT 2016


> >> Now, given that you already verified that the queues are stopped with
> >> BLK_MQ_S_STOPPED, I'm looking at blk-mq now.
> >>
> >> I see that blk_mq_run_hw_queue() and __blk_mq_run_hw_queue() indeed take
> >> BLK_MQ_S_STOPPED into account. Theoretically, if we free the queue
> >> pairs after we passed these checks while the rq_list is being processed
> >> then we can end up with this condition, but given that it takes
> >> essentially forever (10 seconds) I tend to doubt this is the case.
> >>
> >> HCH, Jens, Keith, any useful pointers for us?
> >>
> >> To summarize we see a stray request being queued long after we set
> >> BLK_MQ_S_STOPPED (and by long I mean 10 seconds).
> >
> > Does nvme-rdma need to call blk_mq_queue_reinit() after it reinits the
> > tag set for that queue as part of reconnecting?
> 
> I don't see how that'd help...
> 

I can't explain this, but the nvme_rdma_queue.flags field has a bit set that
shouldn't be set:

crash> nvme_rdma_queue.flags -x ffff880e52b8e7e8
  flags = 0x14

Bit 2 is set, NVME_RDMA_Q_DELETING, but bit 4 is also set and should never be...

enum nvme_rdma_queue_flags {
        NVME_RDMA_Q_CONNECTED = (1 << 0),
        NVME_RDMA_IB_QUEUE_ALLOCATED = (1 << 1),
        NVME_RDMA_Q_DELETING = (1 << 2),
};

The rest of the structure looks fine.  I've also seen crash dumps where bit 3 is
set, which is also not used.

/me confused...

More information about the Linux-nvme mailing list