nvmf/rdma host crash during heavy load and keep alive recovery

Fri Sep 9 08:57:45 PDT 2016

> I can't explain this, but the nvme_rdma_queue.flags field has a bit set that
> shouldn't be set:
> 
> crash> nvme_rdma_queue.flags -x ffff880e52b8e7e8
>   flags = 0x14
> 
> Bit 2 is set, NVME_RDMA_Q_DELETING, but bit 4 is also set and should never
> be...
> 
> enum nvme_rdma_queue_flags {
>         NVME_RDMA_Q_CONNECTED = (1 << 0),
>         NVME_RDMA_IB_QUEUE_ALLOCATED = (1 << 1),
>         NVME_RDMA_Q_DELETING = (1 << 2),
> };
> 
> The rest of the structure looks fine.  I've also seen crash dumps where bit 3
> is set which is also not used.
> 
> /me confused...

I'm dumb:  1<<0 is 1, so NVME_RDMA_Q_CONNECTED is bit 1,
NVME_RDMA_IB_QUEUE_ALLOCATED is bit 2, and NVME_RDMA_Q_DELETING is bit 4!
Bits 0 and 3 are not used.  So 0x14 is bits 4 and 2: Q_DELETING and
IB_QUEUE_ALLOCATED.  The enum values are evidently being used as bit numbers,
not masks, so the queue_flags enum should not be using the (1<<X)
initialization.  Rather, the values should be 0, 1, 2, etc...
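
To make the arithmetic concrete, here is a minimal standalone sketch.  It
assumes the flags are manipulated with set_bit()-style helpers that take a
bit number; sketch_set_bit() and sketch_test_bit() are hypothetical userspace
stand-ins, not the kernel functions, and the MASK_*/IDX_* enums are
illustrative copies of the driver enum, not driver code.

```c
/* Hypothetical stand-ins for bit-number helpers: the first argument is a
 * bit NUMBER (0, 1, 2, ...), not a mask. */
static void sketch_set_bit(unsigned int nr, unsigned long *addr)
{
        *addr |= 1UL << nr;
}

static int sketch_test_bit(unsigned int nr, const unsigned long *addr)
{
        return (int)((*addr >> nr) & 1UL);
}

/* The enum as posted: the values are the masks 1, 2 and 4. */
enum mask_style_flags {
        MASK_Q_CONNECTED        = (1 << 0),
        MASK_IB_QUEUE_ALLOCATED = (1 << 1),
        MASK_Q_DELETING         = (1 << 2),
};

/* The suggested form: the values are plain bit numbers. */
enum index_style_flags {
        IDX_Q_CONNECTED         = 0,
        IDX_IB_QUEUE_ALLOCATED  = 1,
        IDX_Q_DELETING          = 2,
};

/* Set the "allocated" and "deleting" flags on an empty flags word, treating
 * both arguments as bit numbers, and return the resulting word. */
static unsigned long flags_after_alloc_and_delete(unsigned int allocated,
                                                  unsigned int deleting)
{
        unsigned long flags = 0;

        sketch_set_bit(allocated, &flags);
        sketch_set_bit(deleting, &flags);
        return flags;
}
```

Under those assumptions, passing the mask-style values gives 0x14 (bits 2 and
4), which matches the crash dump, while the index-style values give 0x6
(bits 1 and 2).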