nvmf/rdma host crash during heavy load and keep alive recovery
Steve Wise
swise at opengridcomputing.com
Fri Sep 9 08:57:45 PDT 2016
> >
>
> I can't explain this, but the nvme_rdma_queue.flags field has a bit set that
> shouldn't be set:
>
> crash> nvme_rdma_queue.flags -x ffff880e52b8e7e8
> flags = 0x14
>
> Bit 2 is set, NVME_RDMA_Q_DELETING, but bit 4 is also set and should never
> be...
>
> enum nvme_rdma_queue_flags {
> NVME_RDMA_Q_CONNECTED = (1 << 0),
> NVME_RDMA_IB_QUEUE_ALLOCATED = (1 << 1),
> NVME_RDMA_Q_DELETING = (1 << 2),
> };
>
> The rest of the structure looks fine. I've also seen crash dumps where bit 3
> is set, which is also not used.
>
> /me confused...
I'm dumb: these flags are used with set_bit()/test_bit(), which take a bit
number, not a mask. 1<<0 is 1, so CONNECTED is bit 1, QUEUE_ALLOCATED is
bit 2, and Q_DELETING is bit 4! Bits 0 and 3 are not used. So 0x14 is bits
4 and 2: DELETING and QUEUE_ALLOCATED. The queue_flags enum should not use
the (1<<X) initialization; the values should be plain bit numbers: 0, 1, 2,
etc.
More information about the Linux-nvme mailing list