nvmf/rdma host crash during heavy load and keep alive recovery
Steve Wise
swise at opengridcomputing.com
Tue Sep 27 07:07:01 PDT 2016
> Christoph,
>
> I'm still trying to understand how it is possible to
> get to a point where the request queue is stopped while
> the hardware context is not...
>
> The code in rdma.c seems to do the right thing, but somehow
> a stray request sneaks into our submission path when it's not
> expected to.
>
> Steve, is the request a normal read/write, or is it something
> else triggered from the reconnect flow?
I think it is a normal I/O request: length 64, 1 SGE. Sometimes I also see a
REG_MR filled out in the nvme_rdma_request->reg_wr struct.
I'm going to try Bart's series now to see if it fixes this issue...
Steve.