nvmf/rdma host crash during heavy load and keep alive recovery

Steve Wise swise at opengridcomputing.com
Tue Sep 27 07:07:01 PDT 2016


> Christoph,
> 
> I'm still trying to understand how it is possible to
> get to a point where the request queue is stopped while
> the hardware context is not...
> 
> The code in rdma.c seems to do the right thing, but somehow
> a stray request sneaks into our submission path when it's not
> expected to.
> 
> Steve, is the request a normal read/write? or is it something
> else triggered from the reconnect flow?

I think it is a normal I/O request: length 64, 1 SGE.  Sometimes I also see a
REG_MR filled out in the nvme_rdma_request->reg_wr struct.

I'm going to try Bart's series now to see if it fixes this issue...

Steve.



