target crash / host hang with nvme-all.3 branch of nvme-fabrics
Steve Wise
swise at opengridcomputing.com
Tue Jun 28 09:31:27 PDT 2016
> On Tue, Jun 28, 2016 at 09:15:22AM -0500, Steve Wise wrote:
> > I'm not so sure. I don't see where nvmet leaves unsignaled wrs on the SQ.
> > It either posts chains via RDMA-RW and the last in the chain is always
> > signaled (I think), or it posts signaled IO responses.
>
> Indeed. So we need to figure out where we don't release a rsp.
>
Hey Ming,
For what it's worth, the change you proposed in this thread isn't working for me.
I see maybe one or two successful recoveries, then the target gets stuck. I see
several workq threads stuck destroying various qps, one thread stuck draining a
qp. If this change is not the proper fix, then I'm not going to debug this
further.