target crash / host hang with nvme-all.3 branch of nvme-fabrics
Ming Lin
mlin at kernel.org
Tue Jun 28 09:49:56 PDT 2016
On Tue, 2016-06-28 at 11:31 -0500, Steve Wise wrote:
> > On Tue, Jun 28, 2016 at 09:15:22AM -0500, Steve Wise wrote:
> > > I'm not so sure. I don't see where nvmet leaves unsignaled WRs on the SQ.
> > > It either posts chains via RDMA-RW, where the last WR in the chain is always
> > > signaled (I think), or it posts signaled IO responses.
> >
> > Indeed. So we need to figure out where we don't release a rsp.
> >
>
> Hey Ming,
>
> For what it's worth, the change you proposed in this thread isn't working for me.
> I see maybe one or two successful recoveries, then the target gets stuck. I see
> several workqueue threads stuck destroying various QPs and one thread stuck
> draining a QP. If this change is not the proper fix, then I'm not going to debug
> this further.
I didn't see this during my overnight test. It may be a different bug.
Could you post the stuck call stacks?
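One way to collect those stacks, as a rough sketch: the script below walks the ps output and prints /proc/<pid>/stack for every task in uninterruptible sleep (STAT starting with "D"), which is where stuck QP-destroy or drain workers would typically show up. It assumes root and a kernel that exposes /proc/<pid>/stack; the function name is only illustrative. A built-in alternative is "echo w > /proc/sysrq-trigger", which logs all blocked tasks to dmesg.

```bash
#!/bin/bash
# Sketch (hypothetical helper): print the kernel stack of every task in
# uninterruptible sleep, i.e. ps STAT beginning with "D".
# Assumes root and a kernel that provides /proc/<pid>/stack.

dump_d_state_stacks() {
    local pid state comm
    # pid=,stat=,comm= suppresses the header line; read trims leading spaces.
    ps -eo pid=,stat=,comm= | while read -r pid state comm; do
        case "$state" in
            D*)
                echo "=== pid $pid ($comm) state $state ==="
                cat "/proc/$pid/stack" 2>/dev/null || echo "(stack not readable)"
                ;;
        esac
    done
}

dump_d_state_stacks
```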
I assume you are still running the following test on the host: start an fio
job, then, while it runs, flap the network interface in a loop:
while true; do
    ifconfig $ETH down
    sleep $(( 10 + ($RANDOM & 0x7) ))
    ifconfig $ETH up
    sleep $(( 10 + ($RANDOM & 0x7) ))
done
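For a repeatable run, the loop can be wrapped in a small script. This is a sketch, not the exact test described above: the function names, the optional iteration count and the script name flap.sh are hypothetical additions, fio is assumed to be started separately against the NVMe-oF namespace, and the script must run as root. The delay formula is unchanged: $RANDOM & 0x7 keeps the low three bits (0..7), so each sleep lasts 10 to 17 seconds.

```bash
#!/bin/bash
# Sketch (hypothetical helper): flap a host NIC to force NVMe-oF host
# reconnects while fio runs. Assumes root, ifconfig available, and fio
# already running against the remote namespace.

# Random delay of 10..17 seconds: ($RANDOM & 0x7) yields 0..7.
rand_delay() {
    echo $(( 10 + ($RANDOM & 0x7) ))
}

# Take the link down, wait, bring it back up, wait.
flap_once() {
    local eth=$1
    ifconfig "$eth" down
    sleep "$(rand_delay)"
    ifconfig "$eth" up
    sleep "$(rand_delay)"
}

# main <iface> [iterations]; iterations=0 (the default) loops forever.
main() {
    local eth=$1
    local iterations=${2:-0}
    local i=0
    while [ "$iterations" -eq 0 ] || [ "$i" -lt "$iterations" ]; do
        i=$((i + 1))
        echo "$(date '+%T') flap #$i on $eth"
        flap_once "$eth"
    done
}

# Only run when an interface is given, so the functions can be sourced alone.
if [ "$#" -ge 1 ]; then
    main "$@"
fi
```

For example, "./flap.sh eth2 100" would flap a (hypothetical) eth2 100 times; omitting the count keeps looping like the original one-liner.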