target crash / host hang with nvme-all.3 branch of nvme-fabrics

Steve Wise swise at opengridcomputing.com
Tue Jun 28 07:15:22 PDT 2016


> > diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
> > index 425b55c..627942c 100644
> > --- a/drivers/nvme/target/rdma.c
> > +++ b/drivers/nvme/target/rdma.c
> > @@ -425,7 +425,15 @@ static void nvmet_rdma_free_rsps(struct nvmet_rdma_queue *queue)
> >  	for (i = 0; i < nr_rsps; i++) {
> >  		struct nvmet_rdma_rsp *rsp = &queue->rsps[i];
> >
> > -		list_del(&rsp->free_list);
> > +		/*
> > +		 * Don't call "list_del(&rsp->free_list)", because:
> > +		 * It could be already removed from the free list by
> > +		 * nvmet_rdma_get_rsp(), or it's on the queue::rsp_wait_list
> > +		 *
> > +		 * It's safe we just free it because at this point the queue
> > +		 * was already disconnected so nvmet_rdma_get_rsp() won't be
> > +		 * called any more.
> > +		 */
> >  		nvmet_rdma_free_rsp(ndev, rsp);
> >  	}
> >  	kfree(queue->rsps);
> 
> That seems like another symptom of not flushing unsignalled requests.

I'm not so sure.  I don't see where nvmet leaves unsignaled WRs on the SQ.  It
either posts chains via RDMA-RW, where the last WR in the chain is always
signaled (I think), or it posts signaled IO responses.
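
To make that reasoning concrete, here is a minimal userspace sketch, not nvmet
code.  The names model_wr, model_uncovered_wrs and model_build_chain are
hypothetical and only model the idea; they are not kernel APIs.  It assumes an
in-order send queue, where a completion for a signaled WR implies the earlier
WRs in the same chain have completed as well, so only unsignaled WRs at the
tail of a chain could be left without a covering completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a send work request (not struct ib_send_wr). */
struct model_wr {
	bool signaled;          /* would this WR generate a completion? */
	struct model_wr *next;  /* next WR in the same posted chain, or NULL */
};

/*
 * Count the WRs at the tail of a chain that no later signaled WR covers.
 * Assumption: the send queue completes in order, so a signaled WR's
 * completion also accounts for every WR posted before it in the chain.
 */
static size_t model_uncovered_wrs(const struct model_wr *chain)
{
	size_t uncovered = 0;

	for (const struct model_wr *wr = chain; wr; wr = wr->next) {
		if (wr->signaled)
			uncovered = 0;  /* everything up to here is covered */
		else
			uncovered++;
	}
	return uncovered;
}

/* Link n WRs into one chain and mark only the last one signaled. */
static struct model_wr *model_build_chain(struct model_wr *wrs, size_t n)
{
	assert(n > 0);

	for (size_t i = 0; i < n; i++) {
		wrs[i].signaled = (i == n - 1);
		wrs[i].next = (i == n - 1) ? NULL : &wrs[i + 1];
	}
	return &wrs[0];
}
```

With a chain built this way model_uncovered_wrs() returns 0, which is the case
I believe nvmet is always in.  Clearing the signaled flag on the last WR makes
the count non-zero, which is the situation the flush theory would require.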

> At the time we call nvmet_rdma_free_rsps none of the rsp structures
> should be in use.





