target crash / host hang with nvme-all.3 branch of nvme-fabrics
Steve Wise
swise at opengridcomputing.com
Tue Jun 28 07:15:22 PDT 2016
> > diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
> > index 425b55c..627942c 100644
> > --- a/drivers/nvme/target/rdma.c
> > +++ b/drivers/nvme/target/rdma.c
> > @@ -425,7 +425,15 @@ static void nvmet_rdma_free_rsps(struct nvmet_rdma_queue *queue)
> > for (i = 0; i < nr_rsps; i++) {
> > struct nvmet_rdma_rsp *rsp = &queue->rsps[i];
> >
> > - list_del(&rsp->free_list);
> > + /*
> > + * Don't call "list_del(&rsp->free_list)", because:
> > + * It could be already removed from the free list by
> > + * nvmet_rdma_get_rsp(), or it's on the queue::rsp_wait_list
> > + *
> > + * It's safe we just free it because at this point the queue
> > + * was already disconnected so nvmet_rdma_get_rsp() won't be
> > + * called any more.
> > + */
> > nvmet_rdma_free_rsp(ndev, rsp);
> > }
> > kfree(queue->rsps);
>
> That seems like another symptom of not flushing unsignalled requests.
I'm not so sure. I don't see where nvmet leaves unsignaled WRs on the SQ. It
either posts chains via RDMA-RW, where the last WR in the chain is always
signaled (I think), or it posts signaled I/O responses.
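
To make the point about chains concrete, here is a minimal userspace sketch
of a chain of work requests in which only the last one is signaled. It is not
the nvmet or RDMA-RW code: struct mock_wr, mock_build_chain() and
mock_trailing_unsignaled() are hypothetical names made up for illustration,
and the sketch assumes the send queue processes WRs in order, so the
completion of a signaled WR also retires the unsignaled WRs posted before it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a send work request (not struct ib_send_wr). */
struct mock_wr {
	bool signaled;         /* would this WR generate a completion? */
	struct mock_wr *next;  /* next WR in the posted chain, or NULL */
};

/* Link n WRs into one chain and mark only the last one as signaled. */
static struct mock_wr *mock_build_chain(struct mock_wr *wrs, size_t n)
{
	if (n == 0)
		return NULL;

	for (size_t i = 0; i < n; i++) {
		wrs[i].signaled = (i == n - 1);
		wrs[i].next = (i + 1 < n) ? &wrs[i + 1] : NULL;
	}
	return &wrs[0];
}

/*
 * Count the WRs at the tail of the chain that are not followed by any
 * signaled WR. Under the in-order assumption these are the only WRs whose
 * completion would never be reported; 0 means the chain leaves nothing
 * unaccounted for on the queue.
 */
static size_t mock_trailing_unsignaled(const struct mock_wr *head)
{
	size_t trailing = 0;

	for (const struct mock_wr *wr = head; wr; wr = wr->next)
		trailing = wr->signaled ? 0 : trailing + 1;

	return trailing;
}
```

In this model a chain built by mock_build_chain() always has zero trailing
unsignaled WRs, which is the property the paragraph above relies on.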
> At the time we call nvmet_rdma_free_rsps none of the rsp structures
> should be in use.
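
For reference, the hazard the quoted patch comment describes can be sketched
in userspace. The kernel's list_del() poisons an entry's next/prev pointers
after unlinking it, so calling list_del() again on an rsp that
nvmet_rdma_get_rsp() already took off the free list would follow poisoned
pointers. The sketch below is an assumption-laden model, not kernel code:
the mock_* names are hypothetical, the poison values are plain sentinel
objects, and unlike the real list_del() this version checks for poison and
returns false instead of dereferencing it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct list_head. */
struct mock_list_head {
	struct mock_list_head *next, *prev;
};

/* Sentinel objects playing the role of the kernel's list poison values. */
static struct mock_list_head mock_poison1, mock_poison2;

static void mock_list_init(struct mock_list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head. */
static void mock_list_add(struct mock_list_head *entry,
			  struct mock_list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static bool mock_list_empty(const struct mock_list_head *head)
{
	return head->next == head;
}

/* True if the entry was already unlinked by mock_list_del(). */
static bool mock_list_is_poisoned(const struct mock_list_head *entry)
{
	return entry->next == &mock_poison1 && entry->prev == &mock_poison2;
}

/*
 * Unlink entry and poison its pointers. Returns false without touching
 * anything if the entry was already unlinked; the real list_del() has no
 * such check and would dereference the poison values here.
 */
static bool mock_list_del(struct mock_list_head *entry)
{
	if (mock_list_is_poisoned(entry))
		return false;

	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = &mock_poison1;
	entry->prev = &mock_poison2;
	return true;
}
```

If every rsp really is back on the free list at teardown, each
mock_list_del() in this model returns true exactly once; a false return
corresponds to the case the patch comment is guarding against.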