target crash / host hang with nvme-all.3 branch of nvme-fabrics
Ming Lin
mlin at kernel.org
Tue Jun 28 14:04:11 PDT 2016
On Tue, 2016-06-28 at 14:43 -0500, Steve Wise wrote:
> > I'm using a ram disk for the target. Perhaps before
> > I was using a real nvme device. I'll try that too and see if I still hit this
> > deadlock/stall...
> >
>
> Hey Ming,
>
> It seems that using a real nvme device at the target, instead of a ram
> device, avoids this new deadlock issue. And I'm running so far without the
> usual touch-after-free crash; usually I hit it quickly. It looks like your
> patch did indeed fix that. So:
>
> 1) We need to address Christoph's concern that your fix isn't the ideal/correct
> solution. How do you want to proceed on that angle? How can I help?
This one should be more correct.
Actually, the rsp was leaked when queue->state was
NVMET_RDMA_Q_DISCONNECTING, so we should put it back.
It works for me. Could you help to verify?
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 425b55c..ee8b85e 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -727,6 +727,8 @@ static void nvmet_rdma_recv_done(struct ib_cq *cq, struct ib_wc *wc)
spin_lock_irqsave(&queue->state_lock, flags);
if (queue->state == NVMET_RDMA_Q_CONNECTING)
list_add_tail(&rsp->wait_list, &queue->rsp_wait_list);
+ else
+ nvmet_rdma_put_rsp(rsp);
spin_unlock_irqrestore(&queue->state_lock, flags);
return;
}
>
> 2) the deadlock below is probably some other issue. Looks more like a cxgb4
> problem at first glance. I'll look into this one...
>
> Steve.
>
>
More information about the Linux-nvme mailing list