nvmet_rdma crash - DISCONNECT event with NULL queue

Sagi Grimberg sagi at grimberg.me
Tue Nov 1 09:44:21 PDT 2016


>> pphh, somehow managed to miss it...
>>
>> So we have a case where we can call rdma_destroy_qp and
>> then rdma_destroy_id but still get events on the cm_id...
>> Not very nice...
>>
>> So I think that the patch from Bart a few weeks ago was correct:
>>
>
> Not quite.  It just guards against a null queue for TIMEWAIT_EXIT, which is only
> generated by the IB_CM.

Yes, this is why we need ADDR_CHANGE and DISCONNECTED too
"(and include all the relevant cases around it)"

For the other events, we never get to the LIVE state, and we don't have
other error flows that would trigger the queue teardown sequence.
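
To make the race concrete, here is a rough userspace sketch of the guard
we are after. It is not the kernel code: every model_* name is a
hypothetical stand-in, and it assumes the handler finds the queue through
cm_id->qp, so a cleared qp shows up as a NULL queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel objects; not the real API. */
struct model_queue { int disconnects; };
struct model_qp    { struct model_queue *qp_context; };
struct model_cm_id { struct model_qp *qp; };

enum model_event {
	MODEL_EVENT_ADDR_CHANGE,
	MODEL_EVENT_DISCONNECTED,
	MODEL_EVENT_TIMEWAIT_EXIT,
};

/* Stands in for nvmet_rdma_queue_disconnect(); it dereferences queue. */
static void model_queue_disconnect(struct model_queue *queue)
{
	assert(queue != NULL); /* a NULL queue here models the reported crash */
	queue->disconnects++;
}

/* Stands in for rdma_destroy_qp(): afterwards cm_id->qp is cleared. */
static void model_destroy_qp(struct model_cm_id *cm_id)
{
	cm_id->qp = NULL;
}

/* Returns 1 if the event was acted on, 0 if it was silently ignored. */
static int model_cm_handler(struct model_cm_id *cm_id, enum model_event event)
{
	struct model_queue *queue = NULL;

	/* Assumption: the queue is looked up through cm_id->qp. */
	if (cm_id->qp)
		queue = cm_id->qp->qp_context;

	switch (event) {
	case MODEL_EVENT_ADDR_CHANGE:
	case MODEL_EVENT_DISCONNECTED:
	case MODEL_EVENT_TIMEWAIT_EXIT:
		/* qp already gone: teardown is in progress, stay out of the way */
		if (!queue)
			return 0;
		model_queue_disconnect(queue);
		return 1;
	}
	return 0;
}
```

An event that arrives after model_destroy_qp() but before the id is
destroyed is dropped instead of reaching the disconnect path.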

--
nvmet-rdma: Fix possible NULL deref when handling rdma cm events

When we initiate the queue teardown sequence we call rdma_destroy_qp,
which clears cm_id->qp, and afterwards we call rdma_destroy_id. We
might see an rdma_cm event in between with a cleared cm_id->qp, so
watch out for that and silently ignore the event, because it means
that the queue teardown sequence is in progress.

Signed-off-by: Bart Van Assche <bart.vanassche at sandisk.com>
Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
  drivers/nvme/target/rdma.c | 8 +++++++-
  1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index b4d648536c3e..240888efd920 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1351,7 +1351,13 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id,
         case RDMA_CM_EVENT_ADDR_CHANGE:
         case RDMA_CM_EVENT_DISCONNECTED:
         case RDMA_CM_EVENT_TIMEWAIT_EXIT:
-               nvmet_rdma_queue_disconnect(queue);
+               /*
+                * We might end up here when we already freed the qp
+                * which means queue release sequence is in progress,
+                * so don't get in the way...
+                */
+               if (queue)
+                       nvmet_rdma_queue_disconnect(queue);
                 break;
         case RDMA_CM_EVENT_DEVICE_REMOVAL:
                 ret = nvmet_rdma_device_removal(cm_id, queue);
--


