[PATCH 2/2] nvme-rdma: move admin queue cleanup to nvme_rdma_free_ctrl

Steve Wise swise at opengridcomputing.com
Fri Jul 15 08:52:02 PDT 2016


> > Correction: the del controller work thread is trying to destroy the qp
> > associated with the cm_id.  But the point is this cm_id/qp should NOT be
> > touched by the del controller thread, because the unplug thread should
> > have cleared the Q_CONNECTED bit and thus taken ownership of destroying
> > it.  I'll add some debug prints to see which path is being taken by
> > nvme_rdma_device_unplug().
> >
> 
> After further debug, the del controller work thread is not trying to
> destroy the qp/cm_id that received the event.  That qp/cm_id is
> successfully deleted by the unplug thread.  However, the first cm_id/qp
> that is destroyed by the del controller work thread gets stuck in
> c4iw_destroy_qp() due to the deadlock.  So I need to understand more about
> the deadlock...

Hey Sagi, here is some light reading for you. :)

Prelude:  As part of disconnecting an iWARP connection, the iWARP provider
needs to post an IW_CM_EVENT_CLOSE event to iw_cm, which is scheduled onto
iw_cm's singlethread workqueue.
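
For reference, the deferral pattern in iwcm.c looks roughly like this.  This
is paraphrased from memory, and alloc_event_work() is a made-up helper, but
the key point is real: every cm event funnels through one singlethread
workqueue.

	/*
	 * Paraphrased sketch of the iw_cm event deferral -- not the exact
	 * iwcm.c source.  All provider events, including IW_CM_EVENT_CLOSE,
	 * are queued to one singlethread workqueue, so if that lone thread
	 * blocks, no further cm events for _any_ cm_id get processed.
	 */
	static struct workqueue_struct *iwcm_wq; /* create_singlethread_workqueue("iw_cm_wq") */

	static int cm_event_handler(struct iw_cm_id *cm_id,
				    struct iw_cm_event *iw_event)
	{
		/* alloc_event_work() is a made-up helper that snapshots
		 * the event into a work item tied to this cm_id. */
		struct iwcm_work *work = alloc_event_work(cm_id, iw_event);

		queue_work(iwcm_wq, &work->work); /* runs in iw_cm workq context */
		return 0;
	}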

Here is what happens with Sagi's patch:

nvme_rdma_device_unplug() calls nvme_rdma_stop_queue(), which calls
rdma_disconnect().  This triggers the disconnect.  iw_cxgb4 posts the
IW_CM_EVENT_CLOSE to iw_cm, which ends up calling cm_close_handler() in the
iw_cm workq thread context.  cm_close_handler() calls the rdma_cm event
handler for this cm_id, cma_iw_handler(), which blocks until any currently
running event handler for this cm_id finishes; it does this by calling
cma_disable_callback().  But since this whole unplug process is running in
the event handler for this same cm_id, the iw_cm workq thread is now stuck
in a deadlock.

nvme_rdma_device_unplug(), however, continues on: it schedules the delete
controller work and waits for it to complete.  The delete controller work
tries to disconnect and destroy all the remaining IO queues, but gets stuck
in the destroy path on the first IO queue, because the iw_cm workq thread is
already stuck, and processing the CLOSE event is required to release a
reference that iw_cm holds on the iWARP provider's qp.  So everything comes
to a grinding halt...
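
If it helps to see the shape of the deadlock outside the kernel, here is a
toy userspace model.  Nothing in it is real nvme/rdma code -- the mutex
stands in for the per-cm_id callback serialization that
cma_disable_callback() waits on, and the single pthread stands in for the
iw_cm workq thread:

	#include <pthread.h>
	#include <stdio.h>
	#include <unistd.h>

	static pthread_mutex_t cb_mutex = PTHREAD_MUTEX_INITIALIZER; /* per-cm_id callback lock */
	static volatile int close_event_processed;  /* set once CLOSE is handled */

	/* Stands in for the lone iw_cm workq thread running
	 * cm_close_handler() -> cma_iw_handler(), which must serialize
	 * with the callback already running for the same cm_id. */
	static void *iwcm_workq(void *arg)
	{
		(void)arg;
		pthread_mutex_lock(&cb_mutex);  /* <-- blocks forever */
		close_event_processed = 1;      /* would release the provider qp ref */
		pthread_mutex_unlock(&cb_mutex);
		return NULL;
	}

	/* Stands in for nvme_rdma_device_unplug(), which runs inside the
	 * rdma_cm event handler, i.e. with the cm_id's callback lock held. */
	static void device_unplug_handler(void)
	{
		pthread_t iwcm_thread;

		pthread_mutex_lock(&cb_mutex);

		/* rdma_disconnect(): the provider posts IW_CM_EVENT_CLOSE,
		 * which is queued to the iw_cm workq thread. */
		pthread_create(&iwcm_thread, NULL, iwcm_workq, NULL);

		/* The delete controller work: destroying the first IO qp
		 * cannot finish until CLOSE is processed and the qp
		 * reference dropped. */
		while (!close_event_processed)
			sleep(1);               /* never terminates */

		pthread_mutex_unlock(&cb_mutex);
		pthread_join(iwcm_thread, NULL);
	}

	int main(void)
	{
		device_unplug_handler();        /* deadlocks, as in the bug */
		printf("unreachable\n");
		return 0;
	}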

Now: Ming's two patches avoid this deadlock, because the cm_id that received
the device removal event is disconnected/destroyed _only after_ all the
other controller queues are disconnected/destroyed.  So
nvme_rdma_device_unplug() doesn't get stuck waiting for the controller to
delete the IO queues, and only after that completes does it delete the
cm_id/qp that got the device removal event.  It then returns, which causes
the rdma_cm to release the cm_id's callback mutex.  This unblocks the iw_cm
workq thread and we continue on.  (Can you say house of cards?)
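
Sketched as code, the ordering the patches enforce looks something like
this.  The helper names are made up for illustration, not the actual patch
-- only the ordering is the point:

	/*
	 * Hand-written sketch of the device-removal path with Ming's
	 * ordering.  schedule_delete_ctrl_and_wait() and
	 * destroy_this_queue() are made-up helper names.
	 */
	static int nvme_rdma_device_unplug(struct nvme_rdma_queue *queue)
	{
		/* 1. Own this queue, so the delete controller work won't
		 *    touch the cm_id whose event handler we are running in. */
		clear_bit(NVME_RDMA_Q_CONNECTED, &queue->flags);

		/* 2. Tear down the controller and every *other* queue first;
		 *    their cm_ids are not serialized against this handler,
		 *    so their disconnect/destroy can make progress. */
		schedule_delete_ctrl_and_wait(queue->ctrl);

		/* 3. Only now disconnect/destroy the cm_id/qp that received
		 *    the device removal event -- dead last. */
		destroy_this_queue(queue);

		/* Returning lets the rdma_cm drop this cm_id's callback
		 * mutex, which finally unblocks the iw_cm workq thread. */
		return 1;
	}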

So the net is:  the cm_id that received the device remove event _must_ be
disconnected/destroyed _last_.

Steve.
