[PATCH v2] RDMA/cma: prevent rdma id destroy during cma_iw_handler
Shinichiro Kawasaki
shinichiro.kawasaki at wdc.com
Mon Jun 12 18:43:43 PDT 2023
On Jun 12, 2023 / 11:18, Jason Gunthorpe wrote:
> On Mon, Jun 12, 2023 at 02:42:37PM +0900, Shin'ichiro Kawasaki wrote:
> > When rdma_destroy_id() and cma_iw_handler() race, struct rdma_id_private
> > *id_priv can be destroyed during cma_iw_handler call. This causes "BUG:
> > KASAN: slab-use-after-free" at mutex_lock() in cma_iw_handler() [1].
> > To prevent the destroy of id_priv, keep its reference count by calling
> > cma_id_get() and cma_id_put() at start and end of cma_iw_handler().
> >
> > [1]
> >
> > ==================================================================
> > BUG: KASAN: slab-use-after-free in __mutex_lock+0x1324/0x18f0
> > Read of size 8 at addr ffff888197b37418 by task kworker/u8:0/9
> >
> > CPU: 0 PID: 9 Comm: kworker/u8:0 Not tainted 6.3.0 #62
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
> > Workqueue: iw_cm_wq cm_work_handler [iw_cm]
> > Call Trace:
> > <TASK>
> > dump_stack_lvl+0x57/0x90
> > print_report+0xcf/0x660
> > ? __mutex_lock+0x1324/0x18f0
> > kasan_report+0xa4/0xe0
> > ? __mutex_lock+0x1324/0x18f0
> > __mutex_lock+0x1324/0x18f0
> > ? cma_iw_handler+0xac/0x4f0 [rdma_cm]
> > ? _raw_spin_unlock_irqrestore+0x30/0x60
> > ? rcu_is_watching+0x11/0xb0
> > ? _raw_spin_unlock_irqrestore+0x30/0x60
> > ? trace_hardirqs_on+0x12/0x100
> > ? __pfx___mutex_lock+0x10/0x10
> > ? __percpu_counter_sum+0x147/0x1e0
> > ? domain_dirty_limits+0x246/0x390
> > ? wb_over_bg_thresh+0x4d5/0x610
> > ? rcu_is_watching+0x11/0xb0
> > ? cma_iw_handler+0xac/0x4f0 [rdma_cm]
> > cma_iw_handler+0xac/0x4f0 [rdma_cm]
>
> What is the full call chain here, eg with the static functions
> un-inlined?
I checked the inlined function call chain from cm_work_handler to cma_iw_handler
(I recreated the symptom using kernel v6.4-rc5, so the addresses differ):
$ ./scripts/faddr2line ./drivers/infiniband/core/iw_cm.ko cm_work_handler+0xb57/0x1c50
cm_work_handler+0xb57/0x1c50:
cm_close_handler at /home/shin/Linux/linux/drivers/infiniband/core/iwcm.c:974
(inlined by) process_event at /home/shin/Linux/linux/drivers/infiniband/core/iwcm.c:997
(inlined by) cm_work_handler at /home/shin/Linux/linux/drivers/infiniband/core/iwcm.c:1036
With this, my understanding of the full call chain from the NVMe driver to
cma_iw_handler is as follows, including the task switch to cm_work_handler
(a userspace sketch of the refcount-before-queue_work pattern follows the chain):
nvme_rdma_teardown_io_queues
  nvme_rdma_stop_io_queues
    nvme_rdma_stop_queue
      __nvme_rdma_stop_queue
        rdma_disconnect
          iw_cm_disconnect
            iwcm_modify_qp_sqd
              ib_modify_qp
                _ib_modify_qp
                  ib_security_modify_qp
                    siw_verbs_modify_qp
                      siw_qp_modify
                        siw_qp_cm_drop
                          siw_cm_upcall(IW_CM_EVENT_CLOSE)
                            cm_event_handler -> refcount_inc(&cm_id_priv->refcount)
                              queue_work
-> cm_work_handler
     process_event
       cm_close_handler
         cm_id_priv->id.cm_handler (= cma_iw_handler)
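
To check my understanding of the refcount_inc() before queue_work above, here
is a minimal userspace analogue (not kernel code; struct fake_id, worker() and
the other names are made up for this sketch): the side that queues the work
pins the object with a reference before handing it off, and the worker drops
that reference when it is done, so the object cannot be freed while the queued
handler is still running.

/*
 * Not kernel code: all names here (struct fake_id, worker, ...) are made
 * up for this sketch.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct fake_id {
        pthread_mutex_t handler_mutex;
        atomic_int refcount;
};

static void fake_id_put(struct fake_id *id)
{
        /* free only when the last reference is dropped */
        if (atomic_fetch_sub(&id->refcount, 1) == 1) {
                pthread_mutex_destroy(&id->handler_mutex);
                free(id);
        }
}

static void *worker(void *arg)          /* stands in for cm_work_handler */
{
        struct fake_id *id = arg;

        pthread_mutex_lock(&id->handler_mutex); /* safe: object is pinned */
        /* ... handler work ... */
        pthread_mutex_unlock(&id->handler_mutex);
        fake_id_put(id);        /* drop the reference taken before queuing */
        return NULL;
}

int main(void)
{
        struct fake_id *id = calloc(1, sizeof(*id));
        pthread_t t;

        pthread_mutex_init(&id->handler_mutex, NULL);
        atomic_init(&id->refcount, 1);          /* creator's reference */

        atomic_fetch_add(&id->refcount, 1);     /* refcount_inc() before "queue_work" */
        pthread_create(&t, NULL, worker, id);   /* "queue_work" */

        fake_id_put(id);        /* destroy path drops the creator's reference */
        pthread_join(t, NULL);
        return 0;
}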
> >
> > drivers/infiniband/core/cma.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> > index 93a1c48d0c32..c5267d9bb184 100644
> > --- a/drivers/infiniband/core/cma.c
> > +++ b/drivers/infiniband/core/cma.c
> > @@ -2477,6 +2477,7 @@ static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event)
> > struct sockaddr *laddr = (struct sockaddr *)&iw_event->local_addr;
> > struct sockaddr *raddr = (struct sockaddr *)&iw_event->remote_addr;
> >
> > + cma_id_get(id_priv);
> > mutex_lock(&id_priv->handler_mutex);
> > if (READ_ONCE(id_priv->state) != RDMA_CM_CONNECT)
> > goto out;
> > @@ -2524,12 +2525,14 @@ static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event)
> > if (ret) {
> > /* Destroy the CM ID by returning a non-zero value. */
> > id_priv->cm_id.iw = NULL;
> > + cma_id_put(id_priv);
> > destroy_id_handler_unlock(id_priv);
> > return ret;
> > }
> >
> > out:
> > mutex_unlock(&id_priv->handler_mutex);
> > + cma_id_put(id_priv);
> > return ret;
> > }
>
> cm_work_handler already has a ref on the iwcm_id_private
>
> I think there is likely some much larger issue with the IW CM if the
> cm_id can be destroyed while the iwcm_id is in use? It is weird that
> there are two id memories for this :\
My understanding of the call chain to the rdma id destroy is as follows. I guess
_destroy_id calls iw_destroy_cm_id before destroying the rdma id, but I am not
sure why it does not wait for the cm_id deref by cm_work_handler (a userspace
sketch of the wait-before-free pattern I would expect follows the chain).
nvme_rdma_teardown_io_queues
  nvme_rdma_stop_io_queues    -> chained to cma_iw_handler
  nvme_rdma_free_io_queues
    nvme_rdma_free_queue
      rdma_destroy_id
        mutex_lock(&id_priv->handler_mutex)
        destroy_id_handler_unlock
          mutex_unlock(&id_priv->handler_mutex)
          _destroy_id
            iw_destroy_cm_id
            wait_for_completion(&id_priv->comp)
            kfree(id_priv)
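
For reference, here is a minimal userspace sketch of the wait-before-free
pattern I would expect on this path (not kernel code; struct obj, obj_put(),
destroy_obj() and the other names are invented): the destroy path drops its
own reference, waits on a completion that the last reference holder signals,
and only then frees the object.

/*
 * Not kernel code: struct obj, obj_put, destroy_obj, user_thread are
 * invented names for this sketch.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct obj {
        atomic_int refcount;
        pthread_mutex_t lock;           /* protects 'released' */
        pthread_cond_t comp;            /* stands in for struct completion */
        int released;
};

static void obj_put(struct obj *o)
{
        if (atomic_fetch_sub(&o->refcount, 1) == 1) {
                /* last reference dropped: signal the "completion" */
                pthread_mutex_lock(&o->lock);
                o->released = 1;
                pthread_cond_signal(&o->comp);
                pthread_mutex_unlock(&o->lock);
        }
}

static void *user_thread(void *arg)     /* e.g. a queued event handler */
{
        struct obj *o = arg;

        /* ... use the object ... */
        obj_put(o);                     /* drop the user's reference */
        return NULL;
}

static void destroy_obj(struct obj *o)
{
        obj_put(o);                     /* drop the creator's reference */

        /* the wait_for_completion() step: block until every user is done,
         * so the free below cannot race with a still-running handler. */
        pthread_mutex_lock(&o->lock);
        while (!o->released)
                pthread_cond_wait(&o->comp, &o->lock);
        pthread_mutex_unlock(&o->lock);

        pthread_cond_destroy(&o->comp);
        pthread_mutex_destroy(&o->lock);
        free(o);                        /* the kfree() step */
}

int main(void)
{
        struct obj *o = calloc(1, sizeof(*o));
        pthread_t t;

        atomic_init(&o->refcount, 2);   /* creator + one user */
        pthread_mutex_init(&o->lock, NULL);
        pthread_cond_init(&o->comp, NULL);

        pthread_create(&t, NULL, user_thread, o);
        destroy_obj(o);                 /* blocks until the user has dropped its ref */
        pthread_join(t, NULL);
        return 0;
}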