[PATCH v2] RDMA/cma: prevent rdma id destroy during cma_iw_handler
Shinichiro Kawasaki
shinichiro.kawasaki at wdc.com
Mon Jun 12 18:43:43 PDT 2023
On Jun 12, 2023 / 11:18, Jason Gunthorpe wrote:
> On Mon, Jun 12, 2023 at 02:42:37PM +0900, Shin'ichiro Kawasaki wrote:
> > When rdma_destroy_id() and cma_iw_handler() race, struct rdma_id_private
> > *id_priv can be destroyed during cma_iw_handler call. This causes "BUG:
> > KASAN: slab-use-after-free" at mutex_lock() in cma_iw_handler() [1].
> > To prevent the destroy of id_priv, keep its reference count by calling
> > cma_id_get() and cma_id_put() at start and end of cma_iw_handler().
> >
> > [1]
> >
> > ==================================================================
> > BUG: KASAN: slab-use-after-free in __mutex_lock+0x1324/0x18f0
> > Read of size 8 at addr ffff888197b37418 by task kworker/u8:0/9
> >
> > CPU: 0 PID: 9 Comm: kworker/u8:0 Not tainted 6.3.0 #62
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
> > Workqueue: iw_cm_wq cm_work_handler [iw_cm]
> > Call Trace:
> > <TASK>
> > dump_stack_lvl+0x57/0x90
> > print_report+0xcf/0x660
> > ? __mutex_lock+0x1324/0x18f0
> > kasan_report+0xa4/0xe0
> > ? __mutex_lock+0x1324/0x18f0
> > __mutex_lock+0x1324/0x18f0
> > ? cma_iw_handler+0xac/0x4f0 [rdma_cm]
> > ? _raw_spin_unlock_irqrestore+0x30/0x60
> > ? rcu_is_watching+0x11/0xb0
> > ? _raw_spin_unlock_irqrestore+0x30/0x60
> > ? trace_hardirqs_on+0x12/0x100
> > ? __pfx___mutex_lock+0x10/0x10
> > ? __percpu_counter_sum+0x147/0x1e0
> > ? domain_dirty_limits+0x246/0x390
> > ? wb_over_bg_thresh+0x4d5/0x610
> > ? rcu_is_watching+0x11/0xb0
> > ? cma_iw_handler+0xac/0x4f0 [rdma_cm]
> > cma_iw_handler+0xac/0x4f0 [rdma_cm]
>
> What is the full call chain here, eg with the static functions
> un-inlined?
I checked the inlined function call chain from cm_work_handler to cma_iw_handler
(I recreated the symptom using kernel v6.4-rc5, so the addresses differ):
$ ./scripts/faddr2line ./drivers/infiniband/core/iw_cm.ko cm_work_handler+0xb57/0x1c50
cm_work_handler+0xb57/0x1c50:
cm_close_handler at /home/shin/Linux/linux/drivers/infiniband/core/iwcm.c:974
(inlined by) process_event at /home/shin/Linux/linux/drivers/infiniband/core/iwcm.c:997
(inlined by) cm_work_handler at /home/shin/Linux/linux/drivers/infiniband/core/iwcm.c:1036
With this, my understanding of the full call chain from the NVMe driver to
cma_iw_handler is as follows, including the task switch to cm_work_handler
(a userspace sketch of the refcount-before-queue_work pattern follows the chain):
nvme_rdma_teardown_io_queues
  nvme_rdma_stop_io_queues
    nvme_rdma_stop_queue
      __nvme_rdma_stop_queue
        rdma_disconnect
          iw_cm_disconnect
            iwcm_modify_qp_sqd
              ib_modify_qp
                _ib_modify_qp
                  ib_security_modify_qp
                    siw_verbs_modify_qp
                      siw_qp_modify
                        siw_qp_cm_drop
                          siw_cm_upcall(IW_CM_EVENT_CLOSE)
                            cm_event_handler -> refcount_inc(&cm_id_priv->refcount)
                              queue_work
-> cm_work_handler
     process_event
       cm_close_handler
         cm_id_priv->id.cm_handler (= cma_iw_handler)
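
To check my understanding of the refcount_inc() before queue_work above, here
is a minimal userspace analogue (not kernel code; struct fake_id, worker() and
the other names are made up for this sketch): the side that queues the work
pins the object with a reference before handing it off, and the worker drops
that reference when it is done, so the object cannot be freed while the queued
handler is still running.

/*
 * Not kernel code: all names here (struct fake_id, worker, ...) are made
 * up for this sketch.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct fake_id {
        pthread_mutex_t handler_mutex;
        atomic_int refcount;
};

static void fake_id_put(struct fake_id *id)
{
        /* free only when the last reference is dropped */
        if (atomic_fetch_sub(&id->refcount, 1) == 1) {
                pthread_mutex_destroy(&id->handler_mutex);
                free(id);
        }
}

static void *worker(void *arg)          /* stands in for cm_work_handler */
{
        struct fake_id *id = arg;

        pthread_mutex_lock(&id->handler_mutex); /* safe: object is pinned */
        /* ... handler work ... */
        pthread_mutex_unlock(&id->handler_mutex);
        fake_id_put(id);        /* drop the reference taken before queuing */
        return NULL;
}

int main(void)
{
        struct fake_id *id = calloc(1, sizeof(*id));
        pthread_t t;

        pthread_mutex_init(&id->handler_mutex, NULL);
        atomic_init(&id->refcount, 1);          /* creator's reference */

        atomic_fetch_add(&id->refcount, 1);     /* refcount_inc() before "queue_work" */
        pthread_create(&t, NULL, worker, id);   /* "queue_work" */

        fake_id_put(id);        /* destroy path drops the creator's reference */
        pthread_join(t, NULL);
        return 0;
}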
> >
> > drivers/infiniband/core/cma.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> > index 93a1c48d0c32..c5267d9bb184 100644
> > --- a/drivers/infiniband/core/cma.c
> > +++ b/drivers/infiniband/core/cma.c
> > @@ -2477,6 +2477,7 @@ static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event)
> > struct sockaddr *laddr = (struct sockaddr *)&iw_event->local_addr;
> > struct sockaddr *raddr = (struct sockaddr *)&iw_event->remote_addr;
> >
> > + cma_id_get(id_priv);
> > mutex_lock(&id_priv->handler_mutex);
> > if (READ_ONCE(id_priv->state) != RDMA_CM_CONNECT)
> > goto out;
> > @@ -2524,12 +2525,14 @@ static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event)
> > if (ret) {
> > /* Destroy the CM ID by returning a non-zero value. */
> > id_priv->cm_id.iw = NULL;
> > + cma_id_put(id_priv);
> > destroy_id_handler_unlock(id_priv);
> > return ret;
> > }
> >
> > out:
> > mutex_unlock(&id_priv->handler_mutex);
> > + cma_id_put(id_priv);
> > return ret;
> > }
>
> cm_work_handler already has a ref on the iwcm_id_private
>
> I think there is likely some much larger issue with the IW CM if the
> cm_id can be destroyed while the iwcm_id is in use? It is weird that
> there are two id memories for this :\
My understanding of the call chain to the rdma id destroy is as follows. I guess
_destroy_id calls iw_destroy_cm_id before destroying the rdma id, but I am not
sure why it does not wait for the cm_id deref by cm_work_handler (a userspace
sketch of the wait-before-free pattern I would expect follows the chain).
nvme_rdma_teardown_io_queues
  nvme_rdma_stop_io_queues    -> chained to cma_iw_handler
  nvme_rdma_free_io_queues
    nvme_rdma_free_queue
      rdma_destroy_id
        mutex_lock(&id_priv->handler_mutex)
        destroy_id_handler_unlock
          mutex_unlock(&id_priv->handler_mutex)
          _destroy_id
            iw_destroy_cm_id
            wait_for_completion(&id_priv->comp)
            kfree(id_priv)
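
For reference, here is a minimal userspace sketch of the wait-before-free
pattern I would expect on this path (not kernel code; struct obj, obj_put(),
destroy_obj() and the other names are invented): the destroy path drops its
own reference, waits on a completion that the last reference holder signals,
and only then frees the object.

/*
 * Not kernel code: struct obj, obj_put, destroy_obj, user_thread are
 * invented names for this sketch.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct obj {
        atomic_int refcount;
        pthread_mutex_t lock;           /* protects 'released' */
        pthread_cond_t comp;            /* stands in for struct completion */
        int released;
};

static void obj_put(struct obj *o)
{
        if (atomic_fetch_sub(&o->refcount, 1) == 1) {
                /* last reference dropped: signal the "completion" */
                pthread_mutex_lock(&o->lock);
                o->released = 1;
                pthread_cond_signal(&o->comp);
                pthread_mutex_unlock(&o->lock);
        }
}

static void *user_thread(void *arg)     /* e.g. a queued event handler */
{
        struct obj *o = arg;

        /* ... use the object ... */
        obj_put(o);                     /* drop the user's reference */
        return NULL;
}

static void destroy_obj(struct obj *o)
{
        obj_put(o);                     /* drop the creator's reference */

        /* the wait_for_completion() step: block until every user is done,
         * so the free below cannot race with a still-running handler. */
        pthread_mutex_lock(&o->lock);
        while (!o->released)
                pthread_cond_wait(&o->comp, &o->lock);
        pthread_mutex_unlock(&o->lock);

        pthread_cond_destroy(&o->comp);
        pthread_mutex_destroy(&o->lock);
        free(o);                        /* the kfree() step */
}

int main(void)
{
        struct obj *o = calloc(1, sizeof(*o));
        pthread_t t;

        atomic_init(&o->refcount, 2);   /* creator + one user */
        pthread_mutex_init(&o->lock, NULL);
        pthread_cond_init(&o->comp, NULL);

        pthread_create(&t, NULL, user_thread, o);
        destroy_obj(o);                 /* blocks until the user has dropped its ref */
        pthread_join(t, NULL);
        return 0;
}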