[PATCH v2] RDMA/cma: prevent rdma id destroy during cma_iw_handler

Mon Jun 12 07:18:05 PDT 2023

On Mon, Jun 12, 2023 at 02:42:37PM +0900, Shin'ichiro Kawasaki wrote:
> When rdma_destroy_id() and cma_iw_handler() race, struct rdma_id_private
> *id_priv can be destroyed during cma_iw_handler call. This causes "BUG:
> KASAN: slab-use-after-free" at mutex_lock() in cma_iw_handler() [1].
> To prevent the destroy of id_priv, keep its reference count by calling
> cma_id_get() and cma_id_put() at start and end of cma_iw_handler().
> 
> [1]
> 
> ==================================================================
> BUG: KASAN: slab-use-after-free in __mutex_lock+0x1324/0x18f0
> Read of size 8 at addr ffff888197b37418 by task kworker/u8:0/9
> 
> CPU: 0 PID: 9 Comm: kworker/u8:0 Not tainted 6.3.0 #62
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
> Workqueue: iw_cm_wq cm_work_handler [iw_cm]
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x57/0x90
>  print_report+0xcf/0x660
>  ? __mutex_lock+0x1324/0x18f0
>  kasan_report+0xa4/0xe0
>  ? __mutex_lock+0x1324/0x18f0
>  __mutex_lock+0x1324/0x18f0
>  ? cma_iw_handler+0xac/0x4f0 [rdma_cm]
>  ? _raw_spin_unlock_irqrestore+0x30/0x60
>  ? rcu_is_watching+0x11/0xb0
>  ? _raw_spin_unlock_irqrestore+0x30/0x60
>  ? trace_hardirqs_on+0x12/0x100
>  ? __pfx___mutex_lock+0x10/0x10
>  ? __percpu_counter_sum+0x147/0x1e0
>  ? domain_dirty_limits+0x246/0x390
>  ? wb_over_bg_thresh+0x4d5/0x610
>  ? rcu_is_watching+0x11/0xb0
>  ? cma_iw_handler+0xac/0x4f0 [rdma_cm]
>  cma_iw_handler+0xac/0x4f0 [rdma_cm]

What is the full call chain here, e.g. with the static functions
un-inlined?
> 
>  drivers/infiniband/core/cma.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index 93a1c48d0c32..c5267d9bb184 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -2477,6 +2477,7 @@ static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event)
>  	struct sockaddr *laddr = (struct sockaddr *)&iw_event->local_addr;
>  	struct sockaddr *raddr = (struct sockaddr *)&iw_event->remote_addr;
>  
> +	cma_id_get(id_priv);
>  	mutex_lock(&id_priv->handler_mutex);
>  	if (READ_ONCE(id_priv->state) != RDMA_CM_CONNECT)
>  		goto out;
> @@ -2524,12 +2525,14 @@ static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event)
>  	if (ret) {
>  		/* Destroy the CM ID by returning a non-zero value. */
>  		id_priv->cm_id.iw = NULL;
> +		cma_id_put(id_priv);
>  		destroy_id_handler_unlock(id_priv);
>  		return ret;
>  	}
>  
>  out:
>  	mutex_unlock(&id_priv->handler_mutex);
> +	cma_id_put(id_priv);
>  	return ret;
>  }

cm_work_handler already holds a ref on the iwcm_id_private.

I think there is likely a much larger issue with the IW CM if the
cm_id can be destroyed while the iwcm_id is still in use. It is weird
that there are two id memories for this :\
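
To make the concern concrete, here is a toy userspace model of the two
id objects. It is only a sketch: struct outer_id, struct inner_id and
simulate_race() are hypothetical stand-ins (for rdma_id_private,
iwcm_id_private and the race), the "race" is replayed in a fixed order
on one thread, and it says nothing about where the real fix belongs.
It only shows that a ref on the inner object does not by itself pin
the outer one.

```c
#include <assert.h>

/* Hypothetical stand-in for rdma_id_private (the "outer" id). */
struct outer_id {
	int refcount;
	int freed;	/* set instead of calling free() so the model can inspect it */
};

/* Hypothetical stand-in for iwcm_id_private (the "inner" id). */
struct inner_id {
	int refcount;
	struct outer_id *context;	/* back-pointer handed to the handler */
};

static void outer_get(struct outer_id *o)
{
	o->refcount++;
}

static void outer_put(struct outer_id *o)
{
	if (--o->refcount == 0)
		o->freed = 1;	/* last ref gone: object would be freed here */
}

/*
 * Replay the race in a fixed order: the handler starts, a concurrent
 * destroy drops the owner's ref, then the handler touches the outer id.
 * Returns 1 if that access would hit a freed object, 0 otherwise.
 */
static int simulate_race(int take_outer_ref)
{
	struct outer_id outer = { .refcount = 1, .freed = 0 };	/* owner's ref */
	struct inner_id inner = { .refcount = 1, .context = &outer };
	int use_after_free;

	inner.refcount++;		/* work handler's ref on the inner id */
	if (take_outer_ref)
		outer_get(&outer);	/* the extra ref the patch proposes */

	outer_put(&outer);		/* concurrent destroy drops the owner's ref */

	/* Stands in for mutex_lock(&id_priv->handler_mutex) in the handler. */
	use_after_free = inner.context->freed;

	if (take_outer_ref)
		outer_put(&outer);
	inner.refcount--;

	return use_after_free;
}
```

In this model simulate_race(0) reports the bad access and
simulate_race(1) does not, while the inner refcount never drops to
zero in either case.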

Jason