[PATCH v2] RDMA/cma: prevent rdma id destroy during cma_iw_handler
Shinichiro Kawasaki
shinichiro.kawasaki at wdc.com
Wed Jun 14 00:53:49 PDT 2023
On Jun 13, 2023 / 21:07, Leon Romanovsky wrote:
> On Tue, Jun 13, 2023 at 10:30:37AM -0300, Jason Gunthorpe wrote:
> > On Tue, Jun 13, 2023 at 01:43:43AM +0000, Shinichiro Kawasaki wrote:
> > > > I think there is likely some much larger issue with the IW CM if the
> > > > cm_id can be destroyed while the iwcm_id is in use? It is weird that
> > > > there are two id memories for this :\
> > >
> > > My understanding of the call chain that destroys the rdma id is as follows. I
> > > guess _destroy_id calls iw_destroy_cm_id before destroying the rdma id, but I am
> > > not sure why it does not wait for the cm_id deref done by cm_work_handler.
> > >
> > > nvme_rdma_teardown_io_queues
> > >   nvme_rdma_stop_io_queues -> chained to cma_iw_handler
> > >   nvme_rdma_free_io_queues
> > >     nvme_rdma_free_queue
> > >       rdma_destroy_id
> > >         mutex_lock(&id_priv->handler_mutex)
> > >         destroy_id_handler_unlock
> > >           mutex_unlock(&id_priv->handler_mutex)
> > >           _destroy_id
> > >             iw_destroy_cm_id
> > >             wait_for_completion(&id_priv->comp)
> > >             kfree(id_priv)
> >
> > Once a destroy_cm_id() has returned, that layer is no longer
> > permitted to run or be running in its handlers. The iw cm is broken if
> > it allows this, and that is the cause of the bug.
> >
> > Taking more refs within handlers that are already not allowed to be
> > running is just racy.
>
> So we need to revert that patch from our rdma-rc.
I see, thanks for the clarifications.
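
To restate the problem in code form, below is a rough sketch of the two contexts
I see racing. This is simplified pseudo-kernel code, not the actual
drivers/infiniband/core/cma.c implementation; the function bodies are condensed
to show only the ordering that matters.

/*
 * Rough sketch of the race (simplified, not the actual cma.c code):
 * context A is the iw_cm work queue delivering an event, context B is
 * the nvme-rdma teardown path from the call chain quoted above.
 */

/* Context A: cm_work_handler() -> cma_iw_handler() */
static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *event)
{
	struct rdma_id_private *id_priv = iw_id->context;

	mutex_lock(&id_priv->handler_mutex);	/* may run after kfree() in B */
	/* ... deliver the event to the ULP ... */
	mutex_unlock(&id_priv->handler_mutex);
	return 0;
}

/* Context B: nvme_rdma_free_queue() -> rdma_destroy_id() -> _destroy_id() */
static void _destroy_id(struct rdma_id_private *id_priv)
{
	iw_destroy_cm_id(id_priv->cm_id.iw);
	/*
	 * iw_destroy_cm_id() returns while cm_work_handler() may still be
	 * about to invoke cma_iw_handler(), and that handler holds no
	 * reference counted by id_priv->comp ...
	 */
	wait_for_completion(&id_priv->comp);
	kfree(id_priv);		/* ... so this frees memory context A still uses */
}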
As another fix approach, I reverted commit 59c68ac31e15 ("iw_cm: free cm_id
resources on the last deref") so that iw_destroy_cm_id() waits for the last
deref of the cm_id. With that revert, the KASAN slab-use-after-free disappears.
Is this the right fix approach?
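
For reference, the pattern I understand the revert restores is roughly the
following (names and details are illustrative, simplified from my reading of
drivers/infiniband/core/iwcm.c, not the exact code): destroying the cm_id
blocks until the work handler has dropped its last reference, so the caller
can then free the enclosing rdma id safely.

/* Illustrative only; simplified from my reading of iwcm.c before the commit. */
struct iwcm_id_private {
	struct iw_cm_id id;
	atomic_t refcount;
	struct completion destroy_comp;
	/* ... */
};

/* Dropped by cm_work_handler() when it is done with the cm_id. */
static void iwcm_deref_id(struct iwcm_id_private *cm_id_priv)
{
	if (atomic_dec_and_test(&cm_id_priv->refcount))
		complete(&cm_id_priv->destroy_comp);
}

void iw_destroy_cm_id(struct iw_cm_id *cm_id)
{
	struct iwcm_id_private *cm_id_priv =
		container_of(cm_id, struct iwcm_id_private, id);

	destroy_cm_id(cm_id);	/* tear down connection state, drop our ref */
	/*
	 * Block until every holder, including cm_work_handler(), has called
	 * iwcm_deref_id(); only then can rdma/cma safely kfree the
	 * rdma_id_private that the handler dereferences.
	 */
	wait_for_completion(&cm_id_priv->destroy_comp);
	kfree(cm_id_priv);
}

With this behavior in place, the nvme-rdma teardown path above cannot reach
kfree(id_priv) while cma_iw_handler() is still pending, which matches what I
observe with the revert applied.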