[bug report] blktests nvme/061 hang with rdma transport and siw driver
Shinichiro Kawasaki
shinichiro.kawasaki at wdc.com
Sat May 10 02:59:40 PDT 2025
On May 09, 2025 / 12:21, Zhu Yanjun wrote:
> On 08.05.25 09:03, Shinichiro Kawasaki wrote:
> > On Apr 16, 2025 / 12:42, Shin'ichiro Kawasaki wrote:
> > > On Apr 15, 2025 / 15:18, Bernard Metzler wrote:
> > [...]
> > > > At first glance, to me it looks like a problem in the iwcm code,
> > > > where a cmid's work queue handling might be broken.
> >
> > I agree. The BUG slab-use-after-free happened for a work object. The call
> > trace indicates that it happened for the work handled by iw_cm_wq, not
> > siw_cm_wq.
> >
> > I took a close look, and I think the work objects allocated for each cm_id
> > are freed too early. The work objects are freed in dealloc_work_entries() when
> > all references to the cm_id are removed. IIUC, when the last reference to the
> > cm_id is removed in the work, the work object for the work itself gets freed.
> > Hence the use-after-free.
> >
> > Based on this guess, I created a trial fix patch below. It delays the reference
> > removal in the cm_id destroy context, to ensure that the reference count becomes
> > zero not in the work contexts but in the cm_id destroy context. It moves the
> > iwcm_deref_id() call from destroy_cm_id() to its callers. It also calls
> > iwcm_deref_id() after flushing the pending works. With this patch, I observed
> > that the use-after-free goes away. Comments on the trial fix patch are welcome.
>
> It seems that this problem is related to the following commit.
>
> commit aee2424246f9f1dadc33faa78990c1e2eb7826e4
> Author: Bart Van Assche <bvanassche at acm.org>
> Date: Wed Jun 5 08:51:01 2024 -0600
>
> RDMA/iwcm: Fix a use-after-free related to destroying CM IDs
>
> iw_conn_req_handler() associates a new struct rdma_id_private (conn_id) with
> an existing struct iw_cm_id (cm_id) as follows:
>
> conn_id->cm_id.iw = cm_id;
> cm_id->context = conn_id;
> cm_id->cm_handler = cma_iw_handler;
>
> rdma_destroy_id() frees both the cm_id and the struct rdma_id_private. Make
> sure that cm_work_handler() does not trigger a use-after-free by only
> freeing of the struct rdma_id_private after all pending work has finished.
>
> Cc: stable at vger.kernel.org
> Fixes: 59c68ac31e15 ("iw_cm: free cm_id resources on the last deref")

Yes, I agree that this commit is relevant. IIUC, this commit addressed the
use-after-free of the struct rdma_id_private, but it left the use-after-free
of the work objects unaddressed. I will mention this commit in the commit log
of the formal patch that I will post.
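
To make the ordering issue concrete, here is a minimal userspace sketch of my
understanding. It is not the iwcm code: every toy_* name is hypothetical, the
"freed" flag merely stands in for kfree() so the bad access can be detected
safely, and the model assumes one pending work per cm_id with one reference
held by the work and one by the destroy context.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct toy_cm_id;

struct toy_work {
	struct toy_cm_id *id;
	bool freed;		/* stands in for kfree() of the work object */
};

struct toy_cm_id {
	int refcount;
	struct toy_work work;	/* work object owned by the cm_id */
};

/* One reference for the destroy context, one for the pending work. */
static void toy_init(struct toy_cm_id *id)
{
	id->refcount = 2;
	id->work.id = id;
	id->work.freed = false;
}

/*
 * Drop one reference. The last put releases the work object, modeling
 * how the work entries are freed once all references are removed.
 */
static bool toy_deref(struct toy_cm_id *id)
{
	assert(id->refcount > 0);
	if (--id->refcount > 0)
		return false;
	id->work.freed = true;
	return true;
}

/*
 * Work handler: drops the reference taken for the work, then the work
 * object is touched again. Returns true if that touch happens after
 * the release, i.e. the modeled use-after-free.
 */
static bool toy_work_handler(struct toy_work *w)
{
	toy_deref(w->id);
	return w->freed;
}

/* Problematic ordering: destroy drops its reference before the work runs. */
static bool toy_destroy_buggy(struct toy_cm_id *id)
{
	toy_deref(id);				/* 2 -> 1 */
	return toy_work_handler(&id->work);	/* 1 -> 0 inside the work */
}

/* Trial-fix ordering: flush the pending work, then drop the last reference. */
static bool toy_destroy_fixed(struct toy_cm_id *id)
{
	bool uaf = toy_work_handler(&id->work);	/* 2 -> 1, work still alive */

	toy_deref(id);				/* 1 -> 0 in destroy context */
	return uaf;
}
```

In this model, toy_destroy_buggy() reports the bad access because the count
reaches zero inside the work handler, while toy_destroy_fixed() does not
because the count reaches zero only in the destroy context after the work has
run.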