[bug report] blktests nvme/061 hang with rdma transport and siw driver
Bernard Metzler
BMT at zurich.ibm.com
Thu May 8 07:22:53 PDT 2025
> -----Original Message-----
> From: Shinichiro Kawasaki <shinichiro.kawasaki at wdc.com>
> Sent: Thursday, May 8, 2025 9:03 AM
> To: Bernard Metzler <BMT at zurich.ibm.com>
> Cc: linux-nvme at lists.infradead.org; linux-rdma at vger.kernel.org; Daniel
> Wagner <wagi at kernel.org>
> Subject: [EXTERNAL] Re: [bug report] blktests nvme/061 hang with rdma
> transport and siw driver
>
> On Apr 16, 2025 / 12:42, Shin'ichiro Kawasaki wrote:
> > On Apr 15, 2025 / 15:18, Bernard Metzler wrote:
> [...]
> > > At first glance, to me it looks like a problem in the iwcm code,
> > > where a cmid's work queue handling might be broken.
>
Shinichiro, many thanks for taking care of this.
> I agree. The BUG slab-use-after-free happened for a work object. The call
> trace indicates that happened for the work handled by iw_cm_wq, not
> siw_cm_wq.
>
> I took a close looks, and I think the work objects allocated for each cm_id
> is freed too early. The work objects are freed in dealloc_work_entries()
> when
> all references to the cm_id are removed. IIUC, when the last reference to
> the
> cm_id is removed in the work, the work object for the work itself gets
> removed.
> Hence the use-after-free.
>
> Based on this guess, I created a fix trial patch below. It delays the
> reference
> removal in the cm_id destroy context, to ensure that the reference count
> becomes
> zeor not in the work contexts but in the cm_id destroy context. It moves
> iwcm_deref_id() call from destroy_cm_id() to its callers. Also call
> iwcm_deref_id() after flushing the pending works. With this patch, I
> observed
> use-after-free goes away. Comments on the fix trial patch will be welcomed.
>
I agree with the outcome of your investigation. This patch looks
good to me.
> One left question is why the failure was not observed with rxe driver, but
> with
> siw driver. My mere guess is that is because siw driver calls id->add_ref()
> and
> cm_id->rem_ref().
>
Yes, siw consistently bumps up the reference counter of the cm_id it gets
in the listen()/accept()/connect CM downcalls, since it creates a local
reference to it from its connection endpoint. It gets only decremented when
this reference goes away.
Bernard.
>
> diff --git a/drivers/infiniband/core/iwcm.c
> b/drivers/infiniband/core/iwcm.c
> index f4486cbd8f45..600cf8ea6a39 100644
> --- a/drivers/infiniband/core/iwcm.c
> +++ b/drivers/infiniband/core/iwcm.c
> @@ -368,12 +368,9 @@ EXPORT_SYMBOL(iw_cm_disconnect);
> /*
> * CM_ID <-- DESTROYING
> *
> - * Clean up all resources associated with the connection and release
> - * the initial reference taken by iw_create_cm_id.
> - *
> - * Returns true if and only if the last cm_id_priv reference has been
> dropped.
> + * Clean up all resources associated with the connection.
> */
> -static bool destroy_cm_id(struct iw_cm_id *cm_id)
> +static void destroy_cm_id(struct iw_cm_id *cm_id)
> {
> struct iwcm_id_private *cm_id_priv;
> struct ib_qp *qp;
> @@ -442,20 +439,22 @@ static bool destroy_cm_id(struct iw_cm_id *cm_id)
> iwpm_remove_mapinfo(&cm_id->local_addr, &cm_id->m_local_addr);
> iwpm_remove_mapping(&cm_id->local_addr, RDMA_NL_IWCM);
> }
> -
> - return iwcm_deref_id(cm_id_priv);
> }
>
> /*
> * This function is only called by the application thread and cannot
> * be called by the event thread. The function will wait for all
> - * references to be released on the cm_id and then kfree the cm_id
> - * object.
> + * references to be released on the cm_id and then release the initial
> + * reference taken by iw_create_cm_id.
> */
> void iw_destroy_cm_id(struct iw_cm_id *cm_id)
> {
> - if (!destroy_cm_id(cm_id))
> - flush_workqueue(iwcm_wq);
> + struct iwcm_id_private *cm_id_priv;
> +
> + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> + destroy_cm_id(cm_id);
> + flush_workqueue(iwcm_wq);
> + iwcm_deref_id(cm_id_priv);
> }
> EXPORT_SYMBOL(iw_destroy_cm_id);
>
> @@ -1035,8 +1034,10 @@ static void cm_work_handler(struct work_struct
> *_work)
>
> if (!test_bit(IWCM_F_DROP_EVENTS, &cm_id_priv->flags)) {
> ret = process_event(cm_id_priv, &levent);
> - if (ret)
> - WARN_ON_ONCE(destroy_cm_id(&cm_id_priv->id));
> + if (ret) {
> + destroy_cm_id(&cm_id_priv->id);
> + WARN_ON_ONCE(iwcm_deref_id(cm_id_priv));
> + }
> } else
> pr_debug("dropping event %d\n", levent.event);
> if (iwcm_deref_id(cm_id_priv))
More information about the Linux-nvme
mailing list