[PATCH 1/1] nvme-rdma: Fix memory leak during queue allocation

Thu Nov 9 03:02:49 PST 2017

> If the nvme_rdma_wait_for_cm timeout expires before we get an
> established or rejected event from rdma_cm (i.e. rdma_connect
> succeeded), we end up leaking the ib resources of the dedicated
> queue.
> This scenario can easily be reproduced by running a traffic test
> during port toggling.
> 
> Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
> ---
>   drivers/nvme/host/rdma.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 0ebb539..fcb278a 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -545,13 +545,16 @@ static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
>   	if (ret) {
>   		dev_info(ctrl->ctrl.device,
>   			"rdma_resolve_addr wait failed (%d).\n", ret);

Is this rebased? I think this message has changed.

> -		goto out_destroy_cm_id;
> +		goto out_destroy_queue_ib;
>   	}
>   
>   	clear_bit(NVME_RDMA_Q_DELETING, &queue->flags);
>   
>   	return 0;
>   
> +out_destroy_queue_ib:
> +	if (ret == -ETIMEDOUT)
> +		nvme_rdma_destroy_queue_ib(queue);

This does not look safe to me. What guarantees that nvme_rdma_cm_handler
will not also destroy the ib queue? I think we need to destroy the
cm_id first (to guarantee that we will never handle further cma events)
and only then destroy the ib queue if needed.