[PATCH 1/1] nvme-rdma: Fix memory leak during queue allocation

Max Gurtovoy maxg at mellanox.com
Thu Nov 9 03:09:10 PST 2017



On 11/9/2017 1:02 PM, Sagi Grimberg wrote:
> 
>> If the nvme_rdma_wait_for_cm timeout expires before we get an
>> established or rejected event from rdma_cm (i.e. rdma_connect
>> succeeded), we end up leaking the IB resources of the dedicated
>> queue.
>> This scenario can easily be reproduced by running a traffic test
>> during port toggling.
>>
>> Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
>> ---
>>   drivers/nvme/host/rdma.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>> index 0ebb539..fcb278a 100644
>> --- a/drivers/nvme/host/rdma.c
>> +++ b/drivers/nvme/host/rdma.c
>> @@ -545,13 +545,16 @@ static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
>>       if (ret) {
>>           dev_info(ctrl->ctrl.device,
>>               "rdma_resolve_addr wait failed (%d).\n", ret);
> 
> Are you rebased? This message has changed, I think.

I'm working on the main master. Should I work on top of nvme-4.15? From
what I saw a few days ago, it wasn't rebased on top of 4.14-rc8.

> 
>> -        goto out_destroy_cm_id;
>> +        goto out_destroy_queue_ib;
>>       }
>>       clear_bit(NVME_RDMA_Q_DELETING, &queue->flags);
>>       return 0;
>> +out_destroy_queue_ib:
>> +    if (ret == -ETIMEDOUT)
>> +        nvme_rdma_destroy_queue_ib(queue);
> 
> This does not look safe to me. What prevents nvme_rdma_cm_handler
> from destroying the ib queue as well? I think we need to destroy the
> cm_id first (to guarantee that we will never handle other cma events)
> and only then destroy the ib queue if needed.

You mean we always need to destroy the cm_id before destroying the ib queue?
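
If so, the ordering could be modeled roughly as below. This is only a
user-space sketch of the idea, not driver code: struct mock_queue and
all mock_* functions are hypothetical stand-ins (for the queue,
rdma_destroy_id, nvme_rdma_destroy_queue_ib and the error branch of
nvme_rdma_cm_handler), and it assumes, as the review comment suggests,
that no cma events are handled once the cm_id is destroyed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the driver's queue structure. */
struct mock_queue {
	bool cm_id_alive;      /* CM events may still be handled while true */
	bool ib_allocated;     /* IB resources are currently held */
	int  ib_destroy_calls; /* how many times the IB teardown ran */
};

/* Stand-in for destroying the cm_id: assumed to stop all further events. */
static void mock_destroy_cm_id(struct mock_queue *q)
{
	q->cm_id_alive = false;
}

/* Stand-in for the IB teardown; a second teardown trips the assert. */
static void mock_destroy_queue_ib(struct mock_queue *q)
{
	assert(q->ib_allocated);
	q->ib_allocated = false;
	q->ib_destroy_calls++;
}

/* Stand-in for the CM handler's error branch, which also frees IB. */
static void mock_cm_handler_error(struct mock_queue *q)
{
	if (!q->cm_id_alive)
		return; /* cm_id is gone, so the event is never handled */
	mock_destroy_queue_ib(q);
}

/*
 * Error path with the suggested ordering: destroy the cm_id first,
 * then free the IB resources only in the timeout case, where the
 * handler never had a chance to free them.
 */
static int mock_alloc_queue_error_path(struct mock_queue *q, int ret)
{
	mock_destroy_cm_id(q);
	if (ret == -ETIMEDOUT)
		mock_destroy_queue_ib(q);
	return ret;
}
```

With this ordering a late event arriving after the timeout is dropped
instead of racing with the teardown, so the IB resources are freed
exactly once.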


