[PATCH 1/1] nvme-rdma: destroy cm id before destroy qp to avoid use after free

liruozhu liruozhu at huawei.com
Fri Sep 10 00:42:06 PDT 2021


Hi Max,

I feel I may have misunderstood what you meant. I guess you want to know 
which tests I ran against the patched code, not the original one.

So far I have tested the regression scenarios mentioned in the 
previous email.

I will run all the scenarios you are interested in and report the results back to you.

Thanks.
On 2021/9/10 10:50, liruozhu wrote:
> Hi Max,
>
> Yes, I did encounter this problem. I posted the detailed log in the 
> cover letter of this patch.
>
> I tested it with a Huawei storage array, killing the nvme target 
> processes (we use a self-developed user-mode nvme target) while 
> repeatedly injecting PCIe AER errors on the array's HBA card.
>
> I’m not sure if there is an easy way to reproduce it with a standard 
> server and the kernel target.
> But I still think the host side should avoid even the possibility of 
> this crash.
>
> Thanks.
>
> On 2021/9/10 1:32, Max Gurtovoy wrote:
>>
>> On 9/9/2021 5:05 PM, Christoph Hellwig wrote:
>>> This looks reasonable to me.  Sagi, Max: any comments?
>>>
>>> On Mon, Sep 06, 2021 at 11:51:34AM +0800, Ruozhu Li wrote:
>>>> We should always destroy the cm_id before destroying the qp, to avoid getting a
>>>> CMA event after the qp has been destroyed, which may lead to a use-after-free.
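
A minimal sketch of the teardown ordering being described here (simplified and
not the actual driver code; it only assumes a queue object that owns both a CM
ID and a QP):

    static void queue_teardown_sketch(struct nvme_rdma_queue *queue)
    {
        /*
         * Destroy the CM ID first: rdma_destroy_id() waits for any running
         * CM event handler for this ID to return, and no further events can
         * be delivered afterwards.
         */
        rdma_destroy_id(queue->cm_id);

        /*
         * Only now is it safe to free the QP/CQ. With the reverse order, a
         * late CMA event could still reach the handler and touch the
         * already-freed QP, which is the use-after-free described above.
         */
        nvme_rdma_destroy_queue_ib(queue);
    }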
>>
>> Did you really encounter a use-after-free?
>>
>> I think the code looks correct, but from experience with RDMA-CM I 
>> would like to know which tests you ran with this code.
>>
>> Interesting tests are: port toggling with/without IO, unloading the 
>> mlx5/other RDMA HW drivers during connection establishment or during 
>> IO, rebooting the target, rebooting the target during connection 
>> establishment, and rebooting the target during IO.
>>
>> Thanks.
>>
>>
>>>> In the RDMA connection establishment error flow, don't destroy the qp in the
>>>> cm event handler. Just report cm_error to the upper level; the qp will be
>>>> destroyed in nvme_rdma_alloc_queue() after the cm id is destroyed.
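
A rough sketch of the error flow described above (simplified, not the actual
driver code): the CM event handler only records the error and wakes the
waiter, and the allocating context tears everything down in the safe order.

    /* In the CM event handler's error cases: report the error, don't free the QP. */
    queue->cm_error = -ECONNRESET;
    complete(&queue->cm_done);

    /* In nvme_rdma_alloc_queue(), after rdma_resolve_addr() has been issued: */
    ret = nvme_rdma_wait_for_cm(queue);    /* picks up queue->cm_error on failure */
    if (ret)
        goto out_destroy_cm_id;
    return 0;

out_destroy_cm_id:
    /* CM ID first, so no further CM events can race with freeing the QP. */
    rdma_destroy_id(queue->cm_id);
    nvme_rdma_destroy_queue_ib(queue);
    return ret;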
>>>>
>>>> Signed-off-by: Ruozhu Li <liruozhu at huawei.com>
>>>> ---
>>>>   drivers/nvme/host/rdma.c | 16 +++-------------
>>>>   1 file changed, 3 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>>>> index a68704e39084..042c594bc57e 100644
>>>> --- a/drivers/nvme/host/rdma.c
>>>> +++ b/drivers/nvme/host/rdma.c
>>>> @@ -656,8 +656,8 @@ static void nvme_rdma_free_queue(struct nvme_rdma_queue *queue)
>>>>      if (!test_and_clear_bit(NVME_RDMA_Q_ALLOCATED, &queue->flags))
>>>>          return;
>>>> 
>>>> -    nvme_rdma_destroy_queue_ib(queue);
>>>>      rdma_destroy_id(queue->cm_id);
>>>> +    nvme_rdma_destroy_queue_ib(queue);
>>>>      mutex_destroy(&queue->queue_lock);
>>>>  }
>>>> 
>>>> @@ -1815,14 +1815,10 @@ static int nvme_rdma_conn_established(struct nvme_rdma_queue *queue)
>>>>      for (i = 0; i < queue->queue_size; i++) {
>>>>          ret = nvme_rdma_post_recv(queue, &queue->rsp_ring[i]);
>>>>          if (ret)
>>>> -            goto out_destroy_queue_ib;
>>>> +            return ret;
>>>>      }
>>>> 
>>>>      return 0;
>>>> -
>>>> -out_destroy_queue_ib:
>>>> -    nvme_rdma_destroy_queue_ib(queue);
>>>> -    return ret;
>>>>  }
>>>> 
>>>>  static int nvme_rdma_conn_rejected(struct nvme_rdma_queue *queue,
>>>> @@ -1916,14 +1912,10 @@ static int nvme_rdma_route_resolved(struct nvme_rdma_queue *queue)
>>>>      if (ret) {
>>>>          dev_err(ctrl->ctrl.device,
>>>>              "rdma_connect_locked failed (%d).\n", ret);
>>>> -        goto out_destroy_queue_ib;
>>>> +        return ret;
>>>>      }
>>>> 
>>>>      return 0;
>>>> -
>>>> -out_destroy_queue_ib:
>>>> -    nvme_rdma_destroy_queue_ib(queue);
>>>> -    return ret;
>>>>  }
>>>> 
>>>>  static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
>>>> @@ -1954,8 +1946,6 @@ static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
>>>>      case RDMA_CM_EVENT_ROUTE_ERROR:
>>>>      case RDMA_CM_EVENT_CONNECT_ERROR:
>>>>      case RDMA_CM_EVENT_UNREACHABLE:
>>>> -        nvme_rdma_destroy_queue_ib(queue);
>>>> -        fallthrough;
>>>>      case RDMA_CM_EVENT_ADDR_ERROR:
>>>>          dev_dbg(queue->ctrl->ctrl.device,
>>>>              "CM error event %d\n", ev->event);
>>>> -- 
>>>> 2.16.4
>>>>
>>>>
>>> ---end quoted text---
>


