[PATCH 1/1] nvme-rdma: destroy cm id before destroy qp to avoid use after free
Max Gurtovoy
mgurtovoy at nvidia.com
Mon Sep 13 07:16:40 PDT 2021
On 9/13/2021 4:49 AM, liruozhu wrote:
> Hi all,
>
> I tested the test scenarios mentioned by Max, and I haven't found any
> abnormality on the host.
>
> If there are other interesting scenarios or suggestions for the patch,
> please let me know.
>
> Thanks.
Thanks for the update.
The code looks good,
Reviewed-by: Max Gurtovoy <mgurtovoy at nvidia.com>
>
> On 2021/9/10 1:32, Max Gurtovoy wrote:
>>
>> On 9/9/2021 5:05 PM, Christoph Hellwig wrote:
>>> This looks reasonable to me. Sagi, Max: any comments?
>>>
>>> On Mon, Sep 06, 2021 at 11:51:34AM +0800, Ruozhu Li wrote:
>>>> We should always destroy the cm_id before destroying the qp, to avoid
>>>> getting a cma event after the qp has been destroyed, which may lead to
>>>> a use-after-free.
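
To make the intended ordering explicit: this is roughly what
nvme_rdma_free_queue() ends up looking like with the hunk further down
applied (a simplified sketch for illustration, not the literal upstream
function):

static void nvme_rdma_free_queue(struct nvme_rdma_queue *queue)
{
        if (!test_and_clear_bit(NVME_RDMA_Q_ALLOCATED, &queue->flags))
                return;

        /*
         * Destroy the cm_id first: rdma_destroy_id() does not return
         * while a handler for this id is still running, and no further
         * CM events are delivered afterwards, so the handler can no
         * longer race with the IB teardown below and touch a freed
         * qp/cq.
         */
        rdma_destroy_id(queue->cm_id);
        nvme_rdma_destroy_queue_ib(queue);
        mutex_destroy(&queue->queue_lock);
}
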
>>
>> Did you really encounter a use-after-free?
>>
>> I think the code looks correct, but from experience with RDMA-CM I
>> would like to know which tests you ran with this code.
>>
>> Interesting tests are: port toggling with/without IO, unloading the
>> mlx5/other RDMA HW drivers during connection establishment or during IO,
>> rebooting the target, rebooting the target during connection
>> establishment, and rebooting the target during IO.
>>
>> Thanks.
>>
>>
>>>> In the RDMA connection establishment error flow, don't destroy the qp
>>>> in the cm event handler. Just report cm_error to the upper level; the
>>>> qp will be destroyed in nvme_rdma_alloc_queue() after the cm id has
>>>> been destroyed.
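
Spelling out that second part: with the handler hunks below, a CM error
during establishment no longer tears down the IB queue inside the
callback; it only records the error and wakes the waiter. Roughly (a
sketch from memory of drivers/nvme/host/rdma.c, non-error cases omitted,
details approximate):

static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
                struct rdma_cm_event *ev)
{
        struct nvme_rdma_queue *queue = cm_id->context;
        int cm_error = 0;

        switch (ev->event) {
        /* ... non-error events omitted from this sketch ... */
        case RDMA_CM_EVENT_ROUTE_ERROR:
        case RDMA_CM_EVENT_CONNECT_ERROR:
        case RDMA_CM_EVENT_UNREACHABLE:
        case RDMA_CM_EVENT_ADDR_ERROR:
                /* no nvme_rdma_destroy_queue_ib() here any more */
                cm_error = -ECONNRESET;
                break;
        default:
                break;
        }

        if (cm_error) {
                /*
                 * nvme_rdma_wait_for_cm() wakes up with cm_error set,
                 * nvme_rdma_alloc_queue() fails the attempt, and the
                 * qp/IB resources are freed only after the cm_id has
                 * been destroyed.
                 */
                queue->cm_error = cm_error;
                complete(&queue->cm_done);
        }
        return 0;
}
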
>>>>
>>>> Signed-off-by: Ruozhu Li <liruozhu at huawei.com>
>>>> ---
>>>> drivers/nvme/host/rdma.c | 16 +++-------------
>>>> 1 file changed, 3 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>>>> index a68704e39084..042c594bc57e 100644
>>>> --- a/drivers/nvme/host/rdma.c
>>>> +++ b/drivers/nvme/host/rdma.c
>>>> @@ -656,8 +656,8 @@ static void nvme_rdma_free_queue(struct nvme_rdma_queue *queue)
>>>> if (!test_and_clear_bit(NVME_RDMA_Q_ALLOCATED, &queue->flags))
>>>> return;
>>>> - nvme_rdma_destroy_queue_ib(queue);
>>>> rdma_destroy_id(queue->cm_id);
>>>> + nvme_rdma_destroy_queue_ib(queue);
>>>> mutex_destroy(&queue->queue_lock);
>>>> }
>>>> @@ -1815,14 +1815,10 @@ static int nvme_rdma_conn_established(struct nvme_rdma_queue *queue)
>>>> for (i = 0; i < queue->queue_size; i++) {
>>>> ret = nvme_rdma_post_recv(queue, &queue->rsp_ring[i]);
>>>> if (ret)
>>>> - goto out_destroy_queue_ib;
>>>> + return ret;
>>>> }
>>>> return 0;
>>>> -
>>>> -out_destroy_queue_ib:
>>>> - nvme_rdma_destroy_queue_ib(queue);
>>>> - return ret;
>>>> }
>>>> static int nvme_rdma_conn_rejected(struct nvme_rdma_queue *queue,
>>>> @@ -1916,14 +1912,10 @@ static int nvme_rdma_route_resolved(struct nvme_rdma_queue *queue)
>>>> if (ret) {
>>>> dev_err(ctrl->ctrl.device,
>>>> "rdma_connect_locked failed (%d).\n", ret);
>>>> - goto out_destroy_queue_ib;
>>>> + return ret;
>>>> }
>>>> return 0;
>>>> -
>>>> -out_destroy_queue_ib:
>>>> - nvme_rdma_destroy_queue_ib(queue);
>>>> - return ret;
>>>> }
>>>> static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
>>>> @@ -1954,8 +1946,6 @@ static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
>>>> case RDMA_CM_EVENT_ROUTE_ERROR:
>>>> case RDMA_CM_EVENT_CONNECT_ERROR:
>>>> case RDMA_CM_EVENT_UNREACHABLE:
>>>> - nvme_rdma_destroy_queue_ib(queue);
>>>> - fallthrough;
>>>> case RDMA_CM_EVENT_ADDR_ERROR:
>>>> dev_dbg(queue->ctrl->ctrl.device,
>>>> "CM error event %d\n", ev->event);
>>>> --
>>>> 2.16.4
>>>>
>>>>
>>>> _______________________________________________
>>>> Linux-nvme mailing list
>>>> Linux-nvme at lists.infradead.org
>>>> http://lists.infradead.org/mailman/listinfo/linux-nvme
>>> ---end quoted text---