[PATCH rfc] rdma/verbs: fix a possible uaf when draining a srq attached qp
Sagi Grimberg
sagi at grimberg.me
Thu Jun 6 03:13:27 PDT 2024
On 02/06/2024 16:08, Sagi Grimberg wrote:
>
>
> On 02/06/2024 14:43, Max Gurtovoy wrote:
>> Hi Sagi,
>>
>> On 02/06/2024 11:53, Sagi Grimberg wrote:
>>>
>>>
>>> On 02/06/2024 11:19, Leon Romanovsky wrote:
>>>> On Sun, May 26, 2024 at 11:31:25AM +0300, Sagi Grimberg wrote:
>>>>> ib_drain_qp does not drain a shared recv queue (because it is
>>>>> shared). However, in the absence of any guarantee that the recv
>>>>> completions were consumed, the ulp can reference these completions
>>>>> after draining the qp and freeing its associated resources, which
>>>>> is a uaf [1].
>>>>>
>>>>> We cannot drain a srq like a normal rq. However, in ib_drain_qp,
>>>>> once the qp has moved to the error state, we reap the recv_cq once
>>>>> in order to prevent consumption of recv completions after the drain.
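>>>>>
>>>>> For illustration, the intended shape of the change is roughly the
>>>>> following (a sketch only; today ib_drain_qp simply skips the rq
>>>>> drain when the qp is attached to a srq):
>>>>> --
>>>>> void ib_drain_qp(struct ib_qp *qp)
>>>>> {
>>>>>         ib_drain_sq(qp);
>>>>>         if (!qp->srq) {
>>>>>                 ib_drain_rq(qp);
>>>>>         } else {
>>>>>                 /*
>>>>>                  * We cannot drain the srq itself, but the qp is
>>>>>                  * already in error state, so reap its recv_cq once
>>>>>                  * so that no recv completion is handed to the ulp
>>>>>                  * after the drain.
>>>>>                  */
>>>>>                 ib_process_cq_direct(qp->recv_cq, -1);
>>>>>         }
>>>>> }
>>>>> --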
>>>>>
>>>>> [1]:
>>>>> --
>>>>> [199856.569999] Unable to handle kernel paging request at virtual
>>>>> address 002248778adfd6d0
>>>>> <....>
>>>>> [199856.721701] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
>>>>> <....>
>>>>> [199856.827281] Call trace:
>>>>> [199856.829847] nvmet_parse_admin_cmd+0x34/0x178 [nvmet]
>>>>> [199856.835007] nvmet_req_init+0x2e0/0x378 [nvmet]
>>>>> [199856.839640] nvmet_rdma_handle_command+0xa4/0x2e8 [nvmet_rdma]
>>>>> [199856.845575] nvmet_rdma_recv_done+0xcc/0x240 [nvmet_rdma]
>>>>> [199856.851109] __ib_process_cq+0x84/0xf0 [ib_core]
>>>>> [199856.855858] ib_cq_poll_work+0x34/0xa0 [ib_core]
>>>>> [199856.860587] process_one_work+0x1ec/0x4a0
>>>>> [199856.864694] worker_thread+0x48/0x490
>>>>> [199856.868453] kthread+0x158/0x160
>>>>> [199856.871779] ret_from_fork+0x10/0x18
>>>>> --
>>>>>
>>>>> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
>>>>> ---
>>>>> Note this patch is not yet tested, but I am sending it for visibility
>>>>> and early feedback. While nothing prevents ib_drain_qp from processing
>>>>> a cq directly (even if the cq has another poll context), I am not
>>>>> convinced that all the upper layers are free of assumptions about a
>>>>> single context consuming the cq, even while it is being drained. It is
>>>>> also possible to add an ib_reap_cq that fences the cq poll context
>>>>> before reaping the cq, but this may have other side-effects.
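>>>>>
>>>>> To make the ib_reap_cq idea concrete, such a helper (hypothetical,
>>>>> not implemented) could look roughly like the sketch below;
>>>>> re-enabling the original poll context afterwards is omitted and is
>>>>> one of the side-effects mentioned above:
>>>>> --
>>>>> /* hypothetical: quiesce the cq's poll context, then reap it once */
>>>>> void ib_reap_cq(struct ib_cq *cq)
>>>>> {
>>>>>         switch (cq->poll_ctx) {
>>>>>         case IB_POLL_SOFTIRQ:
>>>>>                 irq_poll_disable(&cq->iop);
>>>>>                 break;
>>>>>         case IB_POLL_WORKQUEUE:
>>>>>         case IB_POLL_UNBOUND_WORKQUEUE:
>>>>>                 cancel_work_sync(&cq->work);
>>>>>                 break;
>>>>>         default:
>>>>>                 break;
>>>>>         }
>>>>>         /* reap at most cqe completions that are already queued */
>>>>>         ib_process_cq_direct(cq, cq->cqe);
>>>>> }
>>>>> --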
>>>> Did you have a chance to test this patch?
>>>> I looked at the code and it seems to be a correct change, but I also
>>>> don't know about all the ULP assumptions.
>>>
>>> Not yet...
>>>
>>> One thing that is problematic with this patch, though, is that there is
>>> no stopping condition for the direct poll. So if the CQ is shared among
>>> a number of qps (and srqs), nothing prevents the polling from consuming
>>> completions forever...
>>>
>>> So we probably need it to be:
>>> --
>>> diff --git a/drivers/infiniband/core/verbs.c
>>> b/drivers/infiniband/core/verbs.c
>>> index 580e9019e96a..f411fef35938 100644
>>> --- a/drivers/infiniband/core/verbs.c
>>> +++ b/drivers/infiniband/core/verbs.c
>>> @@ -2971,7 +2971,7 @@ void ib_drain_qp(struct ib_qp *qp)
>>>              * guarantees that the ulp will free resources and only then
>>>              * consume the recv completion.
>>>              */
>>> -            ib_process_cq_direct(qp->recv_cq, -1);
>>> +            ib_process_cq_direct(qp->recv_cq, qp->recv_cq->cqe);
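>>>
>>> For reference, the direct poll roughly amounts to the loop below (a
>>> simplified sketch with a made-up name; the in-tree code polls in
>>> batches). A budget of qp->recv_cq->cqe bounds the work, while -1 only
>>> stops once the cq is empty, which a shared cq may never be:
>>> --
>>> static int drain_recv_cq_once(struct ib_cq *cq, int budget)
>>> {
>>>         struct ib_wc wc;
>>>         int n = 0;
>>>
>>>         /* consume at most 'budget' completions, dispatching each */
>>>         while (n < budget && ib_poll_cq(cq, 1, &wc) > 0) {
>>>                 if (wc.wr_cqe)
>>>                         wc.wr_cqe->done(cq, &wc);
>>>                 n++;
>>>         }
>>>         return n;
>>> }
>>> --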
>>
>> I tried to fix the problem a few years ago in [1], but eventually the
>> patchset was not accepted.
>>
>> We should probably try to improve the original series (using cq->cqe
>> instead of -1 for example) and upstream it.
>>
>> [1]
>> https://lore.kernel.org/all/1516197178-26493-1-git-send-email-maxg@mellanox.com/
>
> Yes. I'd like to know whether the qp event is needed, though: given that
> the qp is already in error state (from the sq drain) and we then run a
> single pass over the recv_cq, is it possible to miss any completions?
>
> I would like to resolve the drain without involving the ulp drivers.
>
> If so, then perhaps we can intercept the qp event in the core, before
> passing it on to the ulp, and complete the srq side of the drain there.
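>
> Something along these lines, purely as a sketch (the completion field
> and the saved ulp handler below do not exist today and would have to be
> added; the event of interest would presumably be
> IB_EVENT_QP_LAST_WQE_REACHED, which is raised for a srq-attached qp
> once it is in error and its last wqe has been consumed):
> --
> static void ib_qp_event_intercept(struct ib_event *event, void *context)
> {
>         struct ib_qp *qp = event->element.qp;
>
>         /* let the core observe the event before the ulp does */
>         if (event->event == IB_EVENT_QP_LAST_WQE_REACHED)
>                 complete(&qp->srq_drain_done);  /* assumed new field */
>
>         if (qp->ulp_event_handler)              /* assumed saved handler */
>                 qp->ulp_event_handler(event, qp->qp_context);
> }
> --
> ib_drain_qp() could then wait_for_completion() on that for srq-attached
> qps before returning, without any ulp involvement.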
Ping?