[PATCH rfc] rdma/verbs: fix a possible uaf when draining a srq attached qp

Max Gurtovoy mgurtovoy at nvidia.com
Thu Jun 6 04:30:34 PDT 2024


On 06/06/2024 13:13, Sagi Grimberg wrote:
>
>
> On 02/06/2024 16:08, Sagi Grimberg wrote:
>>
>>
>> On 02/06/2024 14:43, Max Gurtovoy wrote:
>>> Hi Sagi,
>>>
>>> On 02/06/2024 11:53, Sagi Grimberg wrote:
>>>>
>>>>
>>>> On 02/06/2024 11:19, Leon Romanovsky wrote:
>>>>> On Sun, May 26, 2024 at 11:31:25AM +0300, Sagi Grimberg wrote:
>>>>>> ib_drain_qp does not drain a shared recv queue (because it is
>>>>>> shared). However, in the absence of any guarantee that the recv
>>>>>> completions were consumed, the ulp can reference these completions
>>>>>> after draining the qp and freeing its associated resources, which
>>>>>> is a uaf [1].
>>>>>>
>>>>>> We cannot drain a srq like a normal rq. However, in ib_drain_qp,
>>>>>> once the qp has moved to the error state, we reap the recv_cq once
>>>>>> in order to prevent consumption of recv completions after the drain.
>>>>>>
>>>>>> [1]:
>>>>>> -- 
>>>>>> [199856.569999] Unable to handle kernel paging request at virtual address 002248778adfd6d0
>>>>>> <....>
>>>>>> [199856.721701] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
>>>>>> <....>
>>>>>> [199856.827281] Call trace:
>>>>>> [199856.829847]  nvmet_parse_admin_cmd+0x34/0x178 [nvmet]
>>>>>> [199856.835007]  nvmet_req_init+0x2e0/0x378 [nvmet]
>>>>>> [199856.839640]  nvmet_rdma_handle_command+0xa4/0x2e8 [nvmet_rdma]
>>>>>> [199856.845575]  nvmet_rdma_recv_done+0xcc/0x240 [nvmet_rdma]
>>>>>> [199856.851109]  __ib_process_cq+0x84/0xf0 [ib_core]
>>>>>> [199856.855858]  ib_cq_poll_work+0x34/0xa0 [ib_core]
>>>>>> [199856.860587]  process_one_work+0x1ec/0x4a0
>>>>>> [199856.864694]  worker_thread+0x48/0x490
>>>>>> [199856.868453]  kthread+0x158/0x160
>>>>>> [199856.871779]  ret_from_fork+0x10/0x18
>>>>>> -- 
>>>>>>
>>>>>> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
>>>>>> ---
>>>>>> Note this patch is not yet tested, but I am sending it for visibility
>>>>>> and early feedback. While nothing prevents ib_drain_qp from processing
>>>>>> a cq directly (even if the cq has another poll context), I am not
>>>>>> convinced that all the upper layers are free of assumptions about a
>>>>>> single context consuming the cq, even while it is being drained. It is
>>>>>> also possible to add an ib_reap_cq that fences the cq poll context
>>>>>> before reaping the cq, but this may have other side-effects.
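
For reference, a rough sketch of what such an ib_reap_cq helper could
look like, assuming "fencing the cq poll context" means quiescing the
softirq/workqueue context before a bounded direct poll (ib_reap_cq is
not an existing API, only an illustration of the alternative mentioned
above):
--
#include <rdma/ib_verbs.h>

/*
 * Hypothetical helper: quiesce the CQ's polling context, then consume
 * whatever is left in the caller's context. Field names (poll_ctx, iop,
 * work) follow struct ib_cq; the helper itself is only a sketch.
 * Note it leaves the normal poll context disabled afterwards, which is
 * one of the possible side-effects mentioned above.
 */
static void ib_reap_cq(struct ib_cq *cq)
{
	switch (cq->poll_ctx) {
	case IB_POLL_SOFTIRQ:
		irq_poll_disable(&cq->iop);
		break;
	case IB_POLL_WORKQUEUE:
	case IB_POLL_UNBOUND_WORKQUEUE:
		cancel_work_sync(&cq->work);
		break;
	default:
		break;
	}

	/* one bounded pass so a shared CQ cannot keep the caller polling */
	ib_process_cq_direct(cq, cq->cqe);
}
--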
>>>>> Did you have a chance to test this patch?
>>>>> I looked at the code and it seems to be a correct change, but I also
>>>>> don't know about all ULP assumptions.
>>>>
>>>> Not yet...
>>>>
>>>> One thing that is problematic with this patch, though, is that there
>>>> is no stopping condition on the direct poll. So if the CQ is shared
>>>> among a number of qps (and srqs), nothing prevents the polling from
>>>> consuming completions forever...
>>>>
>>>> So we probably need it to be:
>>>> -- 
>>>> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
>>>> index 580e9019e96a..f411fef35938 100644
>>>> --- a/drivers/infiniband/core/verbs.c
>>>> +++ b/drivers/infiniband/core/verbs.c
>>>> @@ -2971,7 +2971,7 @@ void ib_drain_qp(struct ib_qp *qp)
>>>>                  * guarantees that the ulp will free resources and only then
>>>>                  * consume the recv completion.
>>>>                  */
>>>> -               ib_process_cq_direct(qp->recv_cq, -1);
>>>> +               ib_process_cq_direct(qp->recv_cq, qp->recv_cq->cqe);
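
For clarity, an illustrative loop showing what the budget change means
in practice; this is not the in-kernel ib_process_cq_direct
implementation, only the semantics being relied on here:
--
#include <rdma/ib_verbs.h>

/*
 * A budget of cq->cqe bounds the drain to roughly one pass over the CQ,
 * whereas -1 keeps polling until the CQ is empty, which on a CQ shared
 * by other QPs/SRQs may never happen. Illustration only.
 */
static int drain_recv_cq_one_pass(struct ib_cq *cq)
{
	struct ib_wc wc;
	int budget = cq->cqe, completed = 0;

	while (completed < budget && ib_poll_cq(cq, 1, &wc) > 0) {
		/* hand each completion to the ulp's done callback */
		if (wc.wr_cqe)
			wc.wr_cqe->done(cq, &wc);
		completed++;
	}
	return completed;
}
--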
>>>
>>> I tried to fix the problem a few years ago in [1], but eventually the
>>> patchset was not accepted.
>>>
>>> We should probably try to improve the original series (using cq->cqe
>>> instead of -1, for example) and upstream it.
>>>
>>> [1] 
>>> https://lore.kernel.org/all/1516197178-26493-1-git-send-email-maxg@mellanox.com/
>>
>> Yes. I'd like to know if the qp event is needed, though: given that
>> the qp is already in the error state (from the sq drain) and we then
>> run a single pass over the recv_cq, is it possible to miss any
>> completions?
>>
>> I would like to resolve the drain without involving the ulp drivers.
>>
>> If the event is needed, then perhaps we can intercept the qp event at
>> the core, before sending it to the ulp, and complete the srq
>> completion there.
>
> Ping?

I think that catching the IB_EVENT_QP_LAST_WQE_REACHED event in the core
and calling complete(&qp->srq_completion) there is possible, but it will
require some event processing in the ib core.

Probably my patch [1/4] from
https://lore.kernel.org/all/1516197178-26493-2-git-send-email-maxg@mellanox.com/#r
should be updated with this logic.
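
For illustration, a sketch of how that could fit together, modelled on
the referenced patchset: the core intercepts the qp async event before
forwarding it to the ulp, and the drain path waits for the SRQ flush.
The srq_completion field, the registered_event_handler pointer and the
timeout value are assumptions for illustration, not upstream API:
--
#include <linux/completion.h>
#include <linux/jiffies.h>
#include <linux/printk.h>
#include <rdma/ib_verbs.h>

static void __ib_qp_event_handler(struct ib_event *event, void *context)
{
	struct ib_qp *qp = event->element.qp;

	/* LAST_WQE_REACHED: the SRQ has flushed all WRs of this qp */
	if (event->event == IB_EVENT_QP_LAST_WQE_REACHED)
		complete(&qp->srq_completion);

	/* forward the event to the handler the ulp registered, if any */
	if (qp->registered_event_handler)
		qp->registered_event_handler(event, qp->qp_context);
}

static void ib_drain_qp_wait_srq(struct ib_qp *qp)
{
	if (!qp->srq)
		return;

	/* wait for the flush signalled above, then reap the recv_cq once */
	if (!wait_for_completion_timeout(&qp->srq_completion,
					 msecs_to_jiffies(1000)))
		pr_warn("qp %u: timed out waiting for SRQ flush\n",
			qp->qp_num);

	ib_process_cq_direct(qp->recv_cq, qp->recv_cq->cqe);
}
--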



