[RFC PATCH V2 2/2] nvme: rdma: use ib_device's max_qp_wr to limit sqsize

Guixin Liu kanie at linux.alibaba.com
Thu Dec 21 22:58:58 PST 2023


On 2023/12/21 03:27, Sagi Grimberg wrote:
>
>>>> @@ -1030,11 +1030,13 @@ static int nvme_rdma_setup_ctrl(struct nvme_rdma_ctrl *ctrl, bool new)
>>>>               ctrl->ctrl.opts->queue_size, ctrl->ctrl.sqsize + 1);
>>>>       }
>>>> -    if (ctrl->ctrl.sqsize + 1 > NVME_RDMA_MAX_QUEUE_SIZE) {
>>>> +    ib_max_qsize = ctrl->device->dev->attrs.max_qp_wr /
>>>> +            (NVME_RDMA_SEND_WR_FACTOR + 1);
>>>
>>> rdma_dev_max_qsize is a better name.
>>>
>>> Also, you can drop the RFC for the next submission.
>>>
>>
>> Sagi,
>> I don't feel comfortable with these patches.
>
> Well, good that you're speaking up then ;)
>
>> First I would like to understand the need for it.
>
> I assumed that he stumbled on a device that did not support the
> existing max of 128 nvme commands (which is 384 rdma wrs for the qp).
>
The situation is that I need a queue depth greater than 128.
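To spell out the numbers behind the RFC (the max_qp_wr value below is
only an illustrative figure, not taken from any specific device):

	current cap:   NVME_RDMA_MAX_QUEUE_SIZE = 128 commands
	send WRs:      128 * NVME_RDMA_SEND_WR_FACTOR = 128 * 3 = 384
	proposed cap:  attrs.max_qp_wr / (NVME_RDMA_SEND_WR_FACTOR + 1)
	               e.g. 32768 / 4 = 8192 commands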
>> Second, the QP WR can be constructed from one or more WQEs and the 
>> WQEs can be constructed from one or more WQEBBs. The max_qp_wr 
>> doesn't take it into account.
>
> Well, it is not taken into account now either with the existing magic
> limit in nvmet. The rdma limits reporting mechanism was and still is
> unusable.
>
> I would expect a device that has different size for different work
> items to report max_qp_wr accounting for the largest work element that
> the device supports, so it is universally correct.
>
> The fact that max_qp_wr means the maximum number of slots in a qp and
> at the same time different work requests can arbitrarily use any number
> of slots without anyone ever knowing, makes it pretty much impossible to
> use reliably.
>
> Maybe rdma device attributes need a new attribute called
> universal_max_qp_wr that is going to actually be reliable and not
> guess-work?

I see, max_qp_wr is not as reliable as I imagined. Is there any other
way to get a queue depth greater than 128 instead of changing
NVME_RDMA_MAX_QUEUE_SIZE?
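
For reference, the raw max_qp_wr a device reports can be dumped with a
small libibverbs program along these lines (a minimal sketch, separate
from the patch itself; error handling is trimmed, and the WQE/WQEBB
accounting concern discussed above of course still applies):

	#include <stdio.h>
	#include <infiniband/verbs.h>

	int main(void)
	{
		int i, num;
		struct ibv_device **devs = ibv_get_device_list(&num);

		if (!devs)
			return 1;

		for (i = 0; i < num; i++) {
			struct ibv_context *ctx = ibv_open_device(devs[i]);
			struct ibv_device_attr attr;

			if (!ctx)
				continue;
			/* max_qp_wr is the per-QP work request limit
			 * being debated in this thread */
			if (!ibv_query_device(ctx, &attr))
				printf("%s: max_qp_wr=%d\n",
				       ibv_get_device_name(devs[i]),
				       attr.max_qp_wr);
			ibv_close_device(ctx);
		}
		ibv_free_device_list(devs);
		return 0;
	}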
