[PATCH] nvmet: Limit num of queues to num of cpus on target
Max Gurtovoy
maxg at mellanox.com
Thu Dec 7 02:51:58 PST 2017
On 12/7/2017 8:51 AM, Sagi Grimberg wrote:
>
>>>> Sagi/Christoph,
>>>> In case the NVMEoF host asks for num_io_queues << num_target_cpus, there is
>>>> a waste of memory allocated on the target side for the sq's/cq's that
>>>> we never use. We might want to check the number of needed queues in
>>>> "set_features" cmd and allocate the minimum(wanted, target_capable).
>>>> This will block the option of creating more queues in a later stage
>>>> of the host lifecycle, but saves resources.
>>>>
>>>> thoughts ?
>>>
>>> This is not the way to go at all. If you really care about saving a few
>>> Kbytes of nvmet sq/cq and see that it's a real problem, you need to
>>> allocate them on demand instead of imposing this limit on the host.
>>>
>>
>> The Linux host side sets "count = min(*count, nr_io_queues)" in
>> nvme_set_queue_count, so there is no chance it will ask for more queues
>> in the future. We don't really care about this scenario (wanted_queues <
>> target_capable_queues), but we do care about the case that this patch
>> comes to fix.
>
> Well, I do not necessarily agree with the statement that more queues
> than cpu cores are not needed in the target simply because it's the host
> that wants to set the number of queues.
Ok, but in the pci case, the host can ask for, say, 1024 queues while the
drive supports only 32 and will return 32. This is what we are trying to
fix here. We had a "poor" target with 8 CPUs that allocated 128 sq/cq
pairs and couldn't serve many initiators. When we allocated only 8 sq/cq
pairs, we could serve more initiators.
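
To make the clamping concrete, here is a simplified userspace sketch (not
the actual kernel code) of the Number of Queues negotiation: the host
encodes its request in Set Features CDW11 (feature 0x07, zero-based
counts), the controller reports what it granted in completion DW0, and the
host uses the minimum, just like the count = min(*count, nr_io_queues)
line quoted above. The numbers are the 1024-vs-32 example from this mail.

/* Simplified illustration only, compiled as plain userspace C. */
#include <stdio.h>
#include <stdint.h>

#define NVME_FEAT_NUM_QUEUES	0x07

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

int main(void)
{
	uint32_t host_wants = 1024;	/* host asks for 1024 I/O queue pairs */
	uint32_t ctrl_caps  = 32;	/* controller can only back 32 */

	/* Host: zero-based NSQR/NCQR packed into Set Features CDW11. */
	uint32_t cdw11 = ((host_wants - 1) << 16) | (host_wants - 1);

	/* Controller: grant what it can, report zero-based NSQA/NCQA in DW0. */
	uint32_t nsqa = min_u32((cdw11 & 0xffff) + 1, ctrl_caps) - 1;
	uint32_t ncqa = min_u32((cdw11 >> 16) + 1, ctrl_caps) - 1;
	uint32_t dw0  = (ncqa << 16) | nsqa;

	/* Host: use the minimum of what it asked for and what it was given. */
	uint32_t granted = min_u32(dw0 & 0xffff, dw0 >> 16) + 1;
	uint32_t used    = min_u32(host_wants, granted);

	printf("feature 0x%02x: asked %u, granted %u, using %u I/O queues\n",
	       NVME_FEAT_NUM_QUEUES, host_wants, granted, used);
	return 0;
}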
>
> If I understand you correctly, what needs to be fixed is the host
> settling for fewer queues than what it got in set_features (like
> how we handle it in pci).
The host (in the rdma transport, for example) actually uses the same core
function as the pci driver does.
The comment that Nitzan added was for the case where the target can
support more queues than requested (the attached patch doesn't cover this
case). In pci, the device returns the number of queues it *can* provide,
but the actual value we use is the minimum of the two. Other
implementations can say: the pci device can allocate more than I need, so
I might create more queues later on and the device will support it. The
thought we had for fabrics is to check what the host needs (we know it
from set_features), return (in the set_features response) the minimum of
its request and the target's capability (and not support a greedy host),
and then allocate that final number of queues.
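
As a rough sketch of that idea (plain userspace C, not actual nvmet code,
and every name below is hypothetical): the target would grant
min(host request, target capability) when handling the Number of Queues
feature and allocate only that many sq/cq pairs, instead of pre-allocating
the 128 that hurt the 8-CPU target above.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

struct fake_queue_pair {		/* stand-in for an nvmet sq/cq pair */
	uint16_t qid;
};

struct fake_ctrl {
	uint32_t nr_io_queues;
	struct fake_queue_pair *queues;
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Hypothetical handler for Set Features / Number of Queues on the target. */
static int ctrl_set_num_queues(struct fake_ctrl *ctrl, uint32_t host_requested,
			       uint32_t target_capable)
{
	uint32_t granted = min_u32(host_requested, target_capable);
	uint32_t i;

	ctrl->queues = calloc(granted, sizeof(*ctrl->queues));
	if (!ctrl->queues)
		return -1;

	for (i = 0; i < granted; i++)
		ctrl->queues[i].qid = i + 1;	/* qid 0 is the admin queue */

	ctrl->nr_io_queues = granted;
	return 0;	/* 'granted' would go back in the set_features response */
}

int main(void)
{
	struct fake_ctrl ctrl = { 0 };

	/* Host wants 128 queues, the weak target has 8 CPUs: allocate 8. */
	if (ctrl_set_num_queues(&ctrl, 128, 8))
		return 1;
	printf("allocated %u I/O queue pairs\n", ctrl.nr_io_queues);
	free(ctrl.queues);
	return 0;
}

The same clamp could also take the number of target CPUs into account,
which is essentially the limit the attached patch applies.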
As I said, this is less critical than the case where we allocate 128
queues on a weak target.
Hope it is clearer now :)