[RFC PATCH 1/4] nvme-tcp: optionally limit I/O queue count based on NIC queues
Nilay Shroff
nilay at linux.ibm.com
Mon Apr 27 00:37:55 PDT 2026
On 4/24/26 7:16 PM, Christoph Hellwig wrote:
>> In such configurations, limiting the number of NVMe-TCP I/O queues to
>> the number of NIC hardware queues can improve performance by reducing
>> contention and improving locality. Aligning NVMe-TCP worker threads with
>> NIC queue topology may also help reduce tail latency.
>
> Yes, this sounds useful.
>
>>
>> Add a new transport option "match_hw_queues" to allow users to
>> optionally limit the number of NVMe-TCP I/O queues to the number of NIC
>> TX/RX queues. When enabled, the number of I/O queues is set to:
>>
>> min(num_online_cpus, num_nic_queues)
>>
>> This behavior is opt-in and does not change existing defaults.
>
> Any good reason for that? For PCI and RDMA we try to do the right
> thing by default.
>
The only reason was that in certain complex topologies (for instance, under
QEMU) it may not really be possible to get the real number of TX/RX queues.
In that situation I thought we were better off making this feature opt-in.
But yes, I'd also love to remove this option and find a better way to detect
the cases where we can't get the real number of TX/RX queues, and then
automatically fall back to creating as many I/O queues as there are online
CPUs. I'll explore whether that's possible.
>> +static struct net_device *nvme_tcp_get_netdev(struct nvme_ctrl *ctrl)
>> +{
>> + struct net_device *dev = NULL;
>> +
>> + if (ctrl->opts->mask & NVMF_OPT_HOST_IFACE)
>> + dev = dev_get_by_name(&init_net, ctrl->opts->host_iface);
>
> Return early here instead of the giant indentation for the new options.
>
Yes okay, makes sense!
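Something like the following sketch, perhaps (untested, just to illustrate the
early-return shape; the per-family helper names are placeholders for the split
you suggest below):

```c
static struct net_device *nvme_tcp_get_netdev(struct nvme_ctrl *ctrl)
{
	struct nvme_tcp_ctrl *tctrl = to_tcp_ctrl(ctrl);

	/* Explicit interface option wins: look the device up by name. */
	if (ctrl->opts->mask & NVMF_OPT_HOST_IFACE)
		return dev_get_by_name(&init_net, ctrl->opts->host_iface);

	/* Otherwise resolve the device from the target address family. */
	if (tctrl->addr.ss_family == AF_INET)
		return nvme_tcp_get_netdev_v4(tctrl);	/* placeholder name */
	if (tctrl->addr.ss_family == AF_INET6)
		return nvme_tcp_get_netdev_v6(tctrl);	/* placeholder name */

	return NULL;
}
```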
>> + else {
>> + struct nvme_tcp_ctrl *tctrl = to_tcp_ctrl(ctrl);
>> +
>> + if (tctrl->addr.ss_family == AF_INET) {
>
> And then split each address family into a helper. And to me those
> look like something that should be in net/.
>
Hmm okay, I think if we want to add these helpers under net/ then it should be
in include/net/route.h and include/net/ip6_route.h for IPv4 and IPv6 respectively.
>> +
>> +/*
>> + * Returns number of active NIC queues (min of TX/RX), or 0 if device cannot
>> + * be determined.
>> + */
>> +static int nvme_tcp_get_netdev_current_queue_count(struct nvme_ctrl *ctrl)
>
> drop _current to make this a bit more readable?
>
Sure.
>> @@ -2144,6 +2243,24 @@ static int nvme_tcp_alloc_io_queues(struct nvme_ctrl *ctrl)
>> unsigned int nr_io_queues;
>> int ret;
>>
>> + if (!(ctrl->opts->mask & NVMF_OPT_NR_IO_QUEUES) &&
>> + (ctrl->opts->mask & NVMF_OPT_MATCH_HW_QUEUES)) {
>
> The more readable formatting would be:
>
> 	if (!(ctrl->opts->mask & NVMF_OPT_NR_IO_QUEUES) &&
> 	    (ctrl->opts->mask & NVMF_OPT_MATCH_HW_QUEUES)) {
>
Yep, I will change this.
>> + int nr_hw_queues;
>> +
>> + nr_hw_queues = nvme_tcp_get_netdev_current_queue_count(ctrl);
>> + if (nr_hw_queues <= 0)
>> + goto init_queue;
>> +
>> + ctrl->opts->nr_io_queues = min(nr_hw_queues, num_online_cpus());
>> +
>> + if (ctrl->opts->nr_io_queues < num_online_cpus())
>> + dev_info(ctrl->device,
>> + "limiting I/O queues to %u (NIC queues %d, CPUs %u)\n",
>> + ctrl->opts->nr_io_queues, nr_hw_queues,
>> + num_online_cpus());
>> + }
>
> And splitting this into a helper would help keeping the flow sane.
>
Alright, will split this into a separate helper.
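Roughly what I have in mind (untested sketch; it also folds in the rename to
drop "_current" from the queue-count helper as you suggested above):

```c
static void nvme_tcp_limit_io_queues(struct nvme_ctrl *ctrl)
{
	int nr_hw_queues;

	/* Only act when the user opted in and didn't fix nr_io_queues. */
	if ((ctrl->opts->mask & NVMF_OPT_NR_IO_QUEUES) ||
	    !(ctrl->opts->mask & NVMF_OPT_MATCH_HW_QUEUES))
		return;

	/* Fall back to the existing default if the NIC queue count
	 * cannot be determined.
	 */
	nr_hw_queues = nvme_tcp_get_netdev_queue_count(ctrl);
	if (nr_hw_queues <= 0)
		return;

	ctrl->opts->nr_io_queues = min(nr_hw_queues, num_online_cpus());
	if (ctrl->opts->nr_io_queues < num_online_cpus())
		dev_info(ctrl->device,
			 "limiting I/O queues to %u (NIC queues %d, CPUs %u)\n",
			 ctrl->opts->nr_io_queues, nr_hw_queues,
			 num_online_cpus());
}
```

nvme_tcp_alloc_io_queues() would then just call this unconditionally before
computing nr_io_queues.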
Thanks,
--Nilay