[PATCH] configurable NVME IO queue count
Sridhar Balaraman
sbalaraman at parallelwireless.com
Thu Dec 21 01:22:34 PST 2023
Hi Keith,
Thanks for your time and comments.
At present, we have the following parameters at the PCI level that are related to queue configuration:
io_queue_depth: set io queue depth, should >= 2
write_queues: Number of queues to use for writes. If not set, reads and writes will share a queue set.
poll_queues: Number of queues to use for polled IO.
So I added the parameter below to control the number of IO queues as well:
total_io_queues: Restrict total number of queues to use for IO.
A system configuration may be defined with a fixed NVMe device and controller, yet there is currently no knob to control the number of IO queues. With this parameter we can restrict the number of IO queues from the driver, in the same way write_queues and poll_queues are already configurable.
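As a rough illustration only (the helper name and clamp point below are assumptions, not the submitted patch), the PCI-level parameter could be wired up in drivers/nvme/host/pci.c along these lines:

/*
 * Illustrative sketch of a PCI-level "total_io_queues" parameter.
 * The helper below is a hypothetical name, not the actual change.
 */
#include <linux/minmax.h>
#include <linux/module.h>
#include <linux/moduleparam.h>

static unsigned int total_io_queues;	/* 0 = no restriction */
module_param(total_io_queues, uint, 0644);
MODULE_PARM_DESC(total_io_queues,
		 "Restrict total number of queues to use for IO.");

/*
 * Applied where the driver decides how many IO queues to allocate,
 * alongside the existing write_queues/poll_queues handling.
 */
static unsigned int nvme_restrict_io_queues(unsigned int nr_io_queues)
{
	if (total_io_queues)
		return min(nr_io_queues, total_io_queues);
	return nr_io_queues;
}

With something like this in place, the 2/0/0 result in the dmesg output quoted below would presumably come from setting nvme.total_io_queues=2 on the kernel command line or at module load time.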
Thanks
Sridhar
-----Original Message-----
From: Keith Busch <kbusch at kernel.org>
Sent: Thursday, December 21, 2023 12:58 AM
To: Sridhar Balaraman <sbalaraman at parallelwireless.com>
Cc: linux-nvme at lists.infradead.org
Subject: Re: [PATCH] configurable NVME IO queue count
On Wed, Dec 20, 2023 at 05:53:16PM +0000, Sridhar Balaraman wrote:
> Some NVMe products provide a large number of IO queues, so the driver
> allocates queues based on the number of CPUs or the maximum number of
> queues supported by the device, whichever is lower.
> At the same time, the corresponding queues are mapped to IRQs.
> But in the case of RT Linux we need CPUs that are isolated and free of
> IRQs, and unfortunately the number of NVMe IRQs could not be reduced
> because of the large number of CPUs.
> After making the IO queue count a configurable parameter, we can reduce
> the number of IO queues, which in turn reduces the number of IRQs.
>
> Before configuring queue count:
> [ 4.384566] nvme nvme0: pci function 0000:05:00.0
> [ 4.450307] nvme nvme0: 32/0/0 default/read/poll queues
> [ 4.457427] nvme0n1: p1 p2
>
> After configuring queue count to 2:
> [ 5.920776] nvme nvme0: pci function 0000:05:00.0
> [ 5.939455] nvme nvme0: 2/0/0 default/read/poll queues
> [ 5.941487] nvme0n1: p1 p2
This sounds okay to me. It aligns with existing io queue parameters and it's not unreasonable to limit driver resources. I think this should be enforced in nvme_set_queue_count() with a nvme_core module parameter instead of a PCIe specific one, though.
The effective nvme irq affinity will spread out to the unused CPUs with each additional nvme device. And maybe that's okay, but just wanted to mention that this patch can't guarantee nvme irq handlers won't be affined to every CPU if you have a sufficient number of nvme pcie controllers in the system.
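A minimal sketch of that alternative, with the parameter name and helper here being assumptions rather than existing code, would cap the count in nvme_core before nvme_set_queue_count() issues the Set Features (Number of Queues) command, so the limit applies to every transport rather than only PCIe:

/*
 * Hypothetical nvme_core-level cap, per the suggestion above; the
 * "max_io_queues" name and this helper are illustrative assumptions.
 */
#include <linux/minmax.h>
#include <linux/module.h>
#include <linux/moduleparam.h>

static unsigned int max_io_queues;	/* 0 = unlimited */
module_param(max_io_queues, uint, 0644);
MODULE_PARM_DESC(max_io_queues,
		 "Restrict the number of IO queues requested from a controller (0 = unlimited)");

/*
 * nvme_set_queue_count() could run its requested count through this
 * before sending it to the controller.
 */
static unsigned int nvme_capped_queue_count(unsigned int count)
{
	return max_io_queues ? min(count, max_io_queues) : count;
}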