[PATCH] configurable NVME IO queue count

Keith Busch kbusch at kernel.org
Wed Dec 20 11:27:31 PST 2023


On Wed, Dec 20, 2023 at 05:53:16PM +0000, Sridhar Balaraman wrote:
> Some NVMe devices expose a large number of IO queues, so the driver
> allocates queues based on the number of CPUs or the maximum number of
> queues supported by the device, whichever is smaller. Each allocated
> queue is then mapped to an IRQ. On RT Linux, however, we need isolated
> CPUs that are kept free of IRQs, and the NVMe IRQ count cannot be
> reduced because it follows the number of CPUs. Making the IO queue
> count a configurable parameter lets us reduce the number of IO queues,
> which in turn reduces the number of IRQs.
> 
> Before configuring queue count:
> [    4.384566] nvme nvme0: pci function 0000:05:00.0
> [    4.450307] nvme nvme0: 32/0/0 default/read/poll queues
> [    4.457427]  nvme0n1: p1 p2
> 
> After configuring queue count to 2:
> [    5.920776] nvme nvme0: pci function 0000:05:00.0
> [    5.939455] nvme nvme0: 2/0/0 default/read/poll queues
> [    5.941487]  nvme0n1: p1 p2
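
For reference, a minimal sketch of the queue-count selection the
description above implies; the parameter and helper names here are
hypothetical and heavily simplified compared to what the pci driver
actually does:

#include <linux/module.h>
#include <linux/cpumask.h>
#include <linux/minmax.h>

/* Hypothetical pci-level cap; 0 keeps today's behaviour. */
static unsigned int io_queues;
module_param(io_queues, uint, 0444);
MODULE_PARM_DESC(io_queues, "Cap on allocated NVMe IO queues (0 = auto)");

static unsigned int nvme_pick_io_queues(unsigned int dev_max_queues)
{
	/*
	 * Default: one IO queue (and IRQ vector) per possible CPU,
	 * bounded by what the controller reports.
	 */
	unsigned int nr = min(num_possible_cpus(), dev_max_queues);

	/* The proposed cap reduces queues, and with them the IRQ count. */
	if (io_queues)
		nr = min(nr, io_queues);
	return nr;
}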

This sounds okay to me. It aligns with the existing io queue parameters,
and it's not unreasonable to limit driver resources. I think this should
be enforced in nvme_set_queue_count() with an nvme_core module parameter
instead of a PCIe-specific one, though.
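
A rough sketch of that suggestion, assuming a hypothetical nvme_core
parameter named max_io_queues and a clamp applied where
nvme_set_queue_count() negotiates the Number of Queues feature:

#include <linux/module.h>

/* Hypothetical nvme_core parameter; the name is illustrative only. */
static unsigned int max_io_queues;
module_param(max_io_queues, uint, 0644);
MODULE_PARM_DESC(max_io_queues,
		 "Limit IO queues requested per controller (0 = no limit)");

/*
 * Called from nvme_set_queue_count() before the Number of Queues
 * feature is negotiated, so the limit applies to every transport
 * (pci, rdma, tcp, fc) rather than just pcie.
 */
static void nvme_clamp_queue_count(int *count)
{
	if (max_io_queues && *count > (int)max_io_queues)
		*count = max_io_queues;
}

With something like that in place, nvme_core.max_io_queues=2 on the
kernel command line would apply to every controller; again, the
parameter name is only an assumption for illustration.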

The effective nvme irq affinity will still spread out to the otherwise
unused CPUs with each additional nvme device. Maybe that's okay, but I
just wanted to mention that this patch can't guarantee nvme irq handlers
won't be affined to every CPU if you have a sufficient number of nvme
pcie controllers in the system.


