NVMe and IRQ Affinity, another problem
Keith Busch
keith.busch at intel.com
Wed Apr 4 19:48:51 PDT 2018
On Thu, Apr 05, 2018 at 02:31:21AM +0000, Young Yu wrote:
> Thank you for the quick reply Keith,
>
> The nr_cpus=24 kernel parameter definitely limited the number of
> present CPUs and helped spread the queues across the interrupt vectors.
>
> If you could forgive me asking another question: the admin queue and
> half of the I/O queues of every NVMe device are allocated to cores in
> one NUMA node (in my case NUMA node 0, since the admin queue wants to
> stay on CPU0), and the other half of the I/O queues are allocated to
> cores in the other node. This happens regardless of which NUMA node
> the device itself is attached to.
>
> I'm trying to read from the NVMe devices and send the data to the
> NIC, and both are attached to the same NUMA node (1). Is it possible
> to manually bind the first half of nvme8's queues so they all belong
> to cores in that same NUMA node, so I can avoid crossing the slow QPI
> link between nodes? (Or maybe exclude the ones tied to the admin
> queue, since there will be a patch to separate the admin queue from
> the I/O queues soon.)
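To confirm how the queues are currently spread, one option (a minimal
sketch, not from this exchange, relying only on the standard
/proc/irq/<irq>/ layout) is to walk /proc/irq and print the
smp_affinity_list of every interrupt whose action name starts with
"nvme":

/*
 * nvme_irq_affinity.c - list the CPUs each nvme queue interrupt is
 * affined to, by walking /proc/irq/<irq>/ and reading
 * smp_affinity_list.  Build with: cc -o nvme_irq_affinity nvme_irq_affinity.c
 */
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
        struct dirent *irq_ent;
        DIR *irq_root = opendir("/proc/irq");

        if (!irq_root) {
                perror("opendir /proc/irq");
                return 1;
        }

        while ((irq_ent = readdir(irq_root))) {
                char path[512], name[256] = "", cpus[256] = "";
                struct dirent *ent;
                DIR *irq_dir;
                FILE *f;

                /* only numeric entries are individual IRQs */
                if (!isdigit((unsigned char)irq_ent->d_name[0]))
                        continue;

                /* the action name (e.g. nvme0q1) shows up as a subdirectory */
                snprintf(path, sizeof(path), "/proc/irq/%s", irq_ent->d_name);
                irq_dir = opendir(path);
                if (!irq_dir)
                        continue;
                while ((ent = readdir(irq_dir))) {
                        if (!strncmp(ent->d_name, "nvme", 4)) {
                                snprintf(name, sizeof(name), "%s", ent->d_name);
                                break;
                        }
                }
                closedir(irq_dir);
                if (!name[0])
                        continue;

                /* report which CPUs this queue's interrupt may land on */
                snprintf(path, sizeof(path), "/proc/irq/%s/smp_affinity_list",
                         irq_ent->d_name);
                f = fopen(path, "r");
                if (!f)
                        continue;
                if (fgets(cpus, sizeof(cpus), f)) {
                        cpus[strcspn(cpus, "\n")] = '\0';
                        printf("irq %s (%s): CPUs %s\n",
                               irq_ent->d_name, name, cpus);
                }
                fclose(f);
        }
        closedir(irq_root);
        return 0;
}
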
If you are getting interrupts on NUMA node 0, that means your request
originated from a thread running on a CPU in NUMA node 0. If you want
interrupts to wake up a CPU in NUMA node 1, you'll need to pin your IO
submission processes to the CPUs in that node.
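
A minimal sketch of that pinning, assuming NUMA node 1 is the target
and using libnuma (link with -lnuma); the device path /dev/nvme8n1 is
only a placeholder:

/*
 * pin_to_node1.c - keep the IO submission thread (and therefore its
 * completion interrupts) on NUMA node 1.
 * Build with: cc -o pin_to_node1 pin_to_node1.c -lnuma
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <numa.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        int fd;

        if (numa_available() < 0) {
                fprintf(stderr, "libnuma: NUMA not available\n");
                return 1;
        }

        /* restrict this thread to the CPUs of node 1 so requests are
         * submitted from (and completed on) that node's cores */
        if (numa_run_on_node(1) < 0) {
                perror("numa_run_on_node");
                return 1;
        }

        /* keep buffer allocations local to node 1 as well */
        numa_set_preferred(1);

        /* placeholder device path; open and read as usual from here */
        fd = open("/dev/nvme8n1", O_RDONLY | O_DIRECT);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* ... submission loop ... */

        close(fd);
        return 0;
}

The command-line equivalent is to launch the existing IO program under
numactl --cpunodebind=1 --membind=1, which confines both the submission
threads and their buffer allocations to node 1.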