NVME, isolcpus, and irq affinity

Keith Busch kbusch at kernel.org
Mon Oct 12 13:13:33 EDT 2020


On Mon, Oct 12, 2020 at 09:49:38AM -0600, Chris Friesen wrote:
> I've got a linux system running the RT kernel with threaded irqs.  On
> startup we affine the various irq threads to the housekeeping CPUs, but I
> recently hit a scenario where after some days of uptime we ended up with a
> number of NVME irq threads affined to application cores instead (not good
> when we're trying to run low-latency applications).
> 
> Looking at the code, it appears that the NVME driver can in some scenarios
> call nvme_setup_io_queues() after the initial setup and thus allocate new
> IRQ threads at runtime.  It appears that this will then call
> pci_alloc_irq_vectors_affinity(), 

Yes, the driver will re-run interrupt setup on a controller reset.
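For reference, the request that gets re-run looks roughly like the sketch
below, condensed from nvme_setup_irqs() in drivers/nvme/host/pci.c (simplified,
not the verbatim driver code). The point is that the driver only passes
PCI_IRQ_AFFINITY plus a struct irq_affinity describing its vector sets; the
actual spreading of vectors across CPUs is done by the irq core:

static int nvme_setup_irqs(struct nvme_dev *dev, unsigned int nr_io_queues)
{
	struct pci_dev *pdev = to_pci_dev(dev->dev);
	struct irq_affinity affd = {
		.pre_vectors	= 1,	/* one vector reserved for the admin queue */
		.calc_sets	= nvme_calc_irq_sets,
		.priv		= dev,
	};

	/* Managed affinity: the irq core, not the driver, picks the CPUs. */
	return pci_alloc_irq_vectors_affinity(pdev, 1, nr_io_queues,
			PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
}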

> which seems to determine affinity without any regard for things like
> "isolcpus" or "cset shield".
>
> There seem to be other reports of similar issues:
> 
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1831566
> 
> Am I worried about nothing, or is there a risk that those irq threads would
> actually need to do real work (which would cause unacceptable jitter in my
> application)?
> 
> Assuming I'm reading the code correctly, how does it make sense for the NVME
> driver to affine interrupts to CPUs which have explicitly been designated as
> "isolated"?

The driver allocates the interrupts, but it doesn't affine them; it lets
the kernel handle that instead.
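
If you want to spot the drift after a controller reset, one way (a throwaway
userspace sketch, not an existing tool) is to walk /proc/interrupts for nvme
vectors and read each one's smp_affinity_list, then compare against your
housekeeping set:

#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[4096], cpus[256], path[64];
	FILE *f = fopen("/proc/interrupts", "r");

	if (!f) {
		perror("/proc/interrupts");
		return 1;
	}

	while (fgets(line, sizeof(line), f)) {
		int irq;
		FILE *aff;

		/* Only lines naming an nvme queue vector, e.g. "nvme0q1". */
		if (!strstr(line, "nvme") || sscanf(line, " %d:", &irq) != 1)
			continue;

		snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity_list", irq);
		aff = fopen(path, "r");
		if (!aff)
			continue;
		if (fgets(cpus, sizeof(cpus), aff))
			printf("irq %3d -> CPUs %s", irq, cpus);	/* cpus ends in '\n' */
		fclose(aff);
	}
	fclose(f);
	return 0;
}

With threaded irqs the irq/<N>-nvme* kernel threads follow the same mask, so
this also tells you which cores those threads may end up running on.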


