NVME, isolcpus, and irq affinity

Chris Friesen chris.friesen at windriver.com
Tue Oct 13 02:24:03 EDT 2020


On 10/12/2020 6:51 PM, Ming Lei wrote:
> On Mon, Oct 12, 2020 at 11:52 PM Chris Friesen
> <chris.friesen at windriver.com> wrote:
>>
>> Hi,
>>
>> I'm not subscribed to the list so please CC me on replies.
>>
>> I've got a linux system running the RT kernel with threaded irqs.  On
>> startup we affine the various irq threads to the housekeeping CPUs, but
>> I recently hit a scenario where after some days of uptime we ended up
>> with a number of NVME irq threads affined to application cores instead
>> (not good when we're trying to run low-latency applications).
>>
>> Looking at the code, it appears that the NVME driver can in some
>> scenarios call nvme_setup_io_queues() after the initial setup and thus
>> allocate new IRQ threads at runtime.  It appears that this will then
>> call pci_alloc_irq_vectors_affinity(), which seems to determine affinity
>> without any regard for things like "isolcpus" or "cset shield".
>>
>> There seem to be other reports of similar issues:
>>
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1831566
>>
>> Am I worried about nothing, or is there a risk that those irq threads
>> would actually need to do real work (which would cause unacceptable
>> jitter in my application)?
>>
>> Assuming I'm reading the code correctly, how does it make sense for the
>> NVME driver to affine interrupts to CPUs which have explicitly been
>> designated as "isolated"?
> 
> You may pass 'isolcpus=managed_irq,...' for this kind of isolation; see
> the details in the 'isolcpus=' section of
> Documentation/admin-guide/kernel-parameters.txt.
> 
> This feature has been available since v5.6.
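
For reference, the documented syntax is isolcpus=[flag-list,]<cpu-list>,
so a boot line such as isolcpus=managed_irq,domain,2-15 (the CPU numbers
are purely illustrative) asks a 5.6+ kernel to keep the effective
affinity of managed interrupts off CPUs 2-15 whenever housekeeping CPUs
are also present in the mask. A minimal sketch, assuming nothing beyond
/proc/cmdline, to confirm what the running kernel was actually booted
with:

    #!/usr/bin/env python3
    # Minimal sketch: report whether the running kernel was booted with
    # the managed_irq flag of isolcpus=.  The boot line mentioned above
    # (isolcpus=managed_irq,domain,2-15) is only an illustration.

    with open('/proc/cmdline') as f:
        args = f.read().split()

    for arg in args:
        if arg.startswith('isolcpus='):
            value = arg[len('isolcpus='):]
            # flags are alphabetic; the trailing part is the cpu-list
            flags = [p for p in value.split(',') if not p[:1].isdigit()]
            cpus = [p for p in value.split(',') if p[:1].isdigit()]
            print('isolcpus flags:', flags or ['(none: plain domain isolation)'])
            print('isolated cpus :', ','.join(cpus))
            break
    else:
        print('no isolcpus= parameter on the kernel command line')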

I suspect that might work; unfortunately it's not available in our
kernel, and jumping to a brand-new kernel would mean a lot of
additional validation work, so it's not something we can do on a whim.
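In the meantime, on a kernel without managed_irq isolation, one way to
at least catch the situation described above is to periodically compare
the nvme vector affinities against the isolated set. A minimal sketch,
assuming the isolated CPUs are 2-15 (substitute the list actually passed
to isolcpus= or used by the cset shield):

    #!/usr/bin/env python3
    # Minimal sketch: report nvme IRQs whose affinity overlaps a set of
    # isolated CPUs.  The isolated set below (2-15) is an assumption;
    # substitute the CPUs isolated on your system.

    import re

    ISOLATED = set(range(2, 16))      # assumption: CPUs 2-15 are isolated

    def parse_cpulist(text):
        """Expand a kernel cpulist string such as '0,2-5' into a set."""
        cpus = set()
        for chunk in text.strip().split(','):
            if not chunk:
                continue
            if '-' in chunk:
                lo, hi = chunk.split('-')
                cpus.update(range(int(lo), int(hi) + 1))
            else:
                cpus.add(int(chunk))
        return cpus

    def nvme_irqs():
        """Yield (irq, name) pairs for nvme vectors in /proc/interrupts."""
        with open('/proc/interrupts') as f:
            for line in f:
                m = re.match(r'\s*(\d+):', line)
                if m and 'nvme' in line:
                    yield int(m.group(1)), line.split()[-1]

    for irq, name in nvme_irqs():
        try:
            with open('/proc/irq/%d/smp_affinity_list' % irq) as f:
                affinity = parse_cpulist(f.read())
        except OSError:
            continue
        overlap = affinity & ISOLATED
        if overlap:
            print('IRQ %d (%s) is affined to isolated CPUs: %s'
                  % (irq, name, sorted(overlap)))

Note that the affinity of managed interrupts is owned by the kernel and
cannot be changed through /proc/irq/*, so a script like this can only
report the overlap rather than fix it.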

I'm definitely looking forward to moving to something newer though.

Chris


