[LSF/MM TOPIC] irq affinity handling for high CPU count machines

Thu Feb 1 07:05:39 PST 2018

Hello Hannes,

On Mon, Jan 29, 2018 at 10:08:43AM +0100, Hannes Reinecke wrote:
> Hi all,
> 
> here's a topic which came up on the SCSI ML (cf thread '[RFC 0/2]
> mpt3sas/megaraid_sas: irq poll and load balancing of reply queue').
> 
> When doing I/O tests on a machine with more CPUs than MSIx vectors
> provided by the HBA we can easily setup a scenario where one CPU is
> submitting I/O and the other one is completing I/O. Which will result in
> the latter CPU being stuck in the interrupt completion routine for
> basically ever, resulting in the lockup detector kicking in.

Today I was looking at a megaraid_sas related issue and found that the
driver uses pci_alloc_irq_vectors() with PCI_IRQ_AFFINITY. So it looks
like each reply queue is already handled by more than one CPU when there
are more CPUs than MSI-X vectors in the system. That spreading is done by
the generic irq affinity code, please see kernel/irq/affinity.c.
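
To make the CPU-to-vector spreading concrete, here is a rough userspace
sketch of the idea. It is only a simplified model: cpu_to_vector() is a
hypothetical helper, and the real code in kernel/irq/affinity.c works on
cpumasks and also takes NUMA nodes into account, which this ignores.

```c
#include <assert.h>

/*
 * Simplified model (an assumption, not the kernel algorithm): assign
 * ncpus CPUs to nvecs interrupt vectors in contiguous groups of nearly
 * equal size, so several CPUs share one vector when ncpus > nvecs.
 */
static unsigned int cpu_to_vector(unsigned int cpu, unsigned int ncpus,
                                  unsigned int nvecs)
{
    unsigned int base, extra, boundary;

    assert(nvecs > 0 && cpu < ncpus);

    if (ncpus <= nvecs)
        return cpu;                 /* enough vectors: one per CPU */

    base = ncpus / nvecs;           /* minimum CPUs per vector */
    extra = ncpus % nvecs;          /* first 'extra' vectors get one more */
    boundary = extra * (base + 1);  /* first CPU of the smaller groups */

    if (cpu < boundary)
        return cpu / (base + 1);
    return extra + (cpu - boundary) / base;
}
```

With 10 CPUs and 4 vectors this model puts CPUs 0-2 on vector 0, CPUs 3-5
on vector 1, CPUs 6-7 on vector 2 and CPUs 8-9 on vector 3.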

Also, IMO each reply queue could be treated as a blk-mq hw queue, and then
megaraid could benefit from blk-mq's multi-queue framework. One annoying
thing is that both the legacy and the blk-mq paths would need to be handled
inside the driver.

> 
> How should these situations be handled?
> Should it be made the responsibility of the drivers, ensuring that the
> interrupt completion routine is terminated after a certain time?
> Should it be made the responsibility of the upper layers?
> Should it be the responsibility of the interrupt mapping code?
> Can/should interrupt polling be used in these situations?

Yeah, I guess interrupt polling may improve these situations, especially
since KPTI introduces some extra cost in interrupt handling.
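
As a rough illustration of what budgeted polling means here, below is a
small userspace model. The struct and function names are hypothetical and
this is not driver code. It only shows the idea of processing a bounded
number of completions per pass instead of looping in the handler until
the reply queue is empty.

```c
#include <assert.h>

/* Hypothetical model of a reply queue: only a count of pending entries. */
struct reply_queue {
    unsigned int pending;   /* completions waiting to be processed */
    int irq_enabled;        /* nonzero if the interrupt is unmasked */
};

/*
 * Process at most 'budget' completions and return how many were handled.
 * If the queue was drained, re-enable the interrupt. Otherwise leave it
 * masked so the caller can schedule another poll pass later, which gives
 * other work on this CPU a chance to run in between.
 */
static unsigned int poll_reply_queue(struct reply_queue *q,
                                     unsigned int budget)
{
    unsigned int done = 0;

    while (done < budget && q->pending > 0) {
        q->pending--;       /* stand-in for completing one command */
        done++;
    }

    q->irq_enabled = (q->pending == 0);
    return done;
}
```

With 10 pending completions and a budget of 4, three passes are needed
(4, 4, then 2), and the interrupt is only re-enabled after the last one.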

Thanks,
Ming