[LSF/MM TOPIC] irq affinity handling for high CPU count machines
Bart Van Assche
bart.vanassche at wdc.com
Mon Jan 29 08:37:31 PST 2018
On 01/29/18 07:41, Elliott, Robert (Persistent Memory) wrote:
>> -----Original Message-----
>> From: Linux-nvme [mailto:linux-nvme-bounces at lists.infradead.org] On Behalf
>> Of Hannes Reinecke
>> Sent: Monday, January 29, 2018 3:09 AM
>> To: lsf-pc at lists.linux-foundation.org
>> Cc: linux-nvme at lists.infradead.org; linux-scsi at vger.kernel.org; Kashyap
>> Desai <kashyap.desai at broadcom.com>
>> Subject: [LSF/MM TOPIC] irq affinity handling for high CPU count machines
>>
>> Hi all,
>>
>> here's a topic which came up on the SCSI ML (cf thread '[RFC 0/2]
>> mpt3sas/megaraid_sas: irq poll and load balancing of reply queue').
>>
>> When doing I/O tests on a machine with more CPUs than MSI-X vectors
>> provided by the HBA we can easily set up a scenario where one CPU is
>> submitting I/O and another CPU is completing it. The completing CPU can
>> end up stuck in the interrupt completion routine more or less
>> indefinitely, until the lockup detector kicks in.
>>
>> How should these situations be handled?
>> Should it be made the responsibility of the drivers, ensuring that the
>> interrupt completion routine is terminated after a certain time?
>> Should it be made the responsibility of the upper layers?
>> Should it be the responsibility of the interrupt mapping code?
>> Can/should interrupt polling be used in these situations?
>
> Back when we introduced scsi-mq with hpsa, the best approach was to
> route interrupts and completion handling so each CPU core handles its
> own submissions; this way, they are self-throttling.
That approach may work for the hpsa adapter, but I'm not sure whether it
works for all adapter types. It has already been observed with the SRP
initiator driver running inside a VM that a single core spent all its
time processing IB interrupts.
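
For reference, the routing Robert describes is roughly what drivers ask
for through managed IRQ affinity. The sketch below is only an
illustration; struct my_hba, struct my_queue and my_queue_isr() are
made-up placeholders, not code from hpsa, the SRP initiator or any other
real driver:

/*
 * Sketch: request MSI-X vectors spread over all CPUs so completions are
 * delivered on (or near) the submitting CPU. All my_* names are
 * hypothetical placeholders.
 */
#include <linux/pci.h>
#include <linux/interrupt.h>

struct my_queue {
	int index;			/* per-queue state would live here */
};

struct my_hba {
	struct my_queue queues[64];	/* arbitrary bound for the sketch */
};

static irqreturn_t my_queue_isr(int irq, void *data)
{
	/* per-queue completion handling would go here */
	return IRQ_HANDLED;
}

static int my_setup_irqs(struct pci_dev *pdev, struct my_hba *hba,
			 unsigned int max_queues)
{
	int nvec, i, ret;

	/*
	 * PCI_IRQ_AFFINITY asks the core to spread the vectors over all
	 * CPUs instead of leaving them all on CPU 0.
	 */
	nvec = pci_alloc_irq_vectors(pdev, 1, max_queues,
				     PCI_IRQ_MSIX | PCI_IRQ_AFFINITY);
	if (nvec < 0)
		return nvec;

	for (i = 0; i < nvec; i++) {
		ret = request_irq(pci_irq_vector(pdev, i), my_queue_isr, 0,
				  "my_hba", &hba->queues[i]);
		if (ret)
			goto out_free;
	}
	return nvec;

out_free:
	while (--i >= 0)
		free_irq(pci_irq_vector(pdev, i), &hba->queues[i]);
	pci_free_irq_vectors(pdev);
	return ret;
}

The catch is that once the HBA exposes fewer vectors than there are
CPUs, several CPUs have to share a vector, which is exactly the
situation Hannes describes.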
Additionally, only initiator workloads are self-throttling; target-style
workloads are not.
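
On Hannes' question about interrupt polling: the generic irq_poll
infrastructure already lets a driver bound how much completion work is
done per pass. A minimal sketch, with struct my_hba and the my_*()
helpers as hypothetical placeholders for the real hardware access:

/*
 * Sketch: bound completion-side work with irq_poll. The my_*() helpers
 * stand in for real reply-queue and interrupt-mask accesses.
 */
#include <linux/interrupt.h>
#include <linux/irq_poll.h>

#define MY_POLL_BUDGET	64	/* completions handled per softirq pass */

struct my_hba {
	struct irq_poll iop;
	/* hardware registers, reply queue, etc. would live here */
};

/* Placeholder: reap one completion, return true if one was found. */
static bool my_process_one(struct my_hba *hba) { return false; }
/* Placeholders: mask/unmask the adapter's completion interrupt. */
static void my_disable_intr(struct my_hba *hba) { }
static void my_enable_intr(struct my_hba *hba) { }

static int my_poll(struct irq_poll *iop, int budget)
{
	struct my_hba *hba = container_of(iop, struct my_hba, iop);
	int done = 0;

	while (done < budget && my_process_one(hba))
		done++;

	if (done < budget) {
		/* Reply queue drained: stop polling, unmask the IRQ. */
		irq_poll_complete(iop);
		my_enable_intr(hba);
	}
	return done;
}

static irqreturn_t my_isr(int irq, void *data)
{
	struct my_hba *hba = data;

	/* Mask the IRQ and defer the work to softirq context, where it
	 * is bounded to MY_POLL_BUDGET completions per pass. */
	my_disable_intr(hba);
	irq_poll_sched(&hba->iop);
	return IRQ_HANDLED;
}

/* At init time: irq_poll_init(&hba->iop, MY_POLL_BUDGET, my_poll); */

Whether converting every driver to this scheme is the right answer, or
whether the bound should be enforced by the core, is part of what needs
discussing.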
In other words, I think this topic is worth discussing further.
Bart.