Why GICD_ITARGETSR is not used by Linux
Li Chen
me at linux.beauty
Tue Sep 20 08:37:15 PDT 2022
Hi Russell,
---- On Tue, 20 Sep 2022 12:09:38 +0200 Russell King (Oracle) wrote ---
> On Tue, Sep 20, 2022 at 11:45:10AM +0200, Li Chen wrote:
> > Hi Arnd,
> >
> > ---- On Tue, 20 Sep 2022 09:04:16 +0200 Arnd Bergmann wrote ---
> > > On Tue, Sep 20, 2022, at 3:42 AM, Li Chen wrote:
> > > > Hi Arnd,
> > > >
> > > > I noticed GIC has GICD_ITARGETSR to distribute IRQ to different CPUs,
> > > > but currently, it is not used by Linux.
> > > >
> > > > There was a patchset from MTK people:
> > > > http://archive.lwn.net:8080/linux-kernel/1606486531-25719-1-git-send-email-hanks.chen@mediatek.com/T/#t
> > > > which implements a GIC-level IRQ distributor using GICD_ITARGETSR, but
> > > > it was not accepted because the maintainers thought it would break
> > > > existing code without providing any benefit over the existing affinity
> > > > mechanism.
> > > >
> > > > IIUC, Linux relies only on affinity/irqbalance to distribute IRQs,
> > > > instead of architecture-specific solutions like the GIC's distributor.
> > > >
> > > > Latency might improve somewhat, but there is no benchmark yet.
> > > >
> > > > I have two questions here:
> > > > 1. Now that Linux doesn't use GICD_ITARGETSR, where does it set CPU 0
> > > > to be the only IRQ distributor core?
> > > > 2. Do you know any other reasons that GICD_ITARGETSR is not used by
> > > > Linux?
> > >
> > > Hi Li,
> > >
> > > It looks like the original submitter never followed up
> > > with a new version of the patch that addresses the
> > > issues found in review. I would assume they gave up either
> > > because it did not show any real-world advantage, or they
> > > could not address all of the concerns.
> >
> > Thanks for your reply.
> >
> > FYI, here is another thread about this topic: https://lore.kernel.org/linux-arm-kernel/20191120105017.GN25745@shell.armlinux.org.uk/
>
> Oh god, not this again.
>
> The behaviour of the GIC is as follows. If you set two CPUs in
> GICD_ITARGETSRn, then the interrupt will be delivered to _both_ of
> those CPUs. Not just one selected at random or determined by some
> algorithm, but both CPUs.
>
> Both CPUs get woken up if they're in sleep, and both CPUs attempt to
> process the interrupt. One CPU will win the lock, while the other CPU
> spins waiting for the lock to process the interrupt.
>
> The winning CPU will process the interrupt, clear it on the device,
> release the lock and acknowledge it at the GIC CPU interface.
>
> The CPU that lost the previous race can now proceed to process the
> very same interrupt, discovers that it's no longer pending on the
> device, and signals IRQ_NONE as it appears to be a spurious interrupt.
>
> The result is that the losing CPU ends up wasting CPU cycles, and
> if the losing CPU was in a low power idle state, needlessly wakes up
> to process this interrupt.
>
> If you have more CPUs involved, you have more CPUs wasting CPU cycles,
> being woken up wasting power - not just occasionally, but almost every
> single interrupt that is raised from a device in the system.
Thank you very much for your explanation. It sounds like setting multiple target
CPUs in GICD_ITARGETSRn is not a useful feature, and the register is gone from
the GIC-600 spec anyway (GICv3 uses affinity routing via GICD_IROUTER instead).
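The race described above can be sketched as a small simulation (illustrative
only, not kernel code): two "CPUs" take the same level-triggered interrupt, one
wins the lock and clears the source, and the loser finds nothing pending and
returns IRQ_NONE:

```python
import threading

IRQ_NONE, IRQ_HANDLED = 0, 1

class Device:
    def __init__(self):
        self.pending = True        # level-triggered interrupt asserted
        self.lock = threading.Lock()

    def handle_irq(self):
        with self.lock:            # both CPUs contend for this lock
            if self.pending:
                self.pending = False   # the winner clears the source
                return IRQ_HANDLED
            return IRQ_NONE            # the loser sees a "spurious" interrupt

dev = Device()
results = []
threads = [threading.Thread(target=lambda: results.append(dev.handle_irq()))
           for _ in range(2)]       # two CPUs woken by the same interrupt
for t in threads:
    t.start()
for t in threads:
    t.join()

# Exactly one CPU handles the interrupt; the other wasted a wakeup.
print(sorted(results))  # [0, 1]
```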
> On architectures such as x86, the PICs distribute the interrupts in
> hardware amongst the CPUs. So if a single interrupt is set to be sent
> to multiple CPUs, only _one_ of the CPUs is actually interrupted.
> Hence, x86 can have multiple CPUs selected as a destination, and
> the hardware delivers the interrupt across all CPUs.
I have some experience with an Alpha-like architecture that binds MSIs to
different cores; I believe only one core gets interrupted there as well, but I
don't know how the hardware does it.
> On ARM, we don't have that. We have a thundering herd of CPUs if we
> set more than one CPU to process the interrupt, which is grossly
> inefficient.
So, on Arm chips with PCIe controller(s), do we also rely on irqbalance to
distribute endpoints' legacy IRQs and MSI/MSI-X interrupts?
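For context on those affinity masks: GICD_ITARGETSRn holds one byte per
interrupt with one bit per target CPU (so at most eight CPUs), while Linux
exposes per-IRQ affinity as a hex cpumask in /proc/irq/<n>/smp_affinity. A
minimal sketch of the two encodings (the helper names are illustrative, not
kernel code):

```python
def cpus_to_itargets_byte(cpus):
    """Pack a set of CPU ids (0-7) into a GICD_ITARGETSR target byte."""
    byte = 0
    for cpu in cpus:
        if not 0 <= cpu <= 7:
            raise ValueError("GICD_ITARGETSR addresses at most 8 CPUs")
        byte |= 1 << cpu
    return byte

def cpus_to_smp_affinity(cpus):
    """Render a set of CPU ids as the hex cpumask format used by
    /proc/irq/<n>/smp_affinity."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")

# CPU 0 only: target byte 0x1, affinity mask "1"
print(hex(cpus_to_itargets_byte({0})), cpus_to_smp_affinity({0}))
```

Note the asymmetry Russell describes: writing "1" to smp_affinity restricts
delivery to CPU 0, whereas setting several bits in GICD_ITARGETSRn would raise
the interrupt on every selected CPU at once.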
> As I said in the reply you linked to above, I did attempt to implement
> several ideas in software, where the kernel would attempt to distribute
> dynamically the interrupt amongst the CPUs in the affinity mask, but I
> could never get what appeared to be a good behaviour on the platforms
> I was trying and performance wasn't as good. So I abandoned it.
>
> This doesn't preclude someone else having a go at solving that problem,
> but the problem is not solved by setting multiple CPU bits in the
> GICD_ITARGETSRn registers. As I said above, that just gets you a
> thundering herd problem, less performance, and worse power consumption.
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
>
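For illustration, the kind of software distribution Russell mentions trying
(rotating the single GIC target among the CPUs allowed by the affinity mask)
might look roughly like this; the structure and names here are assumptions, not
his actual patches:

```python
from itertools import cycle

def make_distributor(affinity_mask_cpus):
    """Round-robin the single interrupt target among the CPUs in the mask."""
    rr = cycle(sorted(affinity_mask_cpus))
    def next_target():
        # A real implementation would rewrite GICD_ITARGETSRn (or GICD_IROUTER
        # on GICv3) here, per interrupt or periodically, so only one CPU is
        # ever targeted at a time.
        return next(rr)
    return next_target

pick = make_distributor({0, 2, 3})
print([pick() for _ in range(5)])  # [0, 2, 3, 0, 2]
```

As the thread notes, even this kind of scheme did not show a clear performance
win in practice, which is why it was abandoned.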