IRQ thread timeouts and affinity
Marc Zyngier
maz at kernel.org
Fri Oct 10 07:18:13 PDT 2025
On Fri, 10 Oct 2025 14:50:57 +0100,
Thierry Reding <thierry.reding at gmail.com> wrote:
>
> On Thu, Oct 09, 2025 at 07:11:20PM +0100, Marc Zyngier wrote:
> > On Thu, 09 Oct 2025 18:04:58 +0100,
> > Marc Zyngier <maz at kernel.org> wrote:
> > >
> > > On Thu, 09 Oct 2025 17:05:15 +0100,
> > > Thierry Reding <thierry.reding at gmail.com> wrote:
> > > >
> > > > On Thu, Oct 09, 2025 at 03:30:56PM +0100, Marc Zyngier wrote:
> > > > > Hi Thierry,
> > > > >
> > > > > On Thu, 09 Oct 2025 12:38:55 +0100,
> > > > > Thierry Reding <thierry.reding at gmail.com> wrote:
> > > > > >
> > > > > > Which brings me to the actual question: what is the right way to solve
> > > > > > this? I had, maybe naively, assumed that the default CPU affinity, which
> > > > > > includes all available CPUs, would be sufficient to have interrupts
> > > > > > balanced across all of those CPUs, but that doesn't appear to be the
> > > > > > case. At least not with the GIC (v3) driver which selects one CPU (CPU 0
> > > > > > in this particular case) from the affinity mask to set the "effective
> > > > > > affinity", which then dictates where IRQs are handled and where the
> > > > > > corresponding IRQ thread function is run.
> > > > >
> > > > > There's a (GIC-specific) answer to that, and that's the "1 of N"
> > > > > distribution model. The problem is that it is a massive headache (it
> > > > > completely breaks with per-CPU context).
> > > >
> > > > Heh, that started out as a very promising first paragraph but turned
> > > > ugly very quickly... =)
> > > >
> > > > > We could try and hack this in somehow, but defining a reasonable API
> > > > > is complicated. The set of CPUs receiving 1:N interrupts is a *global*
> > > > > set, which means you cannot have one interrupt targeting CPUs 0-1, and
> > > > > another targeting CPUs 2-3. You can only have a single set for all 1:N
> > > > > interrupts. How would you define such a set in a platform agnostic
> > > > > manner so that a random driver could use this? I definitely don't want
> > > > > to have a GIC-specific API.
> > > >
> > > > I see. I've been thinking that maybe the only way to solve this is using
> > > > some sort of policy. A very simple policy might be: use CPU 0 as the
> > > > "default" interrupt (much like it is now) because like you said there
> > > > might be assumptions built-in that break when the interrupt is scheduled
> > > > elsewhere. But then let individual drivers opt into the 1:N set, which
> > > > would perhaps span all available CPUs but the first one. From an API PoV
> > > > this would just be a flag that's passed to request_irq() (or one of its
> > > > derivatives).
> > >
> > > The $10k question is how do you pick the victim CPUs? I can't see how
> > > to do it in a reasonable way unless we decide that interrupts that
> > > have an affinity matching cpu_possible_mask are 1:N. And then we're
> > > left with wondering what to do about CPU hotplug.
> >
> > For fun and giggles, here's the result of a 5 minute hack. It enables
> > 1:N distribution on SPIs that have an "all cpus" affinity. It works on
> > one machine, doesn't on another -- no idea why yet. YMMV.
> >
> > This is of course conditioned on your favourite HW supporting the 1:N
> > feature, and it is likely that things will catch fire quickly. It will
> > probably make your overall interrupt latency *worse*, but maybe less
> > variable. Let me know.
>
> You might be onto something here. Mind you, I've only done very limited
> testing, but the system does boot and the QSPI related timeouts are gone
> completely.

Hey, progress.

> Here's some snippets from the boot log that might be interesting:
>
> [ 0.000000] GICv3: GIC: Using split EOI/Deactivate mode
> [ 0.000000] GIC: enabling workaround for GICv3: NVIDIA erratum T241-FABRIC-4
> [ 0.000000] GIC: enabling workaround for GICv3: ARM64 erratum 2941627
> [ 0.000000] GICv3: 960 SPIs implemented
> [ 0.000000] GICv3: 320 Extended SPIs implemented
> [ 0.000000] Root IRQ handler: gic_handle_irq
> [ 0.000000] GICv3: GICv3 features: 16 PPIs, 1:N
> [ 0.000000] GICv3: CPU0: found redistributor 20000 region 0:0x0000000022100000
> [...]
> [ 0.000000] GICv3: using LPI property table @0x0000000101500000
> [ 0.000000] GICv3: CPU0: using allocated LPI pending table @0x0000000101540000
> [...]
>
> There's a bunch of ITS info that I dropped, as well as the same
> redistributor and LPI property table block for each of the 288 CPUs.
>
> /proc/interrupts is much too big to paste here, but it looks like the
> QSPI interrupts now end up evenly distributed across the first 72 CPUs
> in this system. Not sure why 72, but possibly because this is a 4 NUMA
> node system with 72 CPUs each, so the CPU mask might've been restricted
> to just the first node.

It could well be that your firmware sets GICR_CTLR.DPG1NS on the 3
other nodes, and the patch I gave you doesn't try to change that.
Check with [1], which does the right thing on that front (it fixed a
similar problem on my slightly more modest 12 CPU machine).

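As an illustration of the DPG1NS point above, here is a minimal
user-space sketch, not taken from the patch in [1]. It assumes the
GICv3 architecture's register layout, where DPG1NS is bit 25 of
GICR_CTLR; the helper names and the array of register snapshots are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * GICR_CTLR.DPG1NS (bit 25 per the GICv3 architecture spec): when set,
 * the CPU behind this redistributor is not selected as a target for
 * 1 of N Non-secure Group 1 interrupts.
 */
#define GICR_CTLR_DPG1NS (UINT32_C(1) << 25)

/* Hypothetical helper: does this redistributor take part in 1:N? */
static bool rdist_takes_1ofn(uint32_t gicr_ctlr)
{
	return !(gicr_ctlr & GICR_CTLR_DPG1NS);
}

/* Hypothetical helper: GICR_CTLR value that opts the CPU back in. */
static uint32_t rdist_enable_1ofn(uint32_t gicr_ctlr)
{
	return gicr_ctlr & ~GICR_CTLR_DPG1NS;
}

/*
 * Hypothetical helper: count the CPUs that can receive 1:N interrupts,
 * given one GICR_CTLR snapshot per CPU.
 */
static size_t count_1ofn_targets(const uint32_t *ctlr, size_t n)
{
	size_t i, count = 0;

	assert(ctlr != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (rdist_takes_1ofn(ctlr[i]))
			count++;

	return count;
}
```

If the firmware hypothesis above holds, a count taken over all
redistributors would come out lower than the number of online CPUs
until the bit is cleared on the remaining nodes.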
> On the face of it this looks quite promising. Where do we go from here?

For a start, you really should consider sending me one of these
machines. I have plans for it ;-)

> Any areas that we need to test more exhaustively to see if this breaks?

CPU hotplug is the main area of concern, and I'm pretty sure it breaks
this distribution mechanism (or the other way around). Another thing
is that if firmware isn't aware that 1:N interrupts can (or should)
wake up a CPU from sleep, bad things will happen. Given that nobody
uses 1:N, you can bet that any bit of privileged SW (TF-A,
hypervisors) is likely to be buggy (I've already spotted bugs in KVM
around this).

The other concern is the shape of the API we would expose to drivers,
because I'm not sure we want this sort of "scatter-gun" approach for
all SPIs, and I don't know how that translates to other architectures.
Thomas should probably weigh in here.
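
To make the API question more concrete, the "all CPUs affinity means
1:N" policy from earlier in the thread could look roughly like the
sketch below. This is an assumption-laden toy, not the actual hack:
CPU masks are squeezed into a single 64-bit word (the machine discussed
here has 288 CPUs), the function name is hypothetical, and the only
architectural fact relied on is that bit 31 of GICD_IROUTER<n>
(Interrupt_Routing_Mode) selects 1 of N routing.

```c
#include <assert.h>
#include <stdint.h>

/* GICD_IROUTER<n>.Interrupt_Routing_Mode: 1 = any participating CPU. */
#define GICD_IROUTER_IRM (UINT64_C(1) << 31)

/*
 * Hypothetical policy: an SPI whose requested affinity covers every
 * possible CPU is routed 1:N; anything narrower keeps targeting a
 * single CPU, identified by its (opaque) affinity value.
 *
 * Assumption: masks fit in 64 bits, which is only true for this toy.
 */
static uint64_t spi_irouter_value(uint64_t affinity_mask,
				  uint64_t possible_mask,
				  uint64_t target_aff)
{
	/* A single-CPU target must not carry the routing mode bit. */
	assert((target_aff & GICD_IROUTER_IRM) == 0);

	if (affinity_mask == possible_mask)
		return GICD_IROUTER_IRM;

	return target_aff;
}
```

Note that nothing here expresses *which* CPUs take part: that set is
global, which is exactly what makes a per-interrupt API awkward.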
Thanks,
M.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=irq/gicv3-1ofN&id=5856e2eb479fc41ea60e76440f768079a1a21a36

--
Without deviation from the norm, progress is not possible.