IRQ thread timeouts and affinity
Thierry Reding
thierry.reding at gmail.com
Fri Oct 10 06:50:57 PDT 2025
On Thu, Oct 09, 2025 at 07:11:20PM +0100, Marc Zyngier wrote:
> On Thu, 09 Oct 2025 18:04:58 +0100,
> Marc Zyngier <maz at kernel.org> wrote:
> >
> > On Thu, 09 Oct 2025 17:05:15 +0100,
> > Thierry Reding <thierry.reding at gmail.com> wrote:
> > >
> > > On Thu, Oct 09, 2025 at 03:30:56PM +0100, Marc Zyngier wrote:
> > > > Hi Thierry,
> > > >
> > > > On Thu, 09 Oct 2025 12:38:55 +0100,
> > > > Thierry Reding <thierry.reding at gmail.com> wrote:
> > > > >
> > > > > Which brings me to the actual question: what is the right way to solve
> > > > > this? I had, maybe naively, assumed that the default CPU affinity, which
> > > > > includes all available CPUs, would be sufficient to have interrupts
> > > > > balanced across all of those CPUs, but that doesn't appear to be the
> > > > > case. At least not with the GIC (v3) driver which selects one CPU (CPU 0
> > > > > in this particular case) from the affinity mask to set the "effective
> > > > > affinity", which then dictates where IRQs are handled and where the
> > > > > corresponding IRQ thread function is run.
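(To make the above a bit more concrete, the affinity callback in a GICv3-style
driver boils down to something like the following -- a simplified sketch of the
pattern, not the verbatim driver code: one CPU is picked out of the requested
mask, the interrupt is routed there, and that single CPU is recorded as the
effective affinity.)

#include <linux/cpumask.h>
#include <linux/irq.h>

static int example_set_affinity(struct irq_data *d,
                                const struct cpumask *mask_val, bool force)
{
        /* pick a single online CPU out of the requested mask */
        unsigned int cpu = cpumask_any_and(mask_val, cpu_online_mask);

        if (cpu >= nr_cpu_ids)
                return -EINVAL;

        /* hardware-specific routing of the interrupt to 'cpu' would go here */

        /*
         * Record the one CPU that was picked as the effective affinity;
         * this is what ends up dictating where the handler (and hence the
         * IRQ thread) runs, no matter how wide the requested mask was.
         */
        irq_data_update_effective_affinity(d, cpumask_of(cpu));

        return IRQ_SET_MASK_OK_DONE;
}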
> > > >
> > > > There's a (GIC-specific) answer to that, and that's the "1 of N"
> > > > distribution model. The problem is that it is a massive headache (it
> > > > completely breaks with per-CPU context).
> > >
> > > Heh, that started out as a very promising first paragraph but turned
> > > ugly very quickly... =)
> > >
> > > > We could try and hack this in somehow, but defining a reasonable API
> > > > is complicated. The set of CPUs receiving 1:N interrupts is a *global*
> > > > set, which means you cannot have one interrupt targeting CPUs 0-1, and
> > > > another targeting CPUs 2-3. You can only have a single set for all 1:N
> > > > interrupts. How would you define such a set in a platform agnostic
> > > > manner so that a random driver could use this? I definitely don't want
> > > > to have a GIC-specific API.
> > >
> > > I see. I've been thinking that maybe the only way to solve this is with
> > > some sort of policy. A very simple policy might be: keep CPU 0 as the
> > > "default" interrupt target (much like it is now), because, like you said,
> > > there might be built-in assumptions that break when the interrupt is
> > > handled elsewhere. But then let individual drivers opt into the 1:N set,
> > > which would perhaps span all available CPUs except the first one. From an
> > > API PoV this would just be a flag that's passed to request_irq() (or one
> > > of its derivatives).
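Something along these lines is what I have in mind. To be clear, this is
purely hypothetical: IRQF_ONE_OF_N does not exist today, and the qspi_*
names below are made up for illustration.

#include <linux/device.h>
#include <linux/interrupt.h>

/*
 * Hypothetical sketch only: IRQF_ONE_OF_N is a made-up flag and the
 * handler names are placeholders. The point is merely that opting an
 * interrupt into the 1:N set could be a one-line change in a driver.
 */
static int qspi_request_irq(struct device *dev, unsigned int irq, void *qspi)
{
        return request_threaded_irq(irq, qspi_hw_handler, qspi_thread_fn,
                                    IRQF_ONESHOT | IRQF_ONE_OF_N,
                                    dev_name(dev), qspi);
}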
> >
> > The $10k question is how do you pick the victim CPUs? I can't see how
> > to do it in a reasonable way unless we decide that interrupts that
> > have an affinity matching cpu_possible_mask are 1:N. And then we're
> left wondering what to do about CPU hotplug.
>
> For fun and giggles, here's the result of a 5-minute hack. It enables
> 1:N distribution on SPIs that have an "all cpus" affinity. It works on
> one machine, doesn't on another -- no idea why yet. YMMV.
>
> This is of course conditioned on your favourite HW supporting the 1:N
> feature, and it is likely that things will catch fire quickly. It will
> probably make your overall interrupt latency *worse*, but maybe less
> variable. Let me know.
You might be onto something here. Mind you, I've only done very limited
testing, but the system does boot and the QSPI-related timeouts are gone
completely.
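If I'm reading the idea right, it boils down to something along these lines
in the SPI routing path (my paraphrase of the concept, not your actual diff):
when the requested affinity covers all possible CPUs, set the
Interrupt_Routing_Mode bit in the interrupt's GICD_IROUTERn register instead
of programming a single target.

#include <linux/cpumask.h>
#include <linux/io.h>
#include <linux/irqchip/arm-gic-v3.h>

static void spi_route(void __iomem *dist_base, unsigned int hwirq,
                      const struct cpumask *mask, u64 single_target)
{
        /* GICD_IROUTERn for this SPI (64-bit register) */
        void __iomem *reg = dist_base + GICD_IROUTER + hwirq * 8;

        if (cpumask_equal(mask, cpu_possible_mask))
                /* 1:N -- let any participating CPU take the SPI */
                writeq_relaxed(GICD_IROUTER_SPI_MODE_ANY, reg);
        else
                /* conventional routing to a single target CPU */
                writeq_relaxed(single_target, reg);
}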
Here are some snippets from the boot log that might be interesting:
[ 0.000000] GICv3: GIC: Using split EOI/Deactivate mode
[ 0.000000] GIC: enabling workaround for GICv3: NVIDIA erratum T241-FABRIC-4
[ 0.000000] GIC: enabling workaround for GICv3: ARM64 erratum 2941627
[ 0.000000] GICv3: 960 SPIs implemented
[ 0.000000] GICv3: 320 Extended SPIs implemented
[ 0.000000] Root IRQ handler: gic_handle_irq
[ 0.000000] GICv3: GICv3 features: 16 PPIs, 1:N
[ 0.000000] GICv3: CPU0: found redistributor 20000 region 0:0x0000000022100000
[...]
[ 0.000000] GICv3: using LPI property table @0x0000000101500000
[ 0.000000] GICv3: CPU0: using allocated LPI pending table @0x0000000101540000
[...]
There's a bunch of ITS info that I dropped, as well as the same
redistributor and LPI property table block for each of the 288 CPUs.
/proc/interrupts is much too big to paste here, but it looks like the
QSPI interrupts now end up evenly distributed across the first 72 CPUs
in this system. Not sure why 72, but possibly because this is a 4-node NUMA
system with 72 CPUs per node, so the CPU mask might've been restricted to
just the first node.
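In case anyone wants to reproduce this, a quick-and-dirty userspace helper
along the lines below (my own illustration, nothing from the kernel tree)
can sum the per-CPU counters of any /proc/interrupts line matching a
substring and report across how many CPUs the interrupt has been handled:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
        const char *needle = argc > 1 ? argv[1] : "qspi";
        char line[16384];
        FILE *fp = fopen("/proc/interrupts", "r");

        if (!fp)
                return 1;

        while (fgets(line, sizeof(line), fp)) {
                unsigned long long total = 0;
                unsigned int cpus = 0;
                char *colon, *p;

                if (!strstr(line, needle))
                        continue;

                /* skip the "NNN:" IRQ number column */
                colon = strchr(line, ':');
                if (!colon)
                        continue;

                /* sum the per-CPU counters that follow, one per online CPU */
                for (p = colon + 1; ; ) {
                        char *end;
                        unsigned long long count = strtoull(p, &end, 10);

                        if (end == p)   /* hit the irqchip/device columns */
                                break;

                        total += count;
                        if (count > 0)
                                cpus++;
                        p = end;
                }

                printf("%.*s: %llu interrupts on %u CPUs\n",
                       (int)(colon - line), line, total, cpus);
        }

        fclose(fp);
        return 0;
}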
On the face of it this looks quite promising. Where do we go from here?
Any areas that we need to test more exhaustively to see if this breaks?
Thierry