[patch V4 00/21] genirq, irqchip: Convert ARM MSI handling to per device MSI domains
Marc Zyngier
maz at kernel.org
Tue Jul 16 11:21:39 PDT 2024
[Dropping shivamurthy.shastri at linutronix.de who is now bouncing...]
On Tue, 16 Jul 2024 15:53:28 +0100,
Johan Hovold <johan at kernel.org> wrote:
>
> On Tue, Jul 16, 2024 at 11:30:05AM +0100, Marc Zyngier wrote:
> > On Mon, 15 Jul 2024 15:10:01 +0100,
> > Johan Hovold <johan at kernel.org> wrote:
> > > On Mon, Jul 15, 2024 at 01:58:13PM +0100, Marc Zyngier wrote:
> > > > On Mon, 15 Jul 2024 12:18:47 +0100,
> > > > Johan Hovold <johan at kernel.org> wrote:
> > > > > On Sun, Jun 23, 2024 at 05:18:31PM +0200, Thomas Gleixner wrote:
> > > > > > This is version 4 of the series to convert ARM MSI handling over to
> > > > > > per device MSI domains.
> > >
> > > > > This series only showed up in linux-next last Friday and broke interrupt
> > > > > handling on Qualcomm platforms like sc8280xp (e.g. Lenovo ThinkPad X13s)
> > > > > and x1e80100 that use the GIC ITS for PCIe MSIs.
> > > > >
> > > > > I've applied the series (21 commits from linux-next) on top of 6.10 and
> > > > > can confirm that the breakage is caused by commits:
> > > > >
> > > > > 3d1c927c08fc ("irqchip/gic-v3-its: Switch platform MSI to MSI parent")
> > > > > 233db05bc37f ("irqchip/gic-v3-its: Provide MSI parent for PCI/MSI[-X]")
> > > > >
> > > > > Applying the series up until the change before 3d1c927c08fc unbreaks the
> > > > > wifi on one machine:
> > > > >
> > > > > ath11k_pci 0006:01:00.0: failed to enable msi: -22
> > > > > ath11k_pci 0006:01:00.0: probe with driver ath11k_pci failed with error -22
>
> Correction, this doesn't fix the wifi, but I'm not seeing these errors
> with the commit before cc23d1dfc959 as the ath11k driver doesn't get
> this far (or doesn't probe at all).

I think we need to track one thing at a time. The wifi and nvme
problems seem subtly different... Which is the exact commit that
breaks nvme on your machine?
[...]
> > So is this issue actually tied to the async probing? Does it always
> > work if you disable it?
>
> There seem to be multiple issues here.
>
> With the full series applied and normal async (i.e. parallel) probing of
> the PCIe controllers I sometimes see allocation failing with -ENOSPC
> (e.g. the above ath11k errors). This seems to indicate broken locking
> somewhere.

Your log doesn't support this theory. At least not from an ITS
perspective, as it keeps dishing out INTIDs (and it is very hard to
run out of IRQs with the ITS).
>
> With synchronous probing, allocation always seems to succeed but the
> ath11k (and modem) drivers time out as no interrupts are received.
>
> The NVMe driver sometimes falls back to INTx signalling and can access
> the drive, but often ends up with an MSIX (?!) allocation and then fails
> to probe:
>
> [ 132.084740] nvme nvme0: I/O tag 17 (1011) QID 0 timeout, completion polled

So one of my test boxes (ThunderX) fails this exact way, while another
(Synquacer) is pretty happy. Still trying to understand the difference
in behaviour.

How do you enforce synchronous probing?

	M.
--
Without deviation from the norm, progress is not possible.