Kernel 5.16.3 and above fails to detect PCIe devices on Turris Omnia (Armada 385 / mvebu)

Pali Rohár pali at kernel.org
Fri Feb 25 05:27:01 PST 2022


On Friday 25 February 2022 14:12:30 Marcel Menzel wrote:
> On 24.02.2022 at 18:21, Pali Rohár wrote:
> > On Thursday 24 February 2022 10:25:32 Bjorn Helgaas wrote:
> > > On Thu, Feb 24, 2022 at 05:00:30PM +0100, Marcel Menzel wrote:
> > > > +linux-pci
> > > > 
> > > > On 24.02.2022 at 14:52, Marcel Menzel wrote:
> > > > > On 24.02.2022 at 14:09, Marcel Menzel wrote:
> > > > > > Hello,
> > > > > > 
> > > > > > When upgrading from kernel 5.16.2 to a newer version (tried 5.16.3
> > > > > > and 5.16.10 with an unchanged .config), the kernel fails to detect
> > > > > > both of the mPCIe WiFi cards installed in my Turris Omnia (newer
> > > > > > version, silver case, GPIO pins installed again).
> > > > > > I have two Mediatek MT7915 based cards installed. I also tried one
> > > > > > Atheros ath9k and one ath10k based card, with the same result. On
> > > > > > any kernel newer than 5.16.2, none of the cards are recognized
> > > > > > correctly.
> > > > > > 
> > > > > > Before 5.16.3 I also had to disable PCIe ASPM via a boot argument,
> > > > > > otherwise the WiFi drivers would complain about weird device
> > > > > > behavior and fail to initialize the cards, but re-enabling it does
> > > > > > not yield any different results.
> > > Please try this commit, which is headed to mainline today:
> > > 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=for-linus&id=c49ae619905eebd3f54598a84e4cd2bd58ba8fe9
> > > 
> > > This commit should fix the PCI enumeration problem.
> > It should fix that regression. If not, please let me know.
> I can confirm that this patch solves the issue. Many thanks!

Perfect!

> > > If you still have
> > > to disable ASPM, that sounds like a separate problem that we should
> > > also try to debug.
> > This is a different, known issue and is **not** related to ASPM. I spent
> > some time on it. Initially I thought it was a bug in the Atheros cards,
> > but now I'm under the impression that this is an issue in the Marvell
> > PCIe HW: link retraining (a required step of ASPM) triggers either a
> > Link Down or a Hot Reset, which in turn triggers another Atheros issue
> > (this one is already documented in the kernel PCI quirks code).
> > 
> > I will try to implement some workaround for this, but the requirement is
> > to have all the new improvements in the pci-mvebu.c + pci-aardvark.c
> > drivers first... and the review process is slow. So it will not happen
> > before all those changes are reviewed and merged.
> Removing "pcie_aspm=off" works for my MT7915E based cards, with no issues
> so far. So at least it doesn't seem to be an issue with the Marvell
> hardware itself.

That is probably because the MT7915E card does not trigger that issue. But
I do think the issue really is in the Marvell hardware.
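The retraining step mentioned above can be illustrated at the register level. The following is a minimal C sketch, not code from pci-mvebu.c or from the kernel's ASPM implementation: it only models the PCIe Link Control / Link Status bits involved. The bit values match the PCI_EXP_* definitions in the kernel's include/uapi/linux/pci_regs.h; the helper names (lnkctl_set_common_clock, lnkctl_request_retrain, retrain_done, poll_retrain) are hypothetical. As far as we understand, the kernel's ASPM code sets Common Clock Configuration on both ends of the link and then retrains it, which is the point where, according to the theory above, the Marvell controller may drop the link.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values as in the kernel's include/uapi/linux/pci_regs.h */
#define PCI_EXP_LNKCTL_ASPM_L0S 0x0001 /* ASPM L0s enable */
#define PCI_EXP_LNKCTL_ASPM_L1  0x0002 /* ASPM L1 enable */
#define PCI_EXP_LNKCTL_RL       0x0020 /* Retrain Link (write 1, always reads 0) */
#define PCI_EXP_LNKCTL_CCC      0x0040 /* Common Clock Configuration */
#define PCI_EXP_LNKSTA_LT       0x0800 /* Link Training in progress */
#define PCI_EXP_LNKSTA_DLLLA    0x2000 /* Data Link Layer Link Active */

/* Hypothetical helper: Link Control value with Common Clock Configuration set. */
static uint16_t lnkctl_set_common_clock(uint16_t lnkctl)
{
	return lnkctl | PCI_EXP_LNKCTL_CCC;
}

/* Hypothetical helper: Link Control value that asks the port to retrain. */
static uint16_t lnkctl_request_retrain(uint16_t lnkctl)
{
	return lnkctl | PCI_EXP_LNKCTL_RL;
}

/* Retraining is over once the Link Training bit reads back as 0. */
static bool retrain_done(uint16_t lnksta)
{
	return !(lnksta & PCI_EXP_LNKSTA_LT);
}

/*
 * Hypothetical helper: walk Link Status samples in the order a poll loop
 * would read them. Returns true only if training finished AND the link is
 * active again. Assumes the port supports Data Link Layer Link Active
 * reporting; otherwise the DLLLA bit is not meaningful.
 */
static bool poll_retrain(const uint16_t *samples, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (retrain_done(samples[i]))
			return (samples[i] & PCI_EXP_LNKSTA_DLLLA) != 0;
	}
	return false; /* ran out of samples with Link Training still set */
}
```

If the retraining theory is right, the failing Atheros case would correspond to the second outcome: Link Training clears, but Data Link Layer Link Active stays clear, so the link never comes back after the retrain.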

> Regarding the Atheros cards: I disabled ASPM back then for my Atheros
> AR9582 & QCA9880 cards and never re-enabled it when I switched to the
> MT7915E cards, which I forgot to mention in my first mail, sorry!
> I put those two cards back into the device to test it, and the same problem
> that made me disable ASPM back then occurs. The router completely freezes
> while booting, with these as the last log lines (gathered via serial):
> 
> [   10.400986] ath9k 0000:02:00.0: can't change power state from D3cold to D0 (config space inaccessible)
> [   10.466924] ath10k_pci 0000:03:00.0: can't change power state from D3cold to D0 (config space inaccessible)
> [   10.613847] ath10k_pci 0000:03:00.0: failed to wake up device : -110

At this stage there is no link with the card. But the kernel does not know
it, as the DLLSC (Data Link Layer State Changed) interrupt is not
implemented in the pci-mvebu.c driver. We need DLLSC support to debug this
issue.
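DLLSC is the Data Link Layer State Changed bit in the PCIe Slot Status register; it is set whenever the Data Link Layer Link Active bit in Link Status changes. The minimal C sketch below models only the register-value logic a driver could use to classify such an event. It is not the pending pci-aardvark.c patch or the pci-mvebu.c branch linked in this message, and all helper names (sltctl_enable_dllsc, classify_dllsc, sltsta_dllsc_ack, pmcsr_inaccessible) are hypothetical. Bit values follow include/uapi/linux/pci_regs.h. The last helper reflects our assumption that "config space inaccessible" is printed when a 16-bit read of the power management control register returns all ones, which is what happens when nothing answers the config request.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as in the kernel's include/uapi/linux/pci_regs.h */
#define PCI_EXP_SLTCTL_DLLSCE 0x1000 /* Data Link Layer State Changed Enable */
#define PCI_EXP_SLTSTA_DLLSC  0x0100 /* Data Link Layer State Changed (RW1C) */
#define PCI_EXP_LNKSTA_DLLLA  0x2000 /* Data Link Layer Link Active */

enum link_event { LINK_NO_CHANGE, LINK_CAME_UP, LINK_WENT_DOWN };

/* Hypothetical helper: Slot Control value with the DLLSC interrupt enabled. */
static uint16_t sltctl_enable_dllsc(uint16_t sltctl)
{
	return sltctl | PCI_EXP_SLTCTL_DLLSCE;
}

/*
 * Hypothetical helper: classify a Slot Status / Link Status snapshot taken in
 * the interrupt handler. DLLSC only says "DLLLA changed"; DLLLA says which way.
 */
static enum link_event classify_dllsc(uint16_t sltsta, uint16_t lnksta)
{
	if (!(sltsta & PCI_EXP_SLTSTA_DLLSC))
		return LINK_NO_CHANGE;
	return (lnksta & PCI_EXP_LNKSTA_DLLLA) ? LINK_CAME_UP : LINK_WENT_DOWN;
}

/*
 * DLLSC is write-1-to-clear: writing back only this bit acknowledges the
 * event without clearing other RW1C status bits.
 */
static uint16_t sltsta_dllsc_ack(uint16_t sltsta)
{
	return sltsta & PCI_EXP_SLTSTA_DLLSC;
}

/* Assumption: an all-ones PM control read is what "config space inaccessible" reports. */
static bool pmcsr_inaccessible(uint16_t pmcsr)
{
	return pmcsr == 0xffff;
}
```

Without such an interrupt, the root port driver has no early signal that the link dropped; as the log above suggests, the first visible symptom is config reads failing when the drivers try to power the cards up.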

For another Marvell driver (pci-aardvark.c) there is already a patch
pending review which adds DLLSC interrupt support:
https://lore.kernel.org/linux-pci/20220220193346.23789-9-kabel@kernel.org/

So on Armada 3720 platforms it is possible to start debugging it.

I have (experimental) DLLSC support prepared for pci-mvebu.c as well, but it
depends on the summary interrupt, which is missing in irq-armada-370-xp.c:
https://git.kernel.org/pub/scm/linux/kernel/git/pali/linux.git/log/?h=pci-mvebu

So without that summary interrupt in the irq-armada-370-xp.c driver, it is
not possible for the pci-mvebu.c driver to get this information.

> [   10.622944] usb 1-1: New USB device found, idVendor=0cf3, idProduct=3004, bcdDevice= 0.02
> [   10.635092] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
> [   10.659930] ath10k_pci: probe of 0000:03:00.0 failed with error -110
> 
> This seems to be another topic, however. I'd be glad to test and help debug
> fixes and/or gather additional information about this problem on my
> hardware.



More information about the linux-arm-kernel mailing list