Kernel 5.16.3 and above fails to detect PCIe devices on Turris Omnia (Armada 385 / mvebu)

Marcel Menzel mail at mcl.gg
Fri Feb 25 05:12:30 PST 2022



On 24.02.2022 at 18:21, Pali Rohár wrote:
> On Thursday 24 February 2022 10:25:32 Bjorn Helgaas wrote:
>> On Thu, Feb 24, 2022 at 05:00:30PM +0100, Marcel Menzel wrote:
>>> +linux-pci
>>>
>>> On 24.02.2022 at 14:52, Marcel Menzel wrote:
>>>> On 24.02.2022 at 14:09, Marcel Menzel wrote:
>>>>> Hello,
>>>>>
>>>>> When upgrading from kernel 5.16.2 to a newer version (tried 5.16.3
>>>>> and 5.16.10 with unchanged .config), the kernel fails to detect both
>>>>> mPCIe WiFi cards installed in my Turris Omnia (newer version,
>>>>> silver case, GPIO pins installed again).
>>>>> I have two Mediatek MT7915 based cards installed. I also tried with
>>>>> one Atheros ath9k and one ath10k based card, yielding the same
>>>>> result. On any kernel version newer than 5.16.2, none of the cards
>>>>> are recognized.
>>>>>
>>>>> Before 5.16.3 I also had to disable PCIe ASPM via a boot argument,
>>>>> otherwise the WiFi drivers would complain about weird device
>>>>> behavior and fail to initialize the cards, but re-enabling it does
>>>>> not yield any different results.
>> Please try this commit, which is headed to mainline today:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=for-linus&id=c49ae619905eebd3f54598a84e4cd2bd58ba8fe9
>>
>> This commit should fix the PCI enumeration problem.
> It should fix that regression. If not, please let me know.
I can confirm that this patch solves the issue. Many thanks!

>> If you still have
>> to disable ASPM, that sounds like a separate problem that we should
>> also try to debug.
> This is a different and known issue and **not** related to ASPM. I spent
> some time on it. Initially I thought it was a bug in Atheros cards, but now
> I'm under the impression that this is an issue in the Marvell PCIe HW: link
> retraining (a required step of ASPM) triggers either Link Down or Hot
> Reset, which triggers another Atheros issue (this one is already
> documented in the kernel PCI quirks code).
>
> I will try to implement a workaround for this, but the requirement is to
> have all the new improvements in the pci-mvebu.c + pci-aardvark.c drivers...
> and the review process is slow. So it will not happen before all those
> changes are reviewed and merged.
With "pcie_aspm=off" removed, my MT7915E based cards work and I have had
no issues so far. So it doesn't seem to be an issue with the Marvell
hardware itself, at least.
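
In case it helps anyone reproduce this, below is a minimal Python sketch for checking whether ASPM was disabled on the kernel command line and which ASPM policy is active. The function names are hypothetical helpers of my own, not part of any kernel tooling. It assumes the usual /proc/cmdline and /sys/module/pcie_aspm/parameters/policy files, where the active policy is the entry shown in square brackets; the sysfs file may be absent if the kernel was built without ASPM support.

```python
import re
from pathlib import Path
from typing import Optional


def aspm_cmdline_setting(cmdline: str) -> Optional[str]:
    """Return the value of pcie_aspm= on a kernel command line, or None."""
    for token in cmdline.split():
        if token.startswith("pcie_aspm="):
            return token.split("=", 1)[1]
    return None


def active_aspm_policy(policy_text: str) -> Optional[str]:
    """Return the bracketed (active) entry of the sysfs policy file."""
    match = re.search(r"\[(\w+)\]", policy_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Both paths are assumptions about the running system; skip if missing.
    cmdline_path = Path("/proc/cmdline")
    policy_path = Path("/sys/module/pcie_aspm/parameters/policy")
    if cmdline_path.exists():
        setting = aspm_cmdline_setting(cmdline_path.read_text())
        print("pcie_aspm on cmdline:", setting)
    if policy_path.exists():
        print("active policy:", active_aspm_policy(policy_path.read_text()))
    else:
        print("no ASPM policy file found")
```

A result of None from aspm_cmdline_setting() just means the parameter was not passed at boot, so the kernel default applies.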

Regarding the Atheros cards: I disabled ASPM back then for my Atheros
AR9582 & QCA9880 cards and never re-enabled it when I switched to the
MT7915E cards, which I forgot to mention in my first mail, sorry!
I put those two cards back into the device to test it, and the same
problem that made me disable ASPM back then occurs again. The router
completely freezes while booting, with these as the last log lines
(gathered via serial):

[   10.400986] ath9k 0000:02:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[   10.466924] ath10k_pci 0000:03:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[   10.613847] ath10k_pci 0000:03:00.0: failed to wake up device : -110
[   10.622944] usb 1-1: New USB device found, idVendor=0cf3, idProduct=3004, bcdDevice= 0.02
[   10.635092] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[   10.659930] ath10k_pci: probe of 0000:03:00.0 failed with error -110
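
To pull the relevant failures out of a longer serial capture, something like the following Python sketch could be used. It is only an illustration: the function name and the regular expressions are my own assumptions, matched against the message wording shown above, and the errno table only covers the one value seen here (110 is ETIMEDOUT on Linux).

```python
import re
from typing import Dict, List, Tuple

# Linux errno value seen in the log above; extend as needed.
ERRNO_NAMES: Dict[int, str] = {110: "ETIMEDOUT"}

# Patterns written against the message wording in the log above (assumption).
POWER_RE = re.compile(
    r"\]\s+(\S+) (\S+): can't change power state from (\w+) to (\w+)"
)
PROBE_RE = re.compile(r"\]\s+(\S+): probe of (\S+) failed with error (-?\d+)")


def summarize_pci_failures(log: str) -> Tuple[List[tuple], List[tuple]]:
    """Return (power_state_failures, probe_failures) found in a kernel log."""
    power, probe = [], []
    for line in log.splitlines():
        m = POWER_RE.search(line)
        if m:
            # (driver, PCI address, from state, to state)
            power.append(m.groups())
            continue
        m = PROBE_RE.search(line)
        if m:
            err = abs(int(m.group(3)))
            # (driver, PCI address, symbolic errno or the raw number)
            probe.append((m.group(1), m.group(2), ERRNO_NAMES.get(err, str(err))))
    return power, probe


SAMPLE = """\
[   10.400986] ath9k 0000:02:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[   10.466924] ath10k_pci 0000:03:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[   10.659930] ath10k_pci: probe of 0000:03:00.0 failed with error -110
"""

if __name__ == "__main__":
    power_failures, probe_failures = summarize_pci_failures(SAMPLE)
    for entry in power_failures + probe_failures:
        print(entry)
```

The parser expects each log message on a single line, so a capture that was wrapped by a mail client has to be re-joined first.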

This seems to be a separate topic, however. I'd be glad to test and help
debug fixes and / or gather additional information on my hardware
regarding this problem.


