PCI trouble on mvebu (Turris Omnia)

Toke Høiland-Jørgensen toke at redhat.com
Fri Mar 26 17:51:42 GMT 2021


Pali Rohár <pali at kernel.org> writes:

> On Friday 26 March 2021 17:54:38 Toke Høiland-Jørgensen wrote:
>> Pali Rohár <pali at kernel.org> writes:
>> > On Friday 26 March 2021 16:25:27 Toke Høiland-Jørgensen wrote:
>> >> Pali Rohár <pali at kernel.org> writes:
>> >> > Seems that this really is an issue in QCA98xx chips. I have sent a patch
>> >> > which adds a quirk for these wifi chips:
>> >> >
>> >> > https://lore.kernel.org/linux-pci/20210326124326.21163-1-pali@kernel.org/
>> >> 
>> >> I tried applying that, and while it does fix the ath10k card, it seems
>> >> to break the ath9k card in the slot next to it.
>> >
>> > Ehm, what?
>> 
>> I know, right?! :/
>> 
>> > The patch which I sent to the mailing list today calls the quirk code
>> > only for the PCI device id used by QCA98xx cards. For all other cards
>> > it is a no-op.
>> 
>> So upon further investigation this seems to be unrelated to the patch.
>> Meaning that I can't reliably get the ath9k device to work again by
>> reverting it. And the patch does seem to fix the ath10k device, so I
>> think that's probably good.
>> 
>> However, the issue with ath9k does seem to be related to ASPM; if I turn
>> that off in .config, I get the ath9k device back.
>
> Ok, perfect. So my patch does not break ath9k.

No, doesn't seem like it!

>> So we have these
>> cases:
>> 
>> ASPM disabled:          ath9k, ath10k and mt76 cards all work
>> ASPM enabled, no patch: only mt76 card works
>> ASPM enabled + patch:   ath10k and mt76 cards work
>> 
>> So IDK, maybe the ath9k card needs a quirk as well? Or the mvebu board
>> is just generally flaky?
>
> I'm not sure. Maybe ASPM is somehow buggy on ath9k or needs some special
> handling. But the issue is not in PCI config space, as the ath9k driver
> starts initialization of this card. It also needs some debugging in the
> ath9k driver, given that it prints that strange "mac chip rev" error.

Well that's just being output because it gets a revision that it doesn't
recognise - which it seems to be just reading from a register:

https://elixir.bootlin.com/linux/latest/source/drivers/net/wireless/ath/ath9k/hw.c#L255

The revision printed is consistent with the register read just
returning 0xffffffff, which from looking at ioread32() is the value
returned on a failed read. So there's a driver bug there: the check
against -EIO here is obviously nonsensical:

https://elixir.bootlin.com/linux/latest/source/drivers/net/wireless/ath/ath9k/hw.c#L290

But the underlying cause appears to be that the read from the register
fails, which I suppose is related to something the PCI bus does?

> I think this issue should be handled separately. Could you report it
> also to ath9k mailing list (and CC me)? Maybe other ath developers would
> know some more details.

I'll send a patch for the nonsensical check above, but other than that I
think we're still in PCI land here, or?

>> > Can you send PCI device id of your ath9k card (lspci -nn)? Because all
>> > my tested ath9k cards have different PCI device id.
>> 
>> [root at omnia-arch ~]# lspci -nn
>> 00:01.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
>> 00:02.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
>> 00:03.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
>> 01:00.0 Network controller [0280]: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) [168c:002e] (rev 01)
>> 02:00.0 Network controller [0280]: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter [168c:003c]
>
> That is fine. All my ath9k test cards also have id 0x002e.
>
>> >> When booting with the
>> >> patch applied, I get this in dmesg:
>> >> 
>> >> [    3.556599] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
>> >
>> > Can you send whole dmesg log? So I can see which new err/info lines are
>> > printed.
>> 
>> Pasting all three cases below:
> ...
>
> Seems that there is no ASPM-related line... But your logs are not
> complete; the beginning is missing. So important lines may have been
> trimmed.

Ah! Of course - sorry for not noticing that!

Here are the missing bits related to PCIE (pulled off the serial console
- with the patch applied):

[    1.493064] mvebu-pcie soc:pcie: host bridge /soc/pcie ranges:
[    1.493094] mvebu-pcie soc:pcie:      MEM 0x00f1080000..0x00f1081fff -> 0x0000080000
[    1.493113] mvebu-pcie soc:pcie:      MEM 0x00f1040000..0x00f1041fff -> 0x0000040000
[    1.493129] mvebu-pcie soc:pcie:      MEM 0x00f1044000..0x00f1045fff -> 0x0000044000
[    1.493144] mvebu-pcie soc:pcie:      MEM 0x00f1048000..0x00f1049fff -> 0x0000048000
[    1.493159] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
[    1.493174] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
[    1.493189] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
[    1.493203] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
[    1.493217] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
[    1.493231] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
[    1.493245] mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
[    1.493255] mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
[    1.493426] mvebu-pcie soc:pcie: PCI host bridge to bus 0000:00
[    1.493435] pci_bus 0000:00: root bus resource [bus 00-ff]
[    1.493443] pci_bus 0000:00: root bus resource [mem 0xf1080000-0xf1081fff] (bus address [0x00080000-0x00081fff])
[    1.493451] pci_bus 0000:00: root bus resource [mem 0xf1040000-0xf1041fff] (bus address [0x00040000-0x00041fff])
[    1.493458] pci_bus 0000:00: root bus resource [mem 0xf1044000-0xf1045fff] (bus address [0x00044000-0x00045fff])
[    1.493465] pci_bus 0000:00: root bus resource [mem 0xf1048000-0xf1049fff] (bus address [0x00048000-0x00049fff])
[    1.493472] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
[    1.493478] pci_bus 0000:00: root bus resource [io  0x1000-0xeffff]
[    1.493548] pci 0000:00:01.0: [11ab:6820] type 01 class 0x060400
[    1.493564] pci 0000:00:01.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
[    1.493719] pci 0000:00:02.0: [11ab:6820] type 01 class 0x060400
[    1.493734] pci 0000:00:02.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
[    1.493868] pci 0000:00:03.0: [11ab:6820] type 01 class 0x060400
[    1.493882] pci 0000:00:03.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
[    1.494660] PCI: bus0: Fast back to back transfers disabled
[    1.494668] pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.494677] pci 0000:00:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.494685] pci 0000:00:03.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.494765] pci 0000:01:00.0: [168c:002e] type 00 class 0x028000
[    1.494788] pci 0000:01:00.0: reg 0x10: [mem 0xe8000000-0xe800ffff 64bit]
[    1.494901] pci 0000:01:00.0: supports D1
[    1.494907] pci 0000:01:00.0: PME# supported from D0 D1 D3hot
[    1.495020] pci 0000:00:01.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[    1.522129] PCI: bus1: Fast back to back transfers enabled
[    1.522137] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[    1.522226] pci 0000:02:00.0: [168c:003c] type 00 class 0x028000
[    1.522249] pci 0000:02:00.0: reg 0x10: [mem 0xea000000-0xea1fffff 64bit]
[    1.522283] pci 0000:02:00.0: reg 0x30: [mem 0xea200000-0xea20ffff pref]
[    1.522362] pci 0000:02:00.0: supports D1 D2
[    1.522457] pci 0000:00:02.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[    1.522466] pcie_change_tls_to_getn1() called for device 6820:0:0
[    1.522472] pci 0000:00:02.0: ASPM: Bridge does not support changing Link Speed to 2.5 GT/s
[    1.522477] pci 0000:00:02.0: ASPM: Retrain Link at higher speed is disallowed by quirk
[    1.522482] pci 0000:00:02.0: ASPM: Could not configure common clock
[    1.523241] PCI: bus2: Fast back to back transfers disabled
[    1.523247] pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to 02
[    1.523332] pci 0000:03:00.0: [14c3:7612] type 00 class 0x028000
[    1.523357] pci 0000:03:00.0: reg 0x10: [mem 0xec000000-0xec0fffff 64bit]
[    1.523393] pci 0000:03:00.0: reg 0x30: [mem 0xec100000-0xec10ffff pref]
[    1.523481] pci 0000:03:00.0: PME# supported from D0 D3hot D3cold
[    1.523601] pci 0000:00:03.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[    1.552139] PCI: bus3: Fast back to back transfers disabled
[    1.552147] pci_bus 0000:03: busn_res: [bus 03-ff] end is updated to 03
[    1.552183] pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xe00fffff]
[    1.552193] pci 0000:00:02.0: BAR 8: assigned [mem 0xe0200000-0xe04fffff]
[    1.552202] pci 0000:00:03.0: BAR 8: assigned [mem 0xe0600000-0xe07fffff]
[    1.552211] pci 0000:00:01.0: BAR 6: assigned [mem 0xe0100000-0xe01007ff pref]
[    1.552221] pci 0000:00:02.0: BAR 6: assigned [mem 0xe0500000-0xe05007ff pref]
[    1.552229] pci 0000:00:03.0: BAR 6: assigned [mem 0xe0800000-0xe08007ff pref]
[    1.552238] pci 0000:01:00.0: BAR 0: assigned [mem 0xe0000000-0xe000ffff 64bit]
[    1.552247] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
[    1.552254] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
[    1.552261] pci 0000:00:01.0: PCI bridge to [bus 01]
[    1.552269] pci 0000:00:01.0:   bridge window [mem 0xe0000000-0xe00fffff]
[    1.552279] pci 0000:02:00.0: BAR 0: assigned [mem 0xe0200000-0xe03fffff 64bit]
[    1.552293] pci 0000:02:00.0: BAR 6: assigned [mem 0xe0400000-0xe040ffff pref]
[    1.552300] pci 0000:00:02.0: PCI bridge to [bus 02]
[    1.552306] pci 0000:00:02.0:   bridge window [mem 0xe0200000-0xe04fffff]
[    1.552315] pci 0000:03:00.0: BAR 0: assigned [mem 0xe0600000-0xe06fffff 64bit]
[    1.552329] pci 0000:03:00.0: BAR 6: assigned [mem 0xe0700000-0xe070ffff pref]
[    1.552335] pci 0000:00:03.0: PCI bridge to [bus 03]
[    1.552342] pci 0000:00:03.0:   bridge window [mem 0xe0600000-0xe07fffff]


>> >> Could there be some kind of data corruption in play here making the
>> >> driver think the chip revision is wrong, or something like that? If I
>> >> boot the same kernel without the patch applied, the ath9k initialisation
>> >> works fine, but obviously the ath10k is then still broken...
>> >
>> > There is something really strange.
>> >
>> > Can you add debug log into pcie_change_tls_to_gen1() function to check
>> > for which card is this function called?
>> 
>> Erm, it looks like it's never called? I added this:
>
> Ehm? With the patch it must be called, otherwise the ath10k card would
> not be detected on the PCIe bus. And you tested that the patch fixes it...

Yeah, that was due to the missing log lines; it's in the output above.

-Toke




More information about the linux-arm-kernel mailing list