PCI trouble on mvebu (Turris Omnia)
Toke Høiland-Jørgensen
toke at redhat.com
Fri Mar 26 17:51:42 GMT 2021
Pali Rohár <pali at kernel.org> writes:
> On Friday 26 March 2021 17:54:38 Toke Høiland-Jørgensen wrote:
>> Pali Rohár <pali at kernel.org> writes:
>> > On Friday 26 March 2021 16:25:27 Toke Høiland-Jørgensen wrote:
>> >> Pali Rohár <pali at kernel.org> writes:
>> >> > Seems that this is really issue in QCA98xx chips. I have send patch
>> >> > which adds quirk for these wifi chips:
>> >> >
>> >> > https://lore.kernel.org/linux-pci/20210326124326.21163-1-pali@kernel.org/
>> >>
>> >> I tried applying that, and while it does fix the ath10k card, it seems
>> >> to break the ath9k card in the slot next to it.
>> >
>> > Ehm, what?
>>
>> I know, right?! :/
>>
>> > Patch which I have sent today to mailing list calls quirk code only
>> > for PCI device id used by QCA98xx cards. For all other cards it is
>> > noop.
>>
>> So upon further investigation this seems to be unrelated to the patch.
>> Meaning that I can't reliably get the ath9k device to work again by
>> reverting it. And the patch does seem to fix the ath10k device, so I
>> think that's probably good.
>>
>> However, the issue with ath9k does seem to be related to ASPM; if I turn
>> that off in .config, I get the ath9k device back.
>
> Ok, perfect. So this my patch is does not break ath9k.
No, doesn't seem like it!
>> So we have these
>> cases:
>>
>> ASPM disabled: ath9k, ath10k and mt76 cards all work
>> ASPM enabled, no patch: only mt76 card works
>> ASPM enabled + patch: ath10k and mt76 cards work
>>
>> So IDK, maybe the ath9k card needs a quirk as well? Or the mvebu board
>> is just generally flaky?
>
> I'm not sure. Maybe ASPM is somehow buggy on ath9k or needs some special
> handling. But issue is not at PCI config space as ath9k driver start
> initialization of this card. Needs also some debugging in ath9k driver
> if it prints that strange "mac chip rev" error.
Well that's just being output because it gets a revision that it doesn't
recognise - which it seems to be just reading from a register:
https://elixir.bootlin.com/linux/latest/source/drivers/net/wireless/ath/ath9k/hw.c#L255
The value returned is consistent with the value returned just being
0xffffffff. Which from looking at ioread32() is the value being returned
on a failed read. So there's a driver bug there - the check against -EIO
here is obviously nonsensical:
https://elixir.bootlin.com/linux/latest/source/drivers/net/wireless/ath/ath9k/hw.c#L290
But the underlying cause appears to be that the read from the register
fails, which I suppose is related to something the PCI bus does?
> I think this issue should be handled separately. Could you report it
> also to ath9k mailing list (and CC me)? Maybe other ath developers would
> know some more details.
I'll send a patch for the nonsensical check above, but other than that I
think we're still in PCI land here, or?
>> > Can you send PCI device id of your ath9k card (lspci -nn)? Because all
>> > my tested ath9k cards have different PCI device id.
>>
>> [root at omnia-arch ~]# lspci -nn
>> 00:01.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
>> 00:02.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
>> 00:03.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:6820] (rev 04)
>> 01:00.0 Network controller [0280]: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) [168c:002e] (rev 01)
>> 02:00.0 Network controller [0280]: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter [168c:003c]
>
> That is fine. Also all ath9k testing cards have id 0x002e.
>
>> >> When booting with the
>> >> patch applied, I get this in dmesg:
>> >>
>> >> [ 3.556599] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
>> >
>> > Can you send whole dmesg log? So I can see which new err/info lines are
>> > printed.
>>
>> Pasting all three cases below:
> ...
>
> Seem that there is no ASPM related line... But your logs are not
> complete, beginning is missing. So important lines are maybe trimmed.
Ah! Of course - sorry for not noticing that!
Here are the missing bits related to PCIE (pulled off the serial console
- with the patch applied):
[ 1.493064] mvebu-pcie soc:pcie: host bridge /soc/pcie ranges:
[ 1.493094] mvebu-pcie soc:pcie: MEM 0x00f1080000..0x00f1081fff -> 0x0000080000
[ 1.493113] mvebu-pcie soc:pcie: MEM 0x00f1040000..0x00f1041fff -> 0x0000040000
[ 1.493129] mvebu-pcie soc:pcie: MEM 0x00f1044000..0x00f1045fff -> 0x0000044000
[ 1.493144] mvebu-pcie soc:pcie: MEM 0x00f1048000..0x00f1049fff -> 0x0000048000
[ 1.493159] mvebu-pcie soc:pcie: MEM 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
[ 1.493174] mvebu-pcie soc:pcie: IO 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
[ 1.493189] mvebu-pcie soc:pcie: MEM 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
[ 1.493203] mvebu-pcie soc:pcie: IO 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
[ 1.493217] mvebu-pcie soc:pcie: MEM 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
[ 1.493231] mvebu-pcie soc:pcie: IO 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
[ 1.493245] mvebu-pcie soc:pcie: MEM 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
[ 1.493255] mvebu-pcie soc:pcie: IO 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
[ 1.493426] mvebu-pcie soc:pcie: PCI host bridge to bus 0000:00
[ 1.493435] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 1.493443] pci_bus 0000:00: root bus resource [mem 0xf1080000-0xf1081fff] (bus address [0x00080000-0x00081fff])
[ 1.493451] pci_bus 0000:00: root bus resource [mem 0xf1040000-0xf1041fff] (bus address [0x00040000-0x00041fff])
[ 1.493458] pci_bus 0000:00: root bus resource [mem 0xf1044000-0xf1045fff] (bus address [0x00044000-0x00045fff])
[ 1.493465] pci_bus 0000:00: root bus resource [mem 0xf1048000-0xf1049fff] (bus address [0x00048000-0x00049fff])
[ 1.493472] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
[ 1.493478] pci_bus 0000:00: root bus resource [io 0x1000-0xeffff]
[ 1.493548] pci 0000:00:01.0: [11ab:6820] type 01 class 0x060400
[ 1.493564] pci 0000:00:01.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
[ 1.493719] pci 0000:00:02.0: [11ab:6820] type 01 class 0x060400
[ 1.493734] pci 0000:00:02.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
[ 1.493868] pci 0000:00:03.0: [11ab:6820] type 01 class 0x060400
[ 1.493882] pci 0000:00:03.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
[ 1.494660] PCI: bus0: Fast back to back transfers disabled
[ 1.494668] pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 1.494677] pci 0000:00:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 1.494685] pci 0000:00:03.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 1.494765] pci 0000:01:00.0: [168c:002e] type 00 class 0x028000
[ 1.494788] pci 0000:01:00.0: reg 0x10: [mem 0xe8000000-0xe800ffff 64bit]
[ 1.494901] pci 0000:01:00.0: supports D1
[ 1.494907] pci 0000:01:00.0: PME# supported from D0 D1 D3hot
[ 1.495020] pci 0000:00:01.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[ 1.522129] PCI: bus1: Fast back to back transfers enabled
[ 1.522137] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[ 1.522226] pci 0000:02:00.0: [168c:003c] type 00 class 0x028000
[ 1.522249] pci 0000:02:00.0: reg 0x10: [mem 0xea000000-0xea1fffff 64bit]
[ 1.522283] pci 0000:02:00.0: reg 0x30: [mem 0xea200000-0xea20ffff pref]
[ 1.522362] pci 0000:02:00.0: supports D1 D2
[ 1.522457] pci 0000:00:02.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[ 1.522466] pcie_change_tls_to_getn1() called for device 6820:0:0
[ 1.522472] pci 0000:00:02.0: ASPM: Bridge does not support changing Link Speed to 2.5 GT/s
[ 1.522477] pci 0000:00:02.0: ASPM: Retrain Link at higher speed is disallowed by quirk
[ 1.522482] pci 0000:00:02.0: ASPM: Could not configure common clock
[ 1.523241] PCI: bus2: Fast back to back transfers disabled
[ 1.523247] pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to 02
[ 1.523332] pci 0000:03:00.0: [14c3:7612] type 00 class 0x028000
[ 1.523357] pci 0000:03:00.0: reg 0x10: [mem 0xec000000-0xec0fffff 64bit]
[ 1.523393] pci 0000:03:00.0: reg 0x30: [mem 0xec100000-0xec10ffff pref]
[ 1.523481] pci 0000:03:00.0: PME# supported from D0 D3hot D3cold
[ 1.523601] pci 0000:00:03.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[ 1.552139] PCI: bus3: Fast back to back transfers disabled
[ 1.552147] pci_bus 0000:03: busn_res: [bus 03-ff] end is updated to 03
[ 1.552183] pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xe00fffff]
[ 1.552193] pci 0000:00:02.0: BAR 8: assigned [mem 0xe0200000-0xe04fffff]
[ 1.552202] pci 0000:00:03.0: BAR 8: assigned [mem 0xe0600000-0xe07fffff]
[ 1.552211] pci 0000:00:01.0: BAR 6: assigned [mem 0xe0100000-0xe01007ff pref]
[ 1.552221] pci 0000:00:02.0: BAR 6: assigned [mem 0xe0500000-0xe05007ff pref]
[ 1.552229] pci 0000:00:03.0: BAR 6: assigned [mem 0xe0800000-0xe08007ff pref]
[ 1.552238] pci 0000:01:00.0: BAR 0: assigned [mem 0xe0000000-0xe000ffff 64bit]
[ 1.552247] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
[ 1.552254] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
[ 1.552261] pci 0000:00:01.0: PCI bridge to [bus 01]
[ 1.552269] pci 0000:00:01.0: bridge window [mem 0xe0000000-0xe00fffff]
[ 1.552279] pci 0000:02:00.0: BAR 0: assigned [mem 0xe0200000-0xe03fffff 64bit]
[ 1.552293] pci 0000:02:00.0: BAR 6: assigned [mem 0xe0400000-0xe040ffff pref]
[ 1.552300] pci 0000:00:02.0: PCI bridge to [bus 02]
[ 1.552306] pci 0000:00:02.0: bridge window [mem 0xe0200000-0xe04fffff]
[ 1.552315] pci 0000:03:00.0: BAR 0: assigned [mem 0xe0600000-0xe06fffff 64bit]
[ 1.552329] pci 0000:03:00.0: BAR 6: assigned [mem 0xe0700000-0xe070ffff pref]
[ 1.552335] pci 0000:00:03.0: PCI bridge to [bus 03]
[ 1.552342] pci 0000:00:03.0: bridge window [mem 0xe0600000-0xe07fffff]
>> >> Could there be some kind of data corruption in play here making the
>> >> driver think the chip revision is wrong, or something like that? If I
>> >> boot the same kernel without the patch applied, the ath9k initialisation
>> >> works fine, but obviously the ath10k is then still broken...
>> >
>> > There is something really strange.
>> >
>> > Can you add debug log into pcie_change_tls_to_gen1() function to check
>> > for which card is this function called?
>>
>> Erm, it looks like it's never called? I added this:
>
> Ehm? With patch it must be called otherwise ath10k card would not be
> detected on PCIe bus. And you tested that patch fixes it...
Yeah, that was due to the missing log lines; it's in the output above.
-Toke
More information about the linux-arm-kernel
mailing list