PCI trouble on mvebu (Turris Omnia)
™֟☻̭҇ Ѽ ҉ ®
vtolkm at googlemail.com
Thu Oct 29 15:56:00 EDT 2020
On 29/10/2020 20:30, Bjorn Helgaas wrote:
> On Thu, Oct 29, 2020 at 12:12:21PM +0100, Toke Høiland-Jørgensen wrote:
>> Pali Rohár <pali at kernel.org> writes:
>>> I have been testing mainline kernel on Turris Omnia with two PCIe
>>> default cards (WLE200 and WLE900) and it worked fine. But I do not know
>>> if I had ASPM enabled or not.
>>>
>>> So it is working fine for you when CONFIG_PCIEASPM is disabled, and the whole
>>> issue is only when CONFIG_PCIEASPM is enabled?
>> Yup, exactly. And I'm also currently testing with the default WLE200/900
>> cards... I just tried sticking an MT76-based WiFi card into the third
>> PCI slot, and that doesn't come up either when I enable PCIEASPM.
> Huh. So IIUC, the following cases all try to retrain the link and it
> fails to come up again:
>
> - aardvark + WLE900VX (see commit 43fc679ced18)
> - mvebu + WLE200
> - mvebu + WLE900
> - mvebu + MT76
>
> In all these cases, Linux was able to enumerate the NIC, which means
> the link was up when firmware handed it off.
>
> I think Linux decided the Common Clock Configuration was wrong, so it
> tried to fix it and retrain the link, and the link didn't come back
> up.
>
> I don't have "lspci -vv" output from all of them, but in vtolkm's
> case, the firmware handed off with:
>
> 00:02.0 Root Port to [bus 02] SlotClk+ CommClk+
> 02:00.0 QCA986x/988x NIC SlotClk+ CommClk-
>
> Per spec (PCIe r5, sec 7.5.3.7), SlotClk is HwInit and CommClk is RW
> and should power up as 0. If I'm reading the implementation note
> correctly, if SlotClk is set on both ends of the link, software should
> set CommClk, so the config above *does* look wrong, and CommClk+ on
> the Root Port suggests that firmware set it.
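A minimal shell sketch of that rule, assuming only the register layout of the PCI Express capability: Slot Clock Configuration (SlotClk) is bit 12 of Link Status (CAP_EXP+0x12) and Common Clock Configuration (CommClk) is bit 6 of Link Control (CAP_EXP+0x10). The helper name check_commclk is hypothetical. It loosely mirrors the check in Linux's pcie_aspm_configure_common_clock() (drivers/pci/pcie/aspm.c), which sets CommClk when both ends report SlotClk and clears it otherwise, but it is not that code. The example call uses the handed-off values reported below (Link Status 1011 on both ends, Link Control 0040 on the Root Port and 0000 on the NIC).

```shell
# Hypothetical helper: given Link Status and Link Control (hex, no 0x prefix)
# for both ends of a link, report whether CommClk matches what the
# "SlotClk on both ends => set CommClk, else clear it" rule expects.
check_commclk() {
    up_slot=$(( (0x$1 >> 12) & 1 ))   # upstream Link Status bit 12 (SlotClk)
    up_cc=$(( (0x$2 >> 6) & 1 ))      # upstream Link Control bit 6 (CommClk)
    dn_slot=$(( (0x$3 >> 12) & 1 ))   # downstream Link Status bit 12 (SlotClk)
    dn_cc=$(( (0x$4 >> 6) & 1 ))      # downstream Link Control bit 6 (CommClk)
    if [ "$up_slot" -eq 1 ] && [ "$dn_slot" -eq 1 ]; then
        want=1
    else
        want=0
    fi
    if [ "$up_cc" -eq "$want" ] && [ "$dn_cc" -eq "$want" ]; then
        echo "ok"
    else
        echo "mismatch: want CommClk=$want, up=$up_cc, down=$dn_cc"
    fi
}

# Root Port 00:02.0 (upstream) and NIC 02:00.0 (downstream) as handed off.
# Prints: mismatch: want CommClk=1, up=1, down=0
check_commclk 1011 0040 1011 0000
```

The mismatch reported here is exactly the "CommClk+ on one end, CommClk- on the other" state that would make Linux rewrite CommClk and retrain the link.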
>
> I think both the aardvark and mvebu systems probably use U-Boot. I
> don't know U-Boot at all, but I don't see anything in it that touches
> Link Control. I'm curious what happens if you put one of these cards
> in a PC. If anybody tries it, please collect the "sudo lspci -vv" and
> dmesg output.
>
> We could quirk these NICs to avoid the retrain, but since aardvark and
> mvebu have no obvious connection and WLE200/WLE900 and MT76 have no
> obvious connection, I doubt there's a simple hardware defect that
> explains all these.
>
> Maybe we're doing something wrong in the retrain, but obviously the
> link came up in the first place. AFAIK the only thing we're changing
> is the CommClk setting, and that looks legitimate per spec.
>
> Another experiment: build a kernel without CONFIG_PCIEASPM, set $ROOT
> and $NIC appropriately, and try the following:
>
> # Set $ROOT and $NIC (update to match your system):
>
> # ROOT=00:02.0
> # NIC=02:00.0
>
> # Dump the Root Port and NIC Link registers:
>
> # setpci -s$ROOT CAP_EXP+0xc.l # Link Capabilities
> # setpci -s$ROOT CAP_EXP+0x10.w # Link Control
> # setpci -s$ROOT CAP_EXP+0x12.w # Link Status
>
> # setpci -s$NIC CAP_EXP+0xc.l # Link Capabilities
> # setpci -s$NIC CAP_EXP+0x10.w # Link Control
> # setpci -s$NIC CAP_EXP+0x12.w # Link Status
>
> # Retrain the link:
>
> # setpci -s$ROOT CAP_EXP+0x10.w=0x0020 # Link Control Retrain Link
> # sleep 1
> # setpci -s$ROOT CAP_EXP+0x12.w # Link Status
> # setpci -s$NIC CAP_EXP+0x12.w # Link Status
>
> # Set CommClk+ and retrain the link:
>
> # setpci -s$NIC CAP_EXP+0x10.w=0x0040 # Link Control Common Clock
> # setpci -s$ROOT CAP_EXP+0x10.w=0x0040 # Link Control Common Clock
> # setpci -s$ROOT CAP_EXP+0x10.w=0x0060 # Link Control RL + CC
> # sleep 1
> # setpci -s$ROOT CAP_EXP+0x12.w # Link Status
> # setpci -s$NIC CAP_EXP+0x12.w # Link Status
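The Link Control writes in the recipe above store a whole 16-bit word. A small decoder makes it easier to see what each value sets; decode_lnkctl is a hypothetical helper, and the field positions are those of the PCIe Link Control register (ASPM Control bits 1:0, RCB bit 3, Link Disable bit 4, Retrain Link bit 5, Common Clock Configuration bit 6, Extended Synch bit 7). One side effect worth keeping in mind: the Root Port reports Link Control 0040 below, so writing 0x0020 retrains with CommClk cleared on the Root Port. If that is not intended, computing old | 0x20 keeps the other bits, and setpci also accepts a value:mask form (e.g. CAP_EXP+0x10.w=0x0020:0x0020) that should change only the masked bits.

```shell
# Hypothetical helper: decode a PCIe Link Control value (CAP_EXP+0x10)
# given as hex without a 0x prefix.
decode_lnkctl() {
    v=$((0x$1))
    # bits 1:0 ASPM Control, 3 RCB, 4 Link Disable, 5 Retrain Link,
    # 6 Common Clock Configuration, 7 Extended Synch
    printf 'ASPM=%d RCB=%d LinkDis=%d Retrain=%d CommClk=%d ExtSynch=%d\n' \
        $(( v & 3 )) $(( (v >> 3) & 1 )) $(( (v >> 4) & 1 )) \
        $(( (v >> 5) & 1 )) $(( (v >> 6) & 1 )) $(( (v >> 7) & 1 ))
}

old=0040              # Root Port Link Control as read before any write
decode_lnkctl "$old"
decode_lnkctl 0020    # "Retrain Link" write: CommClk ends up cleared
decode_lnkctl 0060    # "RL + CC" write

# Read-modify-write: set Retrain Link but keep whatever else was already set.
printf 'rmw=%04x\n' $(( 0x$old | 0x20 ))
```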
ROOT=00:02.0
NIC=02:00.0
setpci -s$ROOT CAP_EXP+0xc.l
0003ac12
setpci -s$ROOT CAP_EXP+0x10.w
0040
setpci -s$ROOT CAP_EXP+0x12.w
1011
setpci -s$NIC CAP_EXP+0xc.l
00036c11
setpci -s$NIC CAP_EXP+0x10.w
0000
setpci -s$NIC CAP_EXP+0x12.w
1011
setpci -s$ROOT CAP_EXP+0x10.w=0x0020
sleep 1
setpci -s$ROOT CAP_EXP+0x12.w
1011
setpci -s$NIC CAP_EXP+0x12.w
setpci: 0000:02:00.0: Instance #0 of Capability 0010 not found - there are no capabilities with that id.
setpci -s$NIC CAP_EXP+0x10.w=0x0040
setpci: 0000:02:00.0: Instance #0 of Capability 0010 not found - there are no capabilities with that id.
setpci -s$ROOT CAP_EXP+0x10.w=0x0040
setpci -s$ROOT CAP_EXP+0x10.w=0x0060
sleep 1
setpci -s$ROOT CAP_EXP+0x12.w
1811
setpci -s$NIC CAP_EXP+0x12.w
setpci: 0000:02:00.0: Instance #0 of Capability 0010 not found - there are no capabilities with that id.
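The raw values above can be decoded with two more hypothetical helpers, using the field positions of the PCIe Link Capabilities and Link Status registers; the exit latencies are printed as their raw 3-bit encodings, not as times. Decoded this way, both ends report a x1 link at 2.5 GT/s with SlotClk set; the Root Port is capable of 5.0 GT/s, the NIC of 2.5 GT/s, and both advertise L0s and L1. The only difference between Link Status 1011 and 1811 is bit 11 (Link Training), which suggests the Root Port was still training one second after the CommClk+ retrain. The "Capability 0010 not found" errors refer to the PCI Express capability (ID 0x10); one plausible reading is that the NIC stopped answering config reads after the first retrain, though this output alone does not prove it.

```shell
# Hypothetical helpers: decode PCIe Link Capabilities (CAP_EXP+0x0c) and
# Link Status (CAP_EXP+0x12) values given as hex without a 0x prefix.
decode_lnkcap() {
    v=$((0x$1))
    # bits 3:0 Max Link Speed, 9:4 Max Link Width, 11:10 ASPM Support,
    # 14:12 L0s Exit Latency, 17:15 L1 Exit Latency (raw encodings)
    printf 'MaxSpeed=%d MaxWidth=x%d ASPM=%d L0sExit=%d L1Exit=%d\n' \
        $(( v & 0xf )) $(( (v >> 4) & 0x3f )) $(( (v >> 10) & 3 )) \
        $(( (v >> 12) & 7 )) $(( (v >> 15) & 7 ))
}

decode_lnksta() {
    v=$((0x$1))
    # bits 3:0 Current Link Speed, 9:4 Negotiated Link Width,
    # 11 Link Training, 12 Slot Clock Configuration, 13 DLL Link Active
    printf 'Speed=%d Width=x%d Training=%d SlotClk=%d DLLActive=%d\n' \
        $(( v & 0xf )) $(( (v >> 4) & 0x3f )) $(( (v >> 11) & 1 )) \
        $(( (v >> 12) & 1 )) $(( (v >> 13) & 1 ))
}

decode_lnkcap 0003ac12   # Root Port 00:02.0
decode_lnkcap 00036c11   # NIC 02:00.0
decode_lnksta 1011       # Root Port after the plain retrain
decode_lnksta 1811       # Root Port after the CommClk+ retrain
```

For speed fields, an encoding of 1 corresponds to 2.5 GT/s and 2 to 5.0 GT/s; an ASPM Support value of 3 means both L0s and L1 are supported.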
More information about the linux-arm-kernel mailing list