PCI trouble on mvebu (Turris Omnia)

™֟☻̭҇ Ѽ ҉ ® vtolkm at googlemail.com
Thu Oct 29 15:56:00 EDT 2020


On 29/10/2020 20:30, Bjorn Helgaas wrote:
> On Thu, Oct 29, 2020 at 12:12:21PM +0100, Toke Høiland-Jørgensen wrote:
>> Pali Rohár <pali at kernel.org> writes:
>>> I have been testing mainline kernel on Turris Omnia with two PCIe
>>> default cards (WLE200 and WLE900) and it worked fine. But I do not know
>>> if I had ASPM enabled or not.
>>>
>>> So it is working fine for you when CONFIG_PCIEASPM is disabled and whole
>>> issue is only when CONFIG_PCIEASPM is enabled?
>> Yup, exactly. And I'm also currently testing with the default WLE200/900
>> cards... I just tried sticking an MT76-based WiFi card into the third
>> PCI slot, and that doesn't come up either when I enable PCIEASPM.
> Huh.  So IIUC, the following cases all try to retrain the link and it
> fails to come up again:
>
>    - aardvark + WLE900VX (see commit 43fc679ced18)
>    - mvebu + WLE200
>    - mvebu + WLE900
>    - mvebu + MT76
>
> In all these cases, Linux was able to enumerate the NIC, which means
> the link was up when firmware handed it off.
>
> I think Linux decided the Common Clock Configuration was wrong, so it
> tried to fix it and retrain the link, and the link didn't come back
> up.
>
> I don't have "lspci -vv" output from all of them, but in vtolkm's
> case, the firmware handed off with:
>
>    00:02.0 Root Port to [bus 02]  SlotClk+ CommClk+
>    02:00.0 QCA986x/988x NIC       SlotClk+ CommClk-
>
> Per spec (PCIe r5, sec 7.5.3.7), SlotClk is HwInit and CommClk is RW
> and should power up as 0.  If I'm reading the implementation note
> correctly, if SlotClk is set on both ends of the link, software should
> set CommClk, so the config above *does* look wrong, and CommClk+ on
> the Root Port suggests that firmware set it.
>
> I think both the aardvark and mvebu systems probably use U-Boot.  I
> don't know U-Boot at all, but I don't see anything in it that touches
> Link Control.  I'm curious what happens if you put one of these cards
> in a PC.  If anybody tries it, please collect the "sudo lspci -vv" and
> dmesg output.
>
> We could quirk these NICs to avoid the retrain, but since aardvark and
> mvebu have no obvious connection and WLE200/WLE900 and MT76 have no
> obvious connection, I doubt there's a simple hardware defect that
> explains all these.
>
> Maybe we're doing something wrong in the retrain, but obviously the
> link came up in the first place.  AFAIK the only thing we're changing
> is the CommClk setting, and that looks legitimate per spec.
>
> Another experiment: build kernel without CONFIG_PCIEASPM, set $ROOT
> and $NIC appropriately, and try the following:
>
>    # Set $ROOT and $NIC (update to match your system):
>
>      # ROOT=00:02.0
>      # NIC=02:00.0
>
>    # Dump the Root Port and NIC Link registers:
>
>      # setpci -s$ROOT CAP_EXP+0xc.l              # Link Capabilities
>      # setpci -s$ROOT CAP_EXP+0x10.w             # Link Control
>      # setpci -s$ROOT CAP_EXP+0x12.w             # Link Status
>
>      # setpci -s$NIC  CAP_EXP+0xc.l              # Link Capabilities
>      # setpci -s$NIC  CAP_EXP+0x10.w             # Link Control
>      # setpci -s$NIC  CAP_EXP+0x12.w             # Link Status
>
>    # Retrain the link:
>
>      # setpci -s$ROOT CAP_EXP+0x10.w=0x0020      # Link Control Retrain Link
>      # sleep 1
>      # setpci -s$ROOT CAP_EXP+0x12.w             # Link Status
>      # setpci -s$NIC  CAP_EXP+0x12.w             # Link Status
>
>    # Set CommClk+ and retrain the link:
>
>      # setpci -s$NIC  CAP_EXP+0x10.w=0x0040      # Link Control Common Clock
>      # setpci -s$ROOT CAP_EXP+0x10.w=0x0040      # Link Control Common Clock
>      # setpci -s$ROOT CAP_EXP+0x10.w=0x0060      # Link Control RL + CC
>      # sleep 1
>      # setpci -s$ROOT CAP_EXP+0x12.w             # Link Status
>      # setpci -s$NIC  CAP_EXP+0x12.w             # Link Status

ROOT=00:02.0
NIC=02:00.0
setpci -s$ROOT CAP_EXP+0xc.l
0003ac12
setpci -s$ROOT CAP_EXP+0x10.w
0040
setpci -s$ROOT CAP_EXP+0x12.w
1011
setpci -s$NIC  CAP_EXP+0xc.l
00036c11
setpci -s$NIC  CAP_EXP+0x10.w
0000
setpci -s$NIC  CAP_EXP+0x12.w
1011
setpci -s$ROOT CAP_EXP+0x10.w=0x0020
sleep 1
setpci -s$ROOT CAP_EXP+0x12.w
1011
setpci -s$NIC  CAP_EXP+0x12.w
setpci: 0000:02:00.0: Instance #0 of Capability 0010 not found - there are no capabilities with that id.
setpci -s$NIC  CAP_EXP+0x10.w=0x0040
setpci: 0000:02:00.0: Instance #0 of Capability 0010 not found - there are no capabilities with that id.
setpci -s$ROOT CAP_EXP+0x10.w=0x0040
setpci -s$ROOT CAP_EXP+0x10.w=0x0060
sleep 1
setpci -s$ROOT CAP_EXP+0x12.w
1811
setpci -s$NIC  CAP_EXP+0x12.w
setpci: 0000:02:00.0: Instance #0 of Capability 0010 not found - there are no capabilities with that id.
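
For readers who want to interpret the raw register values above, here is a
minimal sketch that decodes them. It assumes the standard PCIe Link Control
and Link Status bit layout (Retrain Link = bit 5, Common Clock Configuration
= bit 6; Current Link Speed = bits 3:0, Negotiated Link Width = bits 9:4,
Link Training = bit 11, Slot Clock Configuration = bit 12). The function
names decode_lnkctl and decode_lnksta are made up for illustration and are
not part of pciutils.

```shell
#!/bin/sh
# Hypothetical helpers: decode the hex words printed by
# "setpci ... CAP_EXP+0x10.w" (Link Control) and
# "setpci ... CAP_EXP+0x12.w" (Link Status).

decode_lnkctl() {
    val=$((0x$1))
    retrain=$(((val >> 5) & 1))     # bit 5: Retrain Link
    commclk=$(((val >> 6) & 1))     # bit 6: Common Clock Configuration
    echo "retrain=$retrain commclk=$commclk"
}

decode_lnksta() {
    val=$((0x$1))
    speed=$((val & 0xf))            # bits 3:0: Current Link Speed (1 = 2.5GT/s)
    width=$(((val >> 4) & 0x3f))    # bits 9:4: Negotiated Link Width
    training=$(((val >> 11) & 1))   # bit 11:  Link Training in progress
    slotclk=$(((val >> 12) & 1))    # bit 12:  Slot Clock Configuration
    echo "speed=$speed width=x$width training=$training slotclk=$slotclk"
}

decode_lnkctl 0040   # Root Port Link Control before any retrain
decode_lnksta 1011   # Root Port Link Status before / after first retrain
decode_lnksta 1811   # Root Port Link Status after CommClk + retrain
```

Under those assumptions, 1011 decodes to a 2.5GT/s x1 link with SlotClk set
and no training in progress, while 1811 differs only in the Link Training
bit, i.e. the Root Port still reports training one second after the second
retrain.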
