PCI trouble on mvebu (Turris Omnia)

™֟☻̭҇ Ѽ ҉ ® vtolkm at googlemail.com
Mon Nov 2 11:18:00 EST 2020


On 02/11/2020 16:54, Toke Høiland-Jørgensen wrote:
> Pali Rohár <pali at kernel.org> writes:
>
>> On Saturday 31 October 2020 13:49:49 Toke Høiland-Jørgensen wrote:
>>> "™֟☻̭҇ Ѽ ҉ ®" <vtolkm at googlemail.com> writes:
>>>
>>>> On 30/10/2020 15:23, Pali Rohár wrote:
>>>>> On Friday 30 October 2020 14:02:22 Toke Høiland-Jørgensen wrote:
>>>>>> Pali Rohár <pali at kernel.org> writes:
>>>>>>> My experience with that WLE900VX card, aardvark driver and aspm code:
>>>>>>>
>>>>>>> Link training in GEN2 mode for this card succeed only once after reset.
>>>>>>> Repeated link retraining fails and it fails even when aardvark is
>>>>>>> reconfigured to GEN1 mode. Reset via PERST# signal is required to have
>>>>>>> working link training.
>>>>>>>
>>>>>>> What I did in aardvark driver: Set mode to GEN2, do link training. If
>>>>>>> success read "negotiated link speed" from "Link Control Status Register"
>>>>>>> (for WLE900VX it is 0x1 - GEN1) and set it into aardvark. And then
>>>>>>> retrain link again (for WLE900VX now it would be at GEN1). After that
>>>>>>> card is stable and all future retraining (e.g. from aspm.c) also passes.
>>>>>>>
>>>>>>> If I do not change aardvark mode from GEN2 to GEN1 the second link
>>>>>>> training fails. And if I change mode to GEN1 after this failed link
>>>>>>> training then nothing happen, link training do not success.
>>>>>>>
>>>>>>> So just speculation now... In current setup initialization of card does
>>>>>>> one link training at GEN2. Then aspm.c is called which is doing second
>>>>>>> link retraining at GEN2. And if it fails then below patch issue third
>>>>>>> link retraining at GEN1. If A38x/pci-mvebu has same problem as aardvark
>>>>>>> then second link retraining must be at GEN1 (not GEN2) to workaround
>>>>>>> this issue.
>>>>>>>
>>>>>>> Bjorn, Toke: what about trying to hack aspm.c code to never do link
>>>>>>> retraining at GEN2 speed? And always force GEN1 speed prior link
>>>>>>> training?
>>>>>> Sounds like a plan. I poked around in aspm.c and must confess to being a
>>>>>> bit lost in the soup of registers ;)
>>>>>>
>>>>>> So if one of you can cook up a patch, that would be most helpful!
>>>>> I modified Bjorn's patch, explicitly set tls to 1 and added debug info
>>>>> about cls (current link speed, that what is used by aardvark). It is
>>>>> untested, I just tried to compile it.
>>>>>
>>>>> Can try it?
>>>>>
>>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>>>>> index 253c30cc1967..f934c0b52f41 100644
>>>>> --- a/drivers/pci/pcie/aspm.c
>>>>> +++ b/drivers/pci/pcie/aspm.c
>>>>> @@ -206,6 +206,27 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
>>>>>    	unsigned long end_jiffies;
>>>>>    	u16 reg16;
>>>>>    
>>>>> +	u32 lnkcap2;
>>>>> +	u16 lnksta, lnkctl2, cls, tls;
>>>>> +
>>>>> +	pcie_capability_read_dword(parent, PCI_EXP_LNKCAP2, &lnkcap2);
>>>>> +	pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &lnksta);
>>>>> +	pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
>>>>> +	cls = lnksta & PCI_EXP_LNKSTA_CLS;
>>>>> +	tls = lnkctl2 & PCI_EXP_LNKCTL2_TLS;
>>>>> +
>>>>> +	pci_info(parent, "lnkcap2 %#010x sls %#04x lnksta %#06x cls %#03x lnkctl2 %#06x tls %#03x\n",
>>>>> +		lnkcap2, (lnkcap2 & 0x3F) >> 1,
>>>>> +		lnksta, cls,
>>>>> +		lnkctl2, tls);
>>>>> +
>>>>> +	tls = 1;
>>>>> +	pcie_capability_clear_and_set_word(parent, PCI_EXP_LNKCTL2,
>>>>> +					PCI_EXP_LNKCTL2_TLS, tls);
>>>>> +	pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
>>>>> +	pci_info(parent, "lnkctl2 %#010x new tls %#03x\n",
>>>>> +		lnkctl2, tls);
>>>>> +
>>>>>    	pcie_capability_read_word(parent, PCI_EXP_LNKCTL, &reg16);
>>>>>    	reg16 |= PCI_EXP_LNKCTL_RL;
>>>>>    	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
>>>>> @@ -227,6 +248,8 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
>>>>>    			break;
>>>>>    		msleep(1);
>>>>>    	} while (time_before(jiffies, end_jiffies));
>>>>> +	pci_info(parent, "lnksta %#06x new cls %#03x\n",
>>>>> +		lnksta, (cls & PCI_EXP_LNKSTA_CLS));
>>>>>    	return !(reg16 & PCI_EXP_LNKSTA_LT);
>>>>>    }
>>>>>    
>>>> Still exhibiting the BAR update error, run tested with next--20201030
>>> Yup, same for me :(
>> So then it is different issue and not similar to aardvark one.
>>
>> Anyway, was ASPM working on some previous kernel version? Or was it
>> always broken on Turris Omnia?
> I tried bisecting and couldn't find a commit that worked. And OpenWrt by
> default builds with ASPM off, so my best guess is that it was always
> broken.
>
> However, the two other PCI slots *do* work with ASPM on, as long as
> they're both occupied when booting. If I only have one card installed
> apart from the dodge WLE900, both of them fail...

Just to be sure it is not a (particular) mPCIe slot issue on the TO - 
did you change the device order in the mPCIe slots?

On my node:

- right slot (next to the CPU) hosts a SSD
- centre slot hosts WLE900VX
- left slot (over the SIM card slot) hosts the WLE200N2

-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_0x729CFF47A416598B.asc
Type: application/pgp-keys
Size: 3119 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20201102/b90446c3/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature
Type: application/pgp-signature
Size: 840 bytes
Desc: OpenPGP digital signature
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20201102/b90446c3/attachment-0001.sig>


More information about the linux-arm-kernel mailing list