PCI trouble on mvebu (Turris Omnia)

Pali Rohár pali at kernel.org
Fri Oct 30 06:08:07 EDT 2020


Hello!

On Thursday 29 October 2020 21:58:53 Marek Behun wrote:
> On Thu, 29 Oct 2020 14:30:22 -0500
> Bjorn Helgaas <helgaas at kernel.org> wrote:
> 
> > On Thu, Oct 29, 2020 at 12:12:21PM +0100, Toke Høiland-Jørgensen wrote:
> > > Pali Rohár <pali at kernel.org> writes:  
> > 
> > > > I have been testing the mainline kernel on Turris Omnia with the two
> > > > default PCIe cards (WLE200 and WLE900) and it worked fine. But I do not know
> > > > if I had ASPM enabled or not.
> > > >
> > > > So it is working fine for you when CONFIG_PCIEASPM is disabled and the whole
> > > > issue occurs only when CONFIG_PCIEASPM is enabled?
> > > 
> > > Yup, exactly. And I'm also currently testing with the default WLE200/900
> > > cards... I just tried sticking an MT76-based WiFi card into the third
> > > PCI slot, and that doesn't come up either when I enable PCIEASPM.  
> > 
> > Huh.  So IIUC, the following cases all try to retrain the link and it
> > fails to come up again:
> > 
> >   - aardvark + WLE900VX (see commit 43fc679ced18)

Just to note: aardvark + WLE200 worked fine no matter what I did. No
workaround or patch was needed.

> >   - mvebu + WLE200
> >   - mvebu + WLE900
> >   - mvebu + MT76
> 
> Bjorn, IIRC Pali's patches fix the WLE900VX card for Aardvark (both in
> kernel and in U-Boot).
> IMO mvebu has similar issues. Both these drivers handle the PCIe reset
> signal incorrectly (or at least Aardvark did before Pali's work).
> 
> mvebu is used on Turris Omnia, and our HW guys first solved the issue of
> the WLE900VX not working by using different capacitors for the SerDeses (this
> was 5 years ago). But after Pali's work on Aardvark I think this could
> also be solved for mvebu driver in software.

Apparently not :-( As explained below, we cannot control the PERST# pin
from software on Turris Omnia.

> BTW the WLE900VX card has problems on many systems, it won't work for
> example on Thinkpad X230. There is a bug on kernel bugzilla reported
> for this.

The WLE900VX is a really buggy card. During its initialization/reset,
W_DISABLE# (pin 20) must be in the correct state, otherwise the system
never sees the card. This is the reason why it does not work in laptops;
sometimes a double reboot, or playing with the rfkill state before
rebooting, helps. See the reported issue:

https://bugzilla.kernel.org/show_bug.cgi?id=84821#c53

> My opinion is that many drivers do not respect the PCIe specification
> for reset and link training totally correctly (Pali was talking about
> this when he was looking at Aardvark) and that WLE900VX has a bug that
> in combination with those drivers causes the failure. If you look at the
> drivers, they are incompatible in how they handle the reset signal and
> link training.

It seems that either aardvark or the WLE900VX card (and not only this
one, but basically every ath10k card I tested, including non-Compex
ones) has a problem: when the Linux kernel boots, the card is in some
totally strange state, and whatever I did, I was not able to detect it
or make link training succeed. The only thing which helped was to reset
the card via the out-of-band PERST# signal.

And here is the main issue with the PERST# signal in the Linux kernel:
basically every driver keeps PERST# asserted for a different amount of
time when resetting a card. This duration should be independent of both
the driver and the card, and is probably already documented in the PCIe
specification. See my email:

https://lore.kernel.org/linux-pci/20200424092546.25p3hdtkehohe3xw@pali/

I tried to find that minimal reset timeout in the specifications, but I
was not able to understand all the details and timeouts defined in the
various diagrams. I'm not a HW guy. Here is what I was able to find out:

https://lore.kernel.org/linux-pci/20200507212002.GA32182@bogus/

And my conclusion is here:

https://lore.kernel.org/linux-pci/20200513115940.fiemtnxfqcyqo6ik@pali/

So to finally fix the card reset issues, we need somebody who
understands the hardware documents and PCIe specifications and can
figure out the correct minimal delay needed to properly reset a card via
the PERST# signal. Then all PCI controller drivers need to be fixed to
use this value.

In aardvark we use a timeout which was sufficient for the cards I
tested on Espressobin and Turris MOX.


The second issue is link training. What finally helped me fix link
training for PCIe cards on A3720 with the aardvark driver, in both
U-Boot and the Linux kernel, was a comment in the following commit:

https://git.kernel.org/linus/f4c7d053d7f7

    As required by PCI Express spec a delay for at least 100ms after
    such a reset [fundamental reset by asserted PERST# signal] before
    link training is needed.

In the aardvark control register, I forcibly cleared the link training
bit before issuing the reset via the PERST# signal, and then re-enabled
it 100ms after the reset completed.
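The ordering described above (link training held off while PERST# is
asserted, then allowed again only 100ms after deassertion) can be
sketched as a small simulation. This is not the pci-aardvark code:
struct mock_ctrl, mock_sleep_ms() and reset_then_train() are
hypothetical, the clock is simulated, and PERST_HOLD_MS is a placeholder,
since the minimal PERST# assertion time is exactly the value that is
still unknown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical placeholder: the minimal PERST# hold time is unknown. */
#define PERST_HOLD_MS        10
/* "at least 100ms" after the reset, per the commit comment quoted above */
#define POST_RESET_DELAY_MS 100

/* Hypothetical mock of a controller; not the pci-aardvark register API. */
struct mock_ctrl {
	unsigned long now_ms;             /* simulated clock */
	bool link_training_enabled;       /* models the link training bit */
	bool perst_asserted;              /* models the PERST# line */
	unsigned long perst_deassert_ms;  /* when PERST# was released */
	unsigned long training_enable_ms; /* when training was re-enabled */
};

/* Advance the simulated clock instead of really sleeping. */
static void mock_sleep_ms(struct mock_ctrl *c, unsigned long ms)
{
	c->now_ms += ms;
}

/* Reset the card via PERST#, allowing link training only afterwards. */
static void reset_then_train(struct mock_ctrl *c)
{
	c->link_training_enabled = false;      /* 1. hold off link training */
	c->perst_asserted = true;              /* 2. assert PERST# */
	mock_sleep_ms(c, PERST_HOLD_MS);
	c->perst_asserted = false;             /* 3. deassert PERST# */
	c->perst_deassert_ms = c->now_ms;
	mock_sleep_ms(c, POST_RESET_DELAY_MS); /* 4. wait >= 100 ms */
	assert(!c->perst_asserted);            /* card must be out of reset */
	c->link_training_enabled = true;       /* 5. allow link training */
	c->training_enable_ms = c->now_ms;
}
```

In a real driver the mock assignments would become register writes and
GPIO calls, and mock_sleep_ms() would be a real sleep such as msleep();
the sketch only captures the ordering and the 100ms gap.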

I have sent an aardvark patch which updates the comment to reflect the
above requirement:
https://lore.kernel.org/linux-pci/20200924084618.12442-1-pali@kernel.org/

> I am curious what Pali will tell us, he said that he will look into the
> mvebu driver.

If the same problem with WLE900 cards also exists on the A38x SoC (with
the pci-mvebu driver), then it will be hard to fix on Turris Omnia.

On Turris MOX (with aardvark), the card's PERST# pin is connected to an
MPP pin on the A3720 SoC, which we can control via GPIO. In the DTS it is
configured as "reset-gpios", so the aardvark driver can assert/deassert
PERST# for the card when needed.

On Turris Omnia (with pci-mvebu), the wifi card's PERST# pin is
connected to the MCU, which asserts/deasserts it only on board reset.
Moreover, it is a line shared across all mPCIe slots and also with other
peripherals.
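The difference between the two boards can be sketched as a reset helper
that only works when PERST# is under software control. Everything here
is hypothetical: struct board_pcie, set_perst() and try_perst_reset() are
illustrative names, and has_reset_gpio merely models whether the DTS
provides "reset-gpios" (in a real driver this would typically come from
an optional GPIO lookup such as devm_gpiod_get_optional(), which returns
NULL when the GPIO is not described).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical board description, not a real kernel structure. */
struct board_pcie {
	bool has_reset_gpio; /* models "reset-gpios" present in DTS */
	bool perst_asserted; /* simulated PERST# line state */
};

/* Hypothetical stand-in for driving the reset GPIO. */
static void set_perst(struct board_pcie *b, bool asserted)
{
	b->perst_asserted = asserted;
}

/*
 * Returns 0 after pulsing PERST#, or -ENODEV when PERST# is not under
 * software control (as on Turris Omnia, where the MCU drives a line
 * shared by all mPCIe slots and other peripherals).
 */
static int try_perst_reset(struct board_pcie *b)
{
	if (!b->has_reset_gpio)
		return -ENODEV;
	set_perst(b, true);
	/* hold time omitted: its minimal value is the open question */
	set_perst(b, false);
	return 0;
}
```

On a MOX-like board the helper succeeds; on an Omnia-like board it can
only report that a PERST# reset is not possible.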

So we cannot issue a reset via the PERST# signal on Turris Omnia. But
there are other ways to issue a fundamental reset, via in-band
signaling.

But IIRC, issuing a fundamental reset in-band over the PCIe bus is done
via the PCIe bridge to which the card is connected. That is the second
problem: there is no real PCIe bridge on mvebu platforms, it is only
emulated by the kernel. Unless there is some "special" register for
issuing a fundamental reset, we will not be able to emulate this reset.

Aardvark does not have a PCIe bridge either, but its internal registers
contain bits for different types of reset. When I tried to use them,
nothing happened and nothing helped. Only an external reset via the
PERST# signal was able to initialize the card.

I will look into the A38x PCI registers to see if there is something
which could help us. But without access to the PERST# pin I'm sceptical
that we can do anything... I'm just hoping that there is a bug in the
PCIe ASPM retraining code which can be fixed...



More information about the linux-arm-kernel mailing list