PCI trouble on mvebu (Turris Omnia)

Bjorn Helgaas helgaas at kernel.org
Wed Oct 28 10:42:09 EDT 2020


On Wed, Oct 28, 2020 at 02:36:13PM +0100, Toke Høiland-Jørgensen wrote:
> Toke Høiland-Jørgensen <toke at redhat.com> writes:
> 
> > Bjorn Helgaas <helgaas at kernel.org> writes:
> >
> >> [+cc vtolkm]
> >>
> >> On Tue, Oct 27, 2020 at 04:43:20PM +0100, Toke Høiland-Jørgensen wrote:
> >>> Hi everyone
> >>> 
> >>> I'm trying to get a mainline kernel to run on my Turris Omnia, and am
> >>> having some trouble getting the PCI bus to work correctly. Specifically,
> >>> I'm running a 5.10-rc1 kernel (torvalds/master as of this moment), with
> >>> the resource request fix[0] applied on top.
> >>> 
> >>> The kernel boots fine, and the patch in [0] makes the PCI devices show
> >>> up. But I'm still getting initialisation errors like these:
> >>> 
> >>> [    1.632709] pci 0000:01:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
> >>> [    1.632714] pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> >>> [    1.632745] pci 0000:02:00.0: BAR 0: error updating (0xe0200004 != 0xffffffff)
> >>> [    1.632750] pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> >>> 
> >>> and the WiFi drivers fail to initialise with what appears to me to be
> >>> errors related to the bus rather than to the drivers themselves:
> >>> 
> >>> [    3.509878] ath: phy0: Mac Chip Rev 0xfffc0.f is not supported by this driver
> >>> [    3.517049] ath: phy0: Unable to initialize hardware; initialization status: -95
> >>> [    3.524473] ath9k 0000:01:00.0: Failed to initialize device
> >>> [    3.530081] ath9k: probe of 0000:01:00.0 failed with error -95
> >>> [    3.536012] ath10k_pci 0000:02:00.0: of_irq_parse_pci: failed with rc=134
> >>> [    3.543049] pci 0000:00:02.0: enabling device (0140 -> 0142)
> >>> [    3.548735] ath10k_pci 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
> >>> [    3.588592] ath10k_pci 0000:02:00.0: failed to wake up device : -110
> >>> [    3.595098] ath10k_pci: probe of 0000:02:00.0 failed with error -110
> >>> 
> >>> lspci looks OK, though:
> >>> 
> >>> # lspci
> >>> 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
> >>> 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
> >>> 00:03.0 PCI bridge: Marvell Technology Group Ltd. Device 6820 (rev 04)
> >>> 01:00.0 Network controller: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) (rev 01)
> >>> 02:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter (rev ff)
> >>> 
> >>> Does anyone have any clue what could be going on here? Is this a bug, or
> >>> did I miss something in my config or other initialisation? I've tried
> >>> with both the stock u-boot distributed with the board, and with an
> >>> upstream u-boot from latest master; doesn't seem to make any difference.
> >>
> >> Can you try turning off CONFIG_PCIEASPM?  We had a similar recent
> >> report at https://bugzilla.kernel.org/show_bug.cgi?id=209833 but I
> >> don't think we have a fix yet.
> >
> > Yes! Turning that off does indeed help! Thanks a bunch :)
> >
> > You mention that bisecting this would be helpful - I can try that
> > tomorrow; any idea when this was last working?
> 
> OK, so I tried to bisect this, but, erm, I couldn't find a working
> revision to start from? I went all the way back to 4.10 (which is the
> first version to include the device tree file for the Omnia), and even
> on that, the wireless cards were failing to initialise with ASPM
> enabled...

I have no personal experience with this device; all I know is that the
bugzilla suggests that it worked in v5.4, which isn't much help.

Possibly the apparent regression was really a .config change, i.e.,
CONFIG_PCIEASPM was disabled in the v5.4 kernel vtolkm@ tested, so it
"worked", and it only started failing once the option got enabled later?

Maybe the debug patch below would be worth trying to see if it makes
any difference?  If it *does* help, try omitting the first hunk to see
if we just need to apply the quirk_enable_clear_retrain_link() quirk.

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index ac0557a305af..afe7fa1d54d6 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -103,7 +103,7 @@ static const char *policy_str[] = {
 	[POLICY_POWER_SUPERSAVE] = "powersupersave"
 };
 
-#define LINK_RETRAIN_TIMEOUT HZ
+#define LINK_RETRAIN_TIMEOUT (10*HZ)
 
 static int policy_to_aspm_state(struct pcie_link_state *link)
 {
@@ -201,7 +201,7 @@ static bool pcie_retrain_link(struct pcie_link_state *link)
 	pcie_capability_read_word(parent, PCI_EXP_LNKCTL, &reg16);
 	reg16 |= PCI_EXP_LNKCTL_RL;
 	pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
-	if (parent->clear_retrain_link) {
+	if (1 || parent->clear_retrain_link) {
 		/*
 		 * Due to an erratum in some devices the Retrain Link bit
 		 * needs to be cleared again manually to allow the link
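
To make it easier to reason about what each hunk tests, here is a
minimal user-space sketch of the retrain-and-poll sequence. It is not
the kernel code: struct fake_port, fake_read_lnksta(), retrain_link()
and the needs_rl_clear flag are made up for illustration, and the
erratum behaviour it models (training never finishing while Retrain
Link stays set) is only an assumption about how such a device might
act. Only the two bit values match the PCIe register definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_EXP_LNKCTL_RL 0x0020 /* Retrain Link bit in Link Control */
#define PCI_EXP_LNKSTA_LT 0x0800 /* Link Training bit in Link Status */

/* Hypothetical stand-in for a root port; not a kernel structure. */
struct fake_port {
	uint16_t lnkctl;         /* simulated Link Control register */
	unsigned polls_to_train; /* status reads before training finishes */
	bool needs_rl_clear;     /* assumed erratum: RL must be cleared by hand */
};

/* Simulated Link Status read. */
static uint16_t fake_read_lnksta(struct fake_port *p)
{
	/* Erratum model: training cannot finish while RL is still set. */
	if (p->needs_rl_clear && (p->lnkctl & PCI_EXP_LNKCTL_RL))
		return PCI_EXP_LNKSTA_LT;
	if (p->polls_to_train > 0) {
		p->polls_to_train--;
		return PCI_EXP_LNKSTA_LT;
	}
	return 0;
}

/*
 * Set Retrain Link, optionally clear it again (second hunk), then poll
 * Link Training for at most max_polls reads (first hunk raises the limit).
 * Returns true if training ended within the limit.
 */
static bool retrain_link(struct fake_port *p, bool clear_rl, unsigned max_polls)
{
	uint16_t sta = PCI_EXP_LNKSTA_LT;

	p->lnkctl |= PCI_EXP_LNKCTL_RL;
	if (clear_rl)
		p->lnkctl &= ~PCI_EXP_LNKCTL_RL;

	for (unsigned i = 0; i < max_polls; i++) {
		sta = fake_read_lnksta(p);
		if (!(sta & PCI_EXP_LNKSTA_LT))
			break;
	}
	return !(sta & PCI_EXP_LNKSTA_LT);
}
```

In this model a slow link (needs_rl_clear false) only comes up with a
larger max_polls, while a port with needs_rl_clear set only comes up
when clear_rl is true, however long you wait. That is the distinction
the two hunks are meant to separate.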
