imx8mp pci hang during init

Tim Harvey tharvey at gateworks.com
Fri Aug 18 15:12:26 PDT 2023


On Thu, Aug 17, 2023 at 5:05 PM Maciej W. Rozycki <macro at orcam.me.uk> wrote:
>
> On Thu, 17 Aug 2023, Bjorn Helgaas wrote:
>
> > [+cc Maciej, smells similar to a89c82249c37 ("PCI: Work around PCIe
> > link training failures") ]
>
>  Quite so indeed.
>
> > > [ 0.499660] imx6q-pcie 33800000.pcie: host bridge /soc at 0/pcie at 33800000 ranges:
> > > [ 0.500276] clk: Not disabling unused clocks
> > > [ 0.506960] imx6q-pcie 33800000.pcie: IO 0x001ff80000..0x001ff8ffff ->
> > > 0x0000000000
> > > [ 0.519401] imx6q-pcie 33800000.pcie: MEM 0x0018000000..0x001fefffff
> > > -> 0x0018000000
> > > [ 0.743554] imx6q-pcie 33800000.pcie: iATU: unroll T, 4 ob, 4 ib,
> > > align 64K, limit 16G
> > > [ 0.851578] imx6q-pcie 33800000.pcie: PCIe Gen.1 x1 link up
> > > ^^^ hang at this point until watchdog resets
>

Maciej and Bjorn,

Thank you for the responses!

>  So I think it's important to figure out where exactly in the kernel code
> the hang happens; this is presumably in host-bridge-specific link bring-up
> code polling link status, which may have to be updated according to or
> otherwise make use of a89c82249c37.  It may also be something completely
> different of course.
>

It's hanging in imx6_pcie_start_link() after the PCI_EXP_LNKCAP
register is updated to allow gen2 during the subsequent
dw_pcie_wait_for_link, specifically within the dw_pcie_read_dbi
function that does a memory read. Due to the mem read hanging the CPU
this tells me that the DWC core has crashed at this point.

What I found is that if I essentially revert the effect of commit
fa33a6d87eac ("PCI: imx6: Start link in Gen1 before negotiating for
Gen2 mode") to start linking at gen3 (or forced to gen2) it appears to
downgrade to gen2 (due to the PI7C9X2G608GPB being a gen2 switch) and
work fine:
diff --git a/drivers/pci/controller/dwc/pci-imx6.c
b/drivers/pci/controller/dwc/pci-imx6.c
index 27aaa2a6bf39..81caaef76e8a 100644
--- a/drivers/pci/controller/dwc/pci-imx6.c
+++ b/drivers/pci/controller/dwc/pci-imx6.c
@@ -876,6 +876,7 @@ static int imx6_pcie_start_link(struct dw_pcie *pci)
        u32 tmp;
        int ret;

+#if 0
        /*
         * Force Gen1 operation when starting the link.  In case the link is
         * started in Gen2 mode, there is a possibility the devices on the
@@ -887,6 +888,7 @@ static int imx6_pcie_start_link(struct dw_pcie *pci)
        tmp |= PCI_EXP_LNKCAP_SLS_2_5GB;
        dw_pcie_writel_dbi(pci, offset + PCI_EXP_LNKCAP, tmp);
        dw_pcie_dbi_ro_wr_dis(pci);
+#endif

        /* Start LTSSM. */
        imx6_pcie_ltssm_enable(dev);
@@ -895,6 +897,7 @@ static int imx6_pcie_start_link(struct dw_pcie *pci)
        if (ret)
                goto err_reset_phy;

+#if 0
        if (pci->link_gen > 1) {
                /* Allow faster modes after the link is up */
                dw_pcie_dbi_ro_wr_en(pci);
@@ -937,6 +940,7 @@ static int imx6_pcie_start_link(struct dw_pcie *pci)
        } else {
                dev_info(dev, "Link: Only Gen1 is enabled\n");
        }
+#endif

        imx6_pcie->link_is_up = true;
        tmp = dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKSTA);

So I think you are correct in that I need to do the same thing that
was done in a89c82249c37 for the imx6 dwc driver which essentially
forces it the other way going from gen1->gen2->gen3 (upward) instead
of gen3->gen2->gen1 (downard).

I've cc'd Marek who authored commit fa33a6d87eac ("PCI: imx6: Start
link in Gen1 before negotiating for Gen2 mode") in hopes that he might
remember what switch or switches he needed this change for. I'm not
even clear what IMX6 SoC 10 years ago even had gen2 capability.

I found that the PI7C9X2G608GPB used here has an errata "E11: GEN2
Change-Rate Issue with Certain Root Complex Platforms" that describes
an issue observed in certain PCIe gen3 platforms during a rate change
from 2.5Gbps to 5Gbps caused by the switch entering a recovery state
that can timeout at which point according to the errata "After the
link-down process, all the registers are reset to the default values"
which is likely whats causing the DWC controller to hang.

My gut feel is that commit a89c82249c37 ("PCI: Work around PCIe link
training failures") likely would resolve the issues that Marek had
which prompted him to make the imx6 driver go from gen1 upward and
that if we changed the driver to go from gen3 downward it would
resolve my issue as well. However, I don't know what the 'correct'
link training sequence should really be (upward or downward) so it's
hard to say what the right workaround is. Is there a correct link
training sequence and if so how many controllers are using it vs
having to reverse it to workaround hardware quirks?

>  Can you see if you can bump the link up beyond 2.5GT/s by poking at host
> bridge registers by hand with `setpci' once the link been successfully
> established at 2.5GT/s?
>

I'll have to try that. Instead of using the PCI_EXP_LNKCTL like the
pcie_retrain_link() function does the imx6 driver touches some DWC
register that I don't have documentation for so essentially what your
asking will test retraining the more standard way using the config
registers:
                /*
                 * Start Directed Speed Change so the best possible
                 * speed both link partners support can be negotiated.
                 */
                tmp = dw_pcie_readl_dbi(pci, PCIE_LINK_WIDTH_SPEED_CONTROL);
                tmp |= PORT_LOGIC_SPEED_CHANGE;
                dw_pcie_writel_dbi(pci, PCIE_LINK_WIDTH_SPEED_CONTROL, tmp);
                dw_pcie_dbi_ro_wr_dis(pci);

Best regards,

Tim



More information about the linux-arm-kernel mailing list