[PATCH v2 1/4] PCI: dw-rockchip: Do not enumerate bus before endpoint devices are ready

Wed Jun 4 10:10:09 PDT 2025

On Wed, Jun 04, 2025 at 01:40:52PM +0200, Niklas Cassel wrote:
> On Tue, Jun 03, 2025 at 01:12:50PM -0500, Bjorn Helgaas wrote:
> > 
> > Hmmm, sorry, I misinterpreted both 1/4 and 2/4.  I read them as "add
> > this delay so the PLEXTOR device works", but in fact, I think in both
> > cases, the delay is actually to enforce the PCIe r6.0, sec 6.6.1,
> > requirement for software to wait 100ms before issuing a config
> > request, and the fact that it makes PLEXTOR work is a side effect of
> > that.
> 
> Well, the Plextor NVMe drive used to work with previous kernels,
> but regressed.
> 
> But yes, the delay was added to enforce "PCIe r6.0, sec 6.6.1"
> requirement for software to wait 100ms, which once again makes
> the Plextor NVMe drive work.
> 
> 
> > 
> > The beginning of that 100ms delay is "exit from Conventional Reset"
> > (ports that support <= 5.0 GT/s) or "link training completes" (ports
> > that support > 5.0 GT/s).
> > 
> > I think we lack that 100ms delay in dwc drivers in general.  The only
> > generic dwc delay is in dw_pcie_host_init() via the LINK_WAIT_SLEEP_MS
> > in dw_pcie_wait_for_link(), but that doesn't count because it's
> > *before* the link comes up.  We have to wait 100ms *after* exiting
> > Conventional Reset or completing link training.
> 
> In dw_pcie_wait_for_link(), in the first iteration of the loop, the link
> will never be up (because the link was just started),
> dw_pcie_wait_for_link() will then sleep for LINK_WAIT_SLEEP_MS (90 ms),
> before trying again.
> 
> Most likely the link training took way less than 100 ms, so most of those
> 90 ms will probably be after link training has completed.
> 
> That is most likely why Plextor worked on older kernels (which do not
> use the link up IRQ).
> 
> If we add a 100 ms sleep after wait_for_link(), then I suggest that we
> also reduce LINK_WAIT_SLEEP_MS to something shorter.
> 

No. The ~900ms total wait (LINK_WAIT_MAX_RETRIES iterations of
LINK_WAIT_SLEEP_MS) is there so that we wait about 1s before erroring out on
the assumption that no device is present. This is mandated by the spec. So
irrespective of the delay we add *after* link up, we should keep trying to
detect link up for ~1s.

As for the delay, it should be added after the retry count check:

diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
index b3615d125942..92eb661babeb 100644
--- a/drivers/pci/controller/dwc/pcie-designware.c
+++ b/drivers/pci/controller/dwc/pcie-designware.c
@@ -700,6 +700,8 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci)
                return -ETIMEDOUT;
        }
 
+       msleep(PCIE_T_RRS_READY_MS);
+
        offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP);
        val = dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKSTA);
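
To illustrate the resulting timing, here is a rough userspace model of the
loop with the extra delay. It is only a sketch, not the driver code: the
sim_* names and the simulated clock are made up for illustration, and the
constant values (10 retries, 90 ms, 100 ms) are what I assume the kernel
headers define, so please check them against the tree.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values, mirroring the kernel macros of the same names. */
#define LINK_WAIT_MAX_RETRIES	10
#define LINK_WAIT_SLEEP_MS	90
#define PCIE_T_RRS_READY_MS	100

/* Hypothetical model of a link that trains at a fixed point in time. */
struct sim_link {
	unsigned int up_after_ms;	/* time at which link training completes */
	unsigned int now_ms;		/* simulated clock, advanced by sim_msleep() */
};

static void sim_msleep(struct sim_link *l, unsigned int ms)
{
	l->now_ms += ms;
}

static bool sim_link_up(const struct sim_link *l)
{
	return l->now_ms >= l->up_after_ms;
}

/* Same shape as dw_pcie_wait_for_link() with the proposed msleep() added. */
static int sim_wait_for_link(struct sim_link *l)
{
	int retries;

	assert(l != NULL);

	/* Poll for link up for at most 10 * 90 ms = 900 ms. */
	for (retries = 0; retries < LINK_WAIT_MAX_RETRIES; retries++) {
		if (sim_link_up(l))
			break;

		sim_msleep(l, LINK_WAIT_SLEEP_MS);
	}

	/* No link: bail out without paying the extra 100 ms. */
	if (retries >= LINK_WAIT_MAX_RETRIES)
		return -ETIMEDOUT;

	/* Link is up: wait before the first config request is issued. */
	sim_msleep(l, PCIE_T_RRS_READY_MS);

	return 0;
}
```

With a link that trains after 20 ms, the model returns at 190 ms, i.e. a
full 100 ms after link up was observed. With a link that never trains, it
returns -ETIMEDOUT at 900 ms, so the error path is not made any slower.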

> 
> > 
> > We don't know when the exit from Conventional Reset was, but it was
> > certainly before the link came up.  In the absence of a timestamp for
> > exit from reset, starting the wait after link-up is probably the best
> > we can do.  This could be either after dw_pcie_wait_for_link() finds
> > the link up or when we handle the link-up interrupt.
> > 
> > Patches 1 and 2 would fix the link-up interrupt case.  I think we need
> > another patch for the dwc core for dw_pcie_wait_for_link().
> 
> I agree, sounds like a plan.
> 
> 
> > 
> > I wish I'd had time to spend on this and include patches 1 and 2, but
> > we're up against the merge window wire and I'll be out the end of this
> > week, so I think they'll have to wait.  It seems like something we can
> > still justify for v6.16 though.
> 
> I think it sounds good to target this as fixes for v6.16.
> 

Yeah. At least patch 1 fixes a regression, so it should be included in
v6.16.

- Mani

-- 
மணிவண்ணன் சதாசிவம்