[PATCH v2 0/4] PCI: dwc: Link Up IRQ fixes

Laszlo Fiat laszlo.fiat at proton.me
Thu May 15 10:33:41 PDT 2025


Hello,

On Tuesday, May 13th, 2025 at 4:07 PM, Niklas Cassel <cassel at kernel.org> wrote:

> Hello Mani,
> 
> On Tue, May 13, 2025 at 11:53:29AM +0100, Manivannan Sadhasivam wrote:
> 
> > This wait time is a grey area in the spec tbh. If the Readiness Notification
> > (RN) is not supported, then the spec suggests waiting 1s for the device to
> > become 'configuration ready'. That's why we have the 1s delay in the dwc driver.
> > 
> > Also, it has the below in r6.0, sec 6.6.1:
> > 
> >   * On the completion of Link Training (entering the DL_Active state, see
> >     § 3.2), a component must be able to receive and process TLPs and DLLPs.
> >   * Following exit from a Conventional Reset of a device, within 1.0 s the
> >     device must be able to receive a Configuration Request and return a
> >     Successful Completion if the Request is valid. This period is
> >     independent of how quickly Link training completes. If Readiness
> >     Notifications mechanisms are used (see § 6.22), this period may be
> >     shorter.
> > 
> > As per the first note, once link training is completed, the device should be
> > ready to accept configuration requests from the host. So no delay should be
> > required.
> > 
> > But the second note says that the 1s delay is independent of how quickly the
> > link training completes. This essentially contradicts the above point.
> > 
> > So I think it is not required to add a delay after completing the LTSSM, unless
> > someone sees any issue.
> 
> 
> If you look at the commit message in patch 1/2, the whole reason for this
> series is that someone has seen an issue :)
> 
> While I personally haven't seen any issue, the user reporting that commit
> ec9fd499b9c6 ("PCI: dw-rockchip: Don't wait for link since we can detect
> Link Up") regressed his system so that it can no longer mount rootfs
> (which is on a PLEXTOR PX-256M8PeGN NVMe SSD) clearly has seen an issue.
> 
> It is possible that his device is not following the spec.
> I simply compared the code before and after ec9fd499b9c6, to try to
> figure out why it was actually working before, and came up with this,
> which made his device functional again.
> 
> Perhaps we should add a comment above the sleep that says that this
> should strictly not be needed as per the spec?
> (And also add the same comment in the (single) controller driver in
> mainline which already does an msleep(PCIE_T_RRS_READY_MS).)

I am the one experiencing the issue with my Orange Pi 3B (RK3566, 8 GB RAM) and a PLEXTOR PX-256M8PeGN NVMe SSD.

I first noticed the problem when upgrading from 6.13.8 to 6.14.3: my system could not find the NVMe SSD that contains the rootfs. After reverting these two patches:

- ec9fd499b9c6 ("PCI: dw-rockchip: Don't wait for link since we can detect Link Up")
- 0e0b45ab5d77 ("PCI: dw-rockchip: Enumerate endpoints based on dll_link_up IRQ")

my system booted fine again.
After that I tested the patches sent by Niklas in this thread, which fixed the issue, so I sent my Tested-by.

I did another test today with 6.15.0-rc6, which on its own does not find my SSD. Niklas asked me to test with the following:

- revert ec9fd499b9c6 ("PCI: dw-rockchip: Don't wait for link since we can detect Link Up")
- revert 0e0b45ab5d77 ("PCI: dw-rockchip: Enumerate endpoints based on dll_link_up IRQ")
- apply the following patch:

diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
index b3615d125942..5dee689ecd95 100644
--- a/drivers/pci/controller/dwc/pcie-designware.c
+++ b/drivers/pci/controller/dwc/pcie-designware.c
@@ -692,7 +692,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci)
                if (dw_pcie_link_up(pci))
                        break;

-               msleep(LINK_WAIT_SLEEP_MS);
+               usleep_range(100, 200);
        }

        if (retries >= LINK_WAIT_MAX_RETRIES) {


This restores the original behaviour of waiting for link-up, but shortens the sleep between link-up polls. The result was again a non-booting system, this time with the "Phy link never came up" error message.
So please allow this regression, which is already in 6.14.x, to be fixed. As far as I know, I am the only one who has reported it so far, but we cannot be sure how many SSDs have this timing issue. Most users run older, distribution-packaged kernels, so others will only run into this later.

Bye,

Laszlo Fiat

> 
> 
> Kind regards,
> Niklas



More information about the Linux-rockchip mailing list