[PATCH v2 1/4] PCI: dw-rockchip: Do not enumerate bus before endpoint devices are ready

Niklas Cassel cassel at kernel.org
Fri May 30 06:57:14 PDT 2025


On Wed, May 28, 2025 at 05:42:51PM -0500, Bjorn Helgaas wrote:
> On Tue, May 06, 2025 at 09:39:36AM +0200, Niklas Cassel wrote:
> > Commit ec9fd499b9c6 ("PCI: dw-rockchip: Don't wait for link since we can
> > detect Link Up") changed so that we no longer call dw_pcie_wait_for_link(),
> > and instead enumerate the bus when receiving a Link Up IRQ.
> > 
> > Laszlo Fiat reported (off-list) that his PLEXTOR PX-256M8PeGN NVMe SSD is
> > no longer functional, and simply reverting commit ec9fd499b9c6 ("PCI:
> > dw-rockchip: Don't wait for link since we can detect Link Up") makes his
> > SSD functional again.
> > 
> > It seems that we are enumerating the bus before the endpoint is ready.
> > Adding a msleep(PCIE_T_RRS_READY_MS) before enumerating the bus in the
> > threaded IRQ handler makes the SSD functional once again.
> 
> This sounds like a problem that could happen with any controller, not
> just dw-rockchip?

Correct.


> Are we missing some required delay that should be
> in generic code?  Or is this a PLEXTOR defect that everybody has to
> pay the price for?

So far, the Plextor drive is the only endpoint we know of that does not
work without the delay.

We have no idea if this Plextor drive is the only badly behaved endpoint or
if there are many endpoints with similar issues, because before the
use_linkup_irq() callback was introduced, we always had (the equivalent of)
this delay.

Since the delay only happens once the Link Up IRQ has triggered, it does
not affect boot time when no endpoint is plugged into the slot, so it
seemed quite harmless to reintroduce this delay before enumeration.
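
To make the ordering argument concrete, here is a tiny userspace model (not
driver code). Everything in it is made up for illustration: the
model_endpoint struct, model_enumerate(), and the assumption that the
kernel's PCIE_T_RRS_READY_MS corresponds to 100 ms. It also simplifies
"ready" to a single point in time, which is probably cruder than what the
real drive does.

```c
#include <stdbool.h>

/* Assumed value for illustration only; the kernel defines its own constant. */
#define MODEL_T_RRS_READY_MS 100

/* Hypothetical endpoint that is usable only ready_after_ms after link up. */
struct model_endpoint {
	unsigned int ready_after_ms;
};

/*
 * Returns true if an enumeration that starts delay_ms after the Link Up
 * IRQ would find the endpoint ready.
 */
static bool model_enumerate(const struct model_endpoint *ep,
			    unsigned int delay_ms)
{
	return delay_ms >= ep->ready_after_ms;
}
```

With a hypothetical endpoint that needs 60 ms, model_enumerate(&ep, 0)
fails while model_enumerate(&ep, MODEL_T_RRS_READY_MS) succeeds. When no
endpoint is present, the Link Up IRQ never fires, so the delay is never
paid.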

But other suggestions are of course welcome.


Since we can read the PCI vendor and device IDs, at least some config space
reads appear to work, so I guess that we could try adding a quirk keyed on
the PCI vendor and device ID in the nvme driver.

The nvme driver does have a NVME_QUIRK_DELAY_BEFORE_CHK_RDY quirk for some
Samsung drives; perhaps we could try something similar for the Plextor drive?

I don't personally have this problematic NVMe drive, so I am not able to test
such a patch. The user who reported the problem, Laszlo, has been doing all
the testing.


Perhaps Laszlo could try something like:


diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 6b04473c0ab7..9c409af34548 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2865,6 +2865,12 @@ static const struct nvme_core_quirk_entry core_quirks[] = {
                .quirks = NVME_QUIRK_DELAY_BEFORE_CHK_RDY |
                          NVME_QUIRK_NO_DEEPEST_PS |
                          NVME_QUIRK_IGNORE_DEV_SUBNQN,
+       },
+       {
+
+               .vid = 0x144d,
+               .mn = "Samsung Portable SSD X5",
+               .quirks = NVME_QUIRK_DELAY_BEFORE_CHK_RDY,
        }
 };


.. with the .vid and .mn fields replaced with the correct ones for the
Plextor drive. (Don't forget to revert the patch in $subject when testing
this alternate solution.)
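
For anyone unfamiliar with how such a table entry takes effect, here is a
simplified userspace sketch of a vendor ID plus model number lookup. It is
not the in-kernel code: the struct, the function and the bit value are
illustrative, the prefix comparison is a simplification of what the nvme
core does, the .vid is a deliberate placeholder, and the .mn string is just
the name from Laszlo's report, which may not match what the drive reports
in its Identify data.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative bit value; the real flag is defined by the nvme driver. */
#define MODEL_QUIRK_DELAY_BEFORE_CHK_RDY (1UL << 3)

struct model_quirk_entry {
	unsigned short vid;	/* PCI vendor ID */
	const char *mn;		/* model number prefix, NULL matches any */
	unsigned long quirks;
};

static const struct model_quirk_entry model_quirks[] = {
	{
		.vid = 0xffff,	/* placeholder, NOT the real vendor ID */
		.mn = "PLEXTOR PX-256M8PeGN",
		.quirks = MODEL_QUIRK_DELAY_BEFORE_CHK_RDY,
	},
};

/* OR together the quirks of every entry matching this vid and model. */
static unsigned long model_lookup_quirks(unsigned short vid, const char *mn)
{
	unsigned long quirks = 0;
	size_t i;

	for (i = 0; i < sizeof(model_quirks) / sizeof(model_quirks[0]); i++) {
		const struct model_quirk_entry *q = &model_quirks[i];

		if (q->vid != vid)
			continue;
		if (q->mn && strncmp(q->mn, mn, strlen(q->mn)) != 0)
			continue;
		quirks |= q->quirks;
	}
	return quirks;
}
```

The point is only that both fields have to match before the quirk applies,
so a wrong .vid or a slightly different .mn string would make the test
silently do nothing.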

I don't have a preference for either solution.


Kind regards,
Niklas


