[PATCH v3] PCI: dw-rockchip: Enable async probe by default

Tue Mar 10 22:24:59 PDT 2026

On Tue, Mar 10, 2026 at 09:03:32PM +0000, Robin Murphy wrote:
> [ +driver-core maintainers - async probe question below ]
> 
> On 2026-03-10 3:30 pm, Manivannan Sadhasivam wrote:
> > Hi Robin,
> > 
> > On Tue, Mar 10, 2026 at 01:41:56PM +0000, Robin Murphy wrote:
> > > Hi Mani,
> > > 
> > > On 2026-03-04 6:48 am, Manivannan Sadhasivam wrote:
> > > > 
> > > > On Thu, 26 Feb 2026 15:40:23 +0530, Anand Moon wrote:
> > > > > Rockchip DWC PCIe driver currently performs synchronous link training for
> > > > > combo PHYs (PCIe 3.0/2.0 and SATA 3.0) during boot. This process waits for
> > > > > the link to be fully established, adding several milliseconds to the boot
> > > > > sequence. To optimize boot time, this change enables asynchronous probing,
> > > > > allowing link establishment to proceed in the background while the kernel
> > > > > continues probing other devices.
> > > > > 
> > > > > [...]
> > > > 
> > > > Applied, thanks!
> > > > 
> > > > [1/1] PCI: dw-rockchip: Enable async probe by default
> > > >         commit: ec392abc95932838bf7e3d659d358f4df9ff5a0a
> > > 
> > > This appears to have the side-effect that calling pci_host_probe() from
> > > async context can effectively force async probe for the endpoint drivers
> > > as well, but some drivers are not OK with that, as our CI has just
> > > flagged up.
> > > 
> > 
> > Thanks for reporting!
> > 
> > > (And as for that particular warning, ISTR last time I looked into it
> > > another context, the opinion of the MDIO/phy maintainers seemed to be
> > > "don't force async probe".)
> > > 
> > 
> > This was discussed during v2 [1] and concluded that the async probe benefits
> > outweigh the unharmful splat from phylib. I also agree with the above conclusion
> > that this splat should not prevent us from enabling async probe for PCI
> > controller drivers. It can easily save a few 100ms during boot.
> 
> The problem is not an "unharmful splat" from one driver on one board -
> that's *a* symptom, and the fact that it happens to be a relatively benign
> one does not dismiss the problem that forcing async probe upon drivers that
> are not designed to support async probe cannot in general be assumed to be
> safe, so is not OK. It's one thing if a user brings it upon themselves by
> explicitly using the "driver_async_probe=" option, but it's very different
> if some other driver starts doing it for them.
> 

I have a contrary view here. If just a single driver or library doesn't handle
async probe, that should not prevent other drivers from taking advantage of it.
As I said above, enabling async probe easily saves a few hundred ms, or even
more if there is more than one Root Port or Root Complex in an SoC.

Moreover, there are multiple ways this splat could be triggered as reported in:
https://lore.kernel.org/netdev/7103704.9J7NaK4W3v@fedora.fritz.box

> Looking closer, it seems like the fundamental issue might be when we've got
> this far (simplified for clarity):
> 
> - async_run_entry_fn
>   - rockchip_pcie_probe
>     - pci_host_probe
>      - pci_bus_add_device
>        - device_initial_probe
>          - __device_attach_driver
> 
> wherein we then reach the "if (data->check_async && async_allowed !=
> data->want_async)" condition, at which point check_async is true (from
> device_initial_probe()), while async_allowed and want_async are *both*
> false, but that leads us to actually go ahead and call driver_probe_device()
> for the child device despite being in async context. That doesn't seem right
> to me - I'm guessing it maybe wasn't anticipated to have bus drivers calling
> device_initial_probe() from within async in the first place?
> 
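
To make sure we are reading that condition the same way, here is a rough
userspace model of it. This is only a sketch: the struct and function names
are made up for illustration, and it keeps just the two flags and the one
check quoted above, not the actual driver core code.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the two flags discussed above. The real
 * driver core keeps more state than this; the names here are illustrative.
 */
struct attach_data_model {
	bool check_async;	/* set on the device_initial_probe() path */
	bool want_async;	/* true only when looking for async-capable drivers */
};

/*
 * Models only the quoted condition:
 *   if (data->check_async && async_allowed != data->want_async) -> skip
 * Returns true when the model would go on to call driver_probe_device()
 * in the caller's context, whatever that context happens to be.
 */
bool model_would_probe_now(const struct attach_data_model *data,
			   bool async_allowed)
{
	if (data->check_async && async_allowed != data->want_async)
		return false;	/* skipped in this pass */

	return true;		/* probed right here, in the current context */
}
```

With check_async = true and want_async = false, a driver that does not allow
async probing (async_allowed = false) gets past the check, so in this model it
is probed in whatever context the caller is running in. Nothing in the
condition looks at whether the caller itself is already an async worker.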

But isn't the underlying issue phylib calling request_module() while the
driver core performs async probing?

> It may not strictly be the fault of this patch - clearly 91703041697c ("PCI:
> Allow built-in drivers to use async initial probing") is implicated in this
> too - but the fact is that it *has* exposed a bug that needs fixing one way
> or another, it can't just be left hanging and impacting end users.
> 

I strongly agree with you that the underlying issue should be fixed. But the
real impact on end users is not this splat; it is losing the boot time
optimization that this patch brings. As an end user, one would want the system
to boot quickly and wouldn't be bothered much by a harmless warning splat in
the dmesg log.

- Mani

-- 
மணிவண்ணன் சதாசிவம்