phy: marvell: phy-mvebu-cp110-comphy: link failure and lockup built as module (=M)

Tue Oct 28 11:09:41 PDT 2025

On 27.10.25 at 03:06, Andrew Lunn wrote:
> On Sat, Oct 25, 2025 at 12:45:52PM +0000, Josua Mayer wrote:
>> Dear Maintainers,
>>
>> I came across a bug relating to the cp110 comphy driver.
>>
>> On a board with CN9130 SoC + 2 external CPs Debian 13 freezes during boot,
>> at some point after initramfs and kernel module loading has started.
>>
>> This occurs only when a pci card is present and had link-up from u-boot, e.g.:
>>
>> PCIE-0: Link up (Gen3-x4, Bus0)
>> PCIE-12: Link up (Gen3-x1, Bus12)
>>
>> The issue is reproducible with a generic rootfs, kernel built with arm64 defconfig,
>> no initramfs, but a single kernel configuration change:
>>
>> CONFIG_PHY_MVEBU_CP110_COMPHY=y -> m
>>
>> i.e. building the comphy driver as a module.
>>
>> The problem shows up usually by the console freezing during boot,
>> before eventually the system watchdog hard resets SoC.
> Do you know at what point the comphy driver module is loaded?
With the driver built as a module, loading is delayed until after the rootfs
is mounted in my case, i.e. after udev has started.
>
> Do you get the same behaviour if the comphy module is not built/not
> available?

In this case the system does not lock up, but PCI is not functional either.

With CONFIG_DEBUG_DRIVER=y I can see that the PCI probe is pending,
waiting for the PHYs.

>
> I _guess_ that there is some missing EPROBE_DEFER code. It could be
> when the PCIE code tries to get the PHY and fails, it just keeps
> going, when in fact it needs to return EPROBE_DEFER, so that the core
> will try again later, once the module has been loaded.
>
> Actually
>
> static int armada8k_pcie_setup_phys(struct armada8k_pcie *pcie)
> {
>         struct dw_pcie *pci = pcie->pci;
>         struct device *dev = pci->dev;
>         struct device_node *node = dev->of_node;
>         int ret = 0;
>         int i;
>
>         for (i = 0; i < ARMADA8K_PCIE_MAX_LANES; i++) {
>                 pcie->phy[i] = devm_of_phy_get_by_index(dev, node, i);
>                 if (IS_ERR(pcie->phy[i])) {
>                         if (PTR_ERR(pcie->phy[i]) != -ENODEV)
>                                 return PTR_ERR(pcie->phy[i]);
>
>                         pcie->phy[i] = NULL;
>                         continue;
>                 }
>
>                 pcie->phy_count++;
>         }
>
> Do you see devm_of_phy_get_by_index() return -ENODEV? If i'm reading
> this code correctly, it will just continue without the PHY.
This code returns from probe on any error except -ENODEV.
So if devm_of_phy_get_by_index() returned -EPROBE_DEFER, probe should
return -EPROBE_DEFER as well.
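
To make that reading concrete, here is a rough user-space model of the loop
quoted above. It is only a sketch: model_phy_get(), model_setup_phys(),
struct model_pcie and the lane_status array are hypothetical stand-ins
(not kernel API), the ERR_PTR helpers are simplified re-implementations,
and MAX_LANES = 4 is assumed for ARMADA8K_PCIE_MAX_LANES.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* EPROBE_DEFER is kernel-internal (517) and absent from user-space errno.h. */
#define EPROBE_DEFER 517
#define MAX_ERRNO    4095
#define MAX_LANES    4	/* assumed value of ARMADA8K_PCIE_MAX_LANES */

/* Simplified user-space versions of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(intptr_t err) { return (void *)err; }
static inline intptr_t PTR_ERR(const void *p) { return (intptr_t)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct phy { int id; };

/* Hypothetical model of the driver state touched by the loop. */
struct model_pcie {
	struct phy *phy[MAX_LANES];
	int phy_count;
};

static struct phy model_phys[MAX_LANES];

/*
 * Hypothetical stand-in for devm_of_phy_get_by_index():
 * lane_status[i] == 0 means "PHY available", anything else is the
 * negative errno the lookup should fail with.
 */
struct phy *model_phy_get(const int *lane_status, int i)
{
	if (lane_status[i] != 0)
		return ERR_PTR(lane_status[i]);
	return &model_phys[i];
}

/* Same control flow as the quoted armada8k_pcie_setup_phys() loop. */
int model_setup_phys(struct model_pcie *pcie, const int *lane_status)
{
	int i;

	pcie->phy_count = 0;
	for (i = 0; i < MAX_LANES; i++) {
		pcie->phy[i] = model_phy_get(lane_status, i);
		if (IS_ERR(pcie->phy[i])) {
			/* Only -ENODEV is tolerated; everything else aborts. */
			if (PTR_ERR(pcie->phy[i]) != -ENODEV)
				return (int)PTR_ERR(pcie->phy[i]);

			pcie->phy[i] = NULL;
			continue;
		}

		pcie->phy_count++;
	}

	return 0;
}
```

With lane_status = {0, 0, 0, -EPROBE_DEFER} the model returns -EPROBE_DEFER,
and with {0, -ENODEV, -ENODEV, -ENODEV} it returns 0 with phy_count == 1.
So in this model only a missing PHY (-ENODEV) is skipped silently.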

From a new boot log with CONFIG_DEBUG_DRIVER=y, this is the first invocation of the PCI driver probe:

[   23.091031] platform f2600000.pcie: bus: 'platform': __driver_probe_device: matched device with driver armada8k-pcie
[   23.101921] platform f2600000.pcie: error -EPROBE_DEFER: wait for supplier /cp0-bus/bus@f2000000/phy@120000/phy@3
[   23.112436] platform f2600000.pcie: Added to deferred list

It defers because of the fourth lane's PHY.
Note that this happened before the clock driver had probed:

[   24.491274] marvell-cp110-clock f2440000.system-controller:clock: bus: 'platform': really_probe: bound device to driver marvell-cp110-clock

The PCI clock is only disabled much later:

[   40.676931] cp110-clk: disabling enabled clock "f2440000-pcie_x4"

By that point the PCI probe had been deferred 5 times in total.

Sincerely,
Josua Mayer