phy: marvell: phy-mvebu-cp110-comphy: link failure and lockup built as module (=M)

Thu Oct 30 05:37:04 PDT 2025

On 28.10.25 at 19:09, Josua Mayer wrote:

> On 27.10.25 at 03:06, Andrew Lunn wrote:
>> On Sat, Oct 25, 2025 at 12:45:52PM +0000, Josua Mayer wrote:
>>> Dear Maintainers,
>>>
>>> I came across a bug relating to the cp110 comphy driver.
>>>
>>> On a board with CN9130 SoC + 2 external CPs Debian 13 freezes during boot,
>>> at some point after initramfs and kernel module loading has started.
>>>
>>> This occurs only when a pci card is present and had link-up from u-boot, e.g.:
>>>
>>> PCIE-0: Link up (Gen3-x4, Bus0)
>>> PCIE-12: Link up (Gen3-x1, Bus12)
>>>
>>> The issue is reproducible with a generic rootfs, kernel built with arm64 defconfig,
>>> no initramfs, but a single kernel configuration change:
>>>
>>> CONFIG_PHY_MVEBU_CP110_COMPHY=y -> m
>>>
>>> i.e. building the comphy driver as a module.
>>>
>>> The problem usually shows up as the console freezing during boot,
>>> before the system watchdog eventually hard-resets the SoC.
>> Do you know at what point the comphy driver module is loaded?
> By making it a module, loading is delayed till after rootfs mount in my case.
> That is after udev has started.
>> Do you get the same behaviour if the comphy module is not built/not
>> available?
> In this case the system does not lock up, but pci is not functional either.
>
> With CONFIG_DEBUG_DRIVER=y I can see that pci probe is pending,
> waiting for the phys.
>
>> I _guess_ that there is some missing EPROBE_DEFER code. It could be
>> when the PCIE code tries to get the PHY and fails, it just keeps
>> going, when in fact it needs to return EPROBE_DEFER, so that the core
>> will try again later, once the module has been loaded.
>>
>> Actually
>>
>> static int armada8k_pcie_setup_phys(struct armada8k_pcie *pcie)
>> {
>>         struct dw_pcie *pci = pcie->pci;
>>         struct device *dev = pci->dev;
>>         struct device_node *node = dev->of_node;
>>         int ret = 0;
>>         int i;
>>
>>         for (i = 0; i < ARMADA8K_PCIE_MAX_LANES; i++) {
>>                 pcie->phy[i] = devm_of_phy_get_by_index(dev, node, i);
>>                 if (IS_ERR(pcie->phy[i])) {
>>                         if (PTR_ERR(pcie->phy[i]) != -ENODEV)
>>                                 return PTR_ERR(pcie->phy[i]);
>>
>>                         pcie->phy[i] = NULL;
>>                         continue;
>>                 }
>>
>>                 pcie->phy_count++;
>>         }
>>
>> Do you see devm_of_phy_get_by_index() return -ENODEV? If i'm reading
>> this code correctly, it will just continue without the PHY.
> This code will return from probe with any error except -ENODEV.
> So if the phy lookup returned -EPROBE_DEFER, probe should return -EPROBE_DEFER.
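
For illustration, the error handling in that loop can be modelled in plain
userspace C. This is only a sketch, not kernel code: model_setup_phys(),
struct model_pcie and the two lookup callbacks are made-up names standing in
for armada8k_pcie_setup_phys() and devm_of_phy_get_by_index(), MAX_LANES = 4
is assumed to match ARMADA8K_PCIE_MAX_LANES, and EPROBE_DEFER is defined
locally with the kernel's value (517) because userspace errno.h does not
provide it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Kernel-internal errno value; not exported to userspace headers. */
#define EPROBE_DEFER 517
/* Assumption: mirrors ARMADA8K_PCIE_MAX_LANES. */
#define MAX_LANES 4

/* Hypothetical stand-in for devm_of_phy_get_by_index():
 * returns 0 if the lane's PHY is available, else a negative errno. */
typedef int (*phy_lookup_fn)(int lane);

struct model_pcie {
	int phy_present[MAX_LANES];
	int phy_count;
};

/* Same control flow as the quoted loop: -ENODEV means "no PHY described
 * for this lane" and is skipped; any other error aborts the probe. */
int model_setup_phys(struct model_pcie *pcie, phy_lookup_fn lookup)
{
	assert(pcie != NULL && lookup != NULL);

	pcie->phy_count = 0;
	for (int i = 0; i < MAX_LANES; i++) {
		int ret = lookup(i);

		pcie->phy_present[i] = 0;
		if (ret < 0) {
			if (ret != -ENODEV)
				return ret; /* includes -EPROBE_DEFER */
			continue;
		}

		pcie->phy_present[i] = 1;
		pcie->phy_count++;
	}

	return 0;
}

/* Scenario: the provider of the fourth lane's PHY is not bound yet. */
int lookup_lane3_deferred(int lane)
{
	return lane == 3 ? -EPROBE_DEFER : 0;
}

/* Scenario: only lane 0 has a PHY described. */
int lookup_only_lane0(int lane)
{
	return lane == 0 ? 0 : -ENODEV;
}
```

With lookup_lane3_deferred the model returns -EPROBE_DEFER after counting
three PHYs, so the deferral is propagated to the caller rather than
swallowed. With lookup_only_lane0 it returns 0 with a single PHY, because
only -ENODEV is treated as "lane not wired up".
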
>
> From new boot-log with DEBUG_DRIVER=y, this is the first invocation of pci driver probe:
>
> [   23.091031] platform f2600000.pcie: bus: 'platform': __driver_probe_device: matched device with driver armada8k-pcie
> [   23.101921] platform f2600000.pcie: error -EPROBE_DEFER: wait for supplier /cp0-bus/bus@f2000000/phy@120000/phy@3
> [   23.112436] platform f2600000.pcie: Added to deferred list
>
> And it defers because of the fourth lane's phy.
> Actually that was before the clock driver probed ...
>
> [   24.491274] marvell-cp110-clock f2440000.system-controller:clock: bus: 'platform': really_probe: bound device to driver marvell-cp110-clock
>
> The pci clock is only disabled much later:
>
> [   40.676931] cp110-clk: disabling enabled clock "f2440000-pcie_x4"
>
> At that point pci probe had been deferred 5 times in total.
As a workaround I have successfully used the clk_ignore_unused boot option ...
I believe that confirms the common clock code disables the clock before the pci driver can complete probe.

Is there any guidance on how to deal with this situation?