[PATCH net 4/4] net: axienet: Split into MAC and MDIO drivers

Thu Jul 10 16:37:28 PDT 2025

Hi Andrew,

On 6/23/25 19:16, Sean Anderson wrote:
> On 6/23/25 18:45, Andrew Lunn wrote:
>> On Mon, Jun 23, 2025 at 02:48:53PM -0400, Sean Anderson wrote:
>>> On 6/23/25 14:27, Andrew Lunn wrote:
>>> > On Mon, Jun 23, 2025 at 11:16:08AM -0400, Sean Anderson wrote:
>>> >> On 6/21/25 03:33, Andrew Lunn wrote:
>>> >> > On Thu, Jun 19, 2025 at 04:05:37PM -0400, Sean Anderson wrote:
>>> >> >> Returning EPROBE_DEFER after probing a bus may result in an infinite
>>> >> >> probe loop if the EPROBE_DEFER error is never resolved.
>>> >> > 
>>> >> > That sounds like a core problem. I also thought there was a time
>>> >> > limit, how long the system will repeat probes for drivers which defer.
>>> >> > 
>>> >> > This seems like the wrong fix to me.
>>> >> 
>>> >> I agree. My first attempt to fix this did so by ignoring deferred probes
>>> >> from child devices, which would prevent "recursive" loops like this one
>>> >> [1]. But I was informed that failing with EPROBE_DEFER after creating a
>>> >> bus was not allowed at all, hence this patch.
>>> > 
>>> > O.K. So why not change the order so that you know you have all the
>>> > needed dependencies before registering the MDIO bus?
>>> > 
>>> > Quoting your previous email:
>>> > 
>>> >> Returning EPROBE_DEFER after probing a bus may result in an infinite
>>> >> probe loop if the EPROBE_DEFER error is never resolved. For example,
>>> >> if the PCS is located on another MDIO bus and that MDIO bus is
>>> >> missing its driver then we will always return EPROBE_DEFER.
>>> > 
>>> > Why not get a reference on the PCS device before registering the MDIO
>>> > bus?
>>> 
>>> Because the PCS may be on the MDIO bus. This is probably the most-common
>>> case.
>> 
>> So you are saying the PCS is physically there, but the driver is
>> missing because of configuration errors? Then it sounds like a kconfig
>> issue?
>> 
>> Or are you saying the driver has been built but then removed from
>> /lib/modules/
> 
> The latter. Or maybe someone just forgot to install it (or include it
> with their initramfs). Or maybe there was some error with the MDIO bus.
> 
> There are two mutually-exclusive scenarios (that can both occur in the
> same system). First, the PCS can be attached to our own MDIO bus:
> 
> MAC
>  |
>  +->MDIO
>      |
>      +->PCS
>      +->PHY (etc)
> 
> In this scenario, we have to probe the MDIO bus before we can look up
> the PCS, since otherwise the PCS will always be missing when we look for
> it. But if we do things in the right order then we can't get
> EPROBE_DEFER, and so there's no risk of a probe loop.
> 
> Second, the PCS can be attached to some other MDIO bus:
> 
> MAC              MDIO
>  |                 |
>  +->MDIO           +->PCS
>       |
>       +->PHY (etc)
> 
> In this scenario, the MDIO bus might not be present for whatever reason
> and we have the possibility of an EPROBE_DEFER error. If that happens,
> we will end up in a probe loop because the PHY on the MDIO bus
> incremented deferred_trigger_count when it probed successfully:
> 
> deferred_probe_work_func()
>   driver_probe_device(MAC)
>     axienet_probe(MAC)
>       mdiobus_register(MDIO)
>         device_add(PHY)
>           (probe successful)
>           driver_bound(PHY)
>             driver_deferred_probe_trigger()
>       return -EPROBE_DEFER
>     driver_deferred_probe_add(MAC)
>     // deferred_trigger_count changed, so...
>     driver_deferred_probe_trigger()

Does the above scenario make sense? As I see it, the only approaches are:

- Modify the driver core to detect and mitigate this sort of scenario
  (NACKed by Greg).
- Split the driver into MAC and MDIO parts (this patch).
- Modify phylink to allow connecting a PCS after phylink_create but
  before phylink_start. This is tricky because the PCS can affect the
  supported phy interfaces, and phy interfaces are validated in
  phylink_create.
- Defer phylink_create to ndo_open. This means that all the
  netdev/ethtool ops that use phylink now need to check if the netdev is
  open and fall back to some other implementation. I don't think we can
  just return -EINVAL or whatever because using ethtool on a down device
  has historically worked. I am wary of breaking userspace because some
  tool assumes it can get_link_ksettings while the netdev is down.

Do you see any other options? IMO, aside from the first option, the
second one has the best UX. With the latter two, you could have a netdev
that never comes up and the user may not have very good insight as to
why. E.g. it may not be obvious that the user should try to bring the
netdev up again after the PCS is probed. By waiting to create the netdev
until after we successfully probe the PCS we show up in
devices_deferred and the netdev can be brought up as usual.
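
To make the loop in the call trace concrete, here is a rough user-space
toy model of the retrigger logic. It is only a sketch and not kernel
code: every toy_* name is made up for illustration, and it assumes a
single deferred device (the MAC) and a PHY that always binds. It also
ignores the one-off trigger that a separately-probed MDIO bus would
cause in the split case.

```c
#include <stdbool.h>

#define TOY_EPROBE_DEFER 517 /* same value as the kernel's EPROBE_DEFER */

/* Hypothetical stand-in for the driver core's deferred-probe state. */
struct toy_core {
	int trigger_count; /* models deferred_trigger_count */
	bool mac_pending;  /* MAC is on the pending list */
	bool mac_active;   /* MAC is queued for another probe attempt */
};

/* Models driver_deferred_probe_trigger(): bump the count, requeue pending. */
static void toy_trigger(struct toy_core *c)
{
	c->trigger_count++;
	if (c->mac_pending) {
		c->mac_pending = false;
		c->mac_active = true;
	}
}

/* Models the MAC probe. If the MDIO bus is registered from the MAC probe,
 * the PHY binds and driver_bound(PHY) fires a trigger. */
static int toy_mac_probe(struct toy_core *c, bool pcs_available,
			 bool mdio_in_mac_probe)
{
	if (mdio_in_mac_probe)
		toy_trigger(c);
	return pcs_available ? 0 : -TOY_EPROBE_DEFER;
}

/* Models driver_probe_device(): retrigger if the count moved while probing. */
static int toy_probe_device(struct toy_core *c, bool pcs_available,
			    bool mdio_in_mac_probe)
{
	int before = c->trigger_count;
	int ret = toy_mac_probe(c, pcs_available, mdio_in_mac_probe);

	if (ret == -TOY_EPROBE_DEFER) {
		c->mac_pending = true;
		if (before != c->trigger_count)
			toy_trigger(c);
	}
	return ret;
}

/* Count MAC probe attempts, giving up after max_passes. */
int toy_count_probe_attempts(bool pcs_available, bool mdio_in_mac_probe,
			     int max_passes)
{
	struct toy_core c = { 0, false, true };
	int attempts = 0;

	while (c.mac_active && attempts < max_passes) {
		c.mac_active = false;
		attempts++;
		if (toy_probe_device(&c, pcs_available, mdio_in_mac_probe) == 0)
			break;
	}
	return attempts;
}
```

With the PCS missing and the MDIO bus registered from the MAC probe, the
model only stops when it hits max_passes. With the bus registered
elsewhere, the MAC is probed once and then sits on the pending list.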

--Sean