[net-next PATCH v4 03/11] net: phylink: introduce internal phylink PCS handling

Mon May 19 11:10:12 PDT 2025

On 5/13/25 23:00, Daniel Golle wrote:
> On Tue, May 13, 2025 at 03:23:32PM -0400, Sean Anderson wrote:
>> On 5/13/25 15:03, Daniel Golle wrote:
>> > just instead of having many
>> > more or less identical implementations of .mac_select_pcs, this
>> > functionality is moved into phylink. As a nice side-effect that also
>> > makes managing the life-cycle of the PCS more easy, so we won't need all
>> > the wrappers for all the PCS OPs.
>> 
>> I think the wrapper approach is very obviously correct. This way has me
>> worried about exciting new concurrency bugs.
> 
> You may not be surprised to read that this was also our starting point 2
> months ago, I had implemented support for standalone PCS very similar to
> the approach you have published now, using refcnt'ed instances and
> locked wrapper functions for all OPs. My approach, like yours, was to
> create a new subsystem for standalone PCS drivers which is orthogonal to
> phylink and only requires very few very small changes to phylink itself.
> It was a draft and not as complete and well-documented like your series
> now, of course.
> 
> I've then shared that implementation with Christian and some other
> experienced OpenWrt developers and we concluded that having phylink handle
> the PCS lifecycle and PCS selection would be the better and more elegant
> approach for multiple reasons:
>  - The lifetime management of the wrapper instances becomes tricky:
>    We would either have to live with them being allocated by the
>    MAC-driver (imagine test-case doing unbind and then bind in a loop
>    for a while -- we would end up oom). Or we need some kind of garbage
>    collecting mechanism which frees the wrapper once refcnt is zero --
>    and as .select_pcs would 'get' the PCS (ie. bump refcnt) we'd need a
>    'put' equivalent (eg. a .pcs_destroy() OP) in phylink.
> 
>    Russell repeatedly pointed me to the possibility of a PCS
>    "disappearing" (and potentially "reappearing" some time later), and
>    in this case it is unclear who would then ever call pcs_put(), or
>    even notify the Ethernet driver or phylink about the PCS now being
>    available (again). Using device_link_add(), like it is done in
>    pcs-rzn1-miic.c, prevents the worst (ie. use-after-free), but also
>    impacts all other netdevs exposed by the same Ethernet driver
>    instance, and has a few other rather ugly implications.

SRCU neatly solves the lifetime management issues. The wrapper lives as
long as anyone (provider or user) holds a reference. A PCS can disappear
at any point and everything still works (although the link goes down).
Device links are only an optimization; they cannot be relied on for
correctness.

>  - phylink currently expects .mac_select_pcs to never fail. But we may
>    need a mechanism similar to probe deferral in case the PCS is not
>    yet available.

Which is why you grab the PCS in probe. If you want to be more dynamic,
you can do it in netdev open like is done for PHYs.

>    Your series partially solves this in patch 11/11 "of: property: Add
>    device link support for PCS", but also that still won't make the link
>    come back in case of a PCS showing up late to the party, eg. due to
>    constraints such as phy drivers (drivers/phy, not drivers/net/phy)
>    waiting for nvmem providers, or PCS instances "going away" and
>    "coming back" later.

This all works correctly due to device links. The only case that doesn't
work automatically is something like

MAC built-in
  MDIO built-in
    PCS module

where the PCS module gets loaded late. In that case you have to manually
re-probe the MAC. I think the best way to address this would be to grab
the PCS in netdev open so that the MAC can probe without the PCS.

>  - removal of a PCS instance (eg. via sysfs unbind) would still
>    require changes to phylink. there is no phylink function to
>    impair the link in this case, and using dev_close() is a bit ugly,
>    and also won't bring the link back up once the PCS (re-)appears.

This works just fine. There are two cases:

- If the PCS has an IRQ, we notify phylink and then it polls the PCS
  (see below).
- If the PCS is polled, phylink will call pcs_get_state and see that the
  link is down.

Either way, the link goes down. But bringing the link back up is pretty
unusual anyway. Unlike PHYs (which theoretically can be on removable
busses) PCSs are generally permanently attached to their MACs. The only
removable scenario I can think of is if the PCS is on an FPGA and the
MAC is not.

So if the PCS goes away, the MAC is likely to follow shortly after
(since the whole thing is on a removable bus). Or someone has manually
removed the PCS, in which case I think it's reasonable to have them
manually remove the MAC as well. If you really want to support this,
then just grab the PCS in netdev open.

>  - phylink anyway is the only user of PCS drivers, and will very likely
>    always be. So why create another subsystem?

To avoid adding overhead for the majority of PCSs where the PCS is built
into the MAC and literally can't be removed. We only pay the price for
dynamicism on the drivers where it matters.

> All that being said I also see potential problems with Christians
> current implementation as it doesn't prevent the Ethernet driver to
> still store a pointer to struct phylink_pcs (returned eg. from
> fwnode_pcs_get()).
> 
> Hence I would like to see an even more tight integration with phylink,
> in the sense that pointers to 'struct phylink_pcs' should never be
> exposed to the MAC driver, as only in that way we can be sure that
> phylink, and only phylink, is responsible for reacting to a PCS "going
> away".

OK, but then how does the MAC select the PCS? If there are multiple PCSs
then ultimately someone has to configure a mux somewhere.

> Ie. instead of fwnode_phylink_pcs_parse() handing pointers to struct
> phylink_pcs to the Ethernet driver, so it can use it to populate struct
> phylink_config available_pcs member, this should be the responsibility
> of phylink alltogether, directly populating the list of available PCS in
> phylink's private structure.
> 
> Similarly, there should not be fwnode_pcs_get() but rather phylink
> providing a function fwnode_phylink_pcs_register(phylink, fwnode) which
> directly adds the PCS referenced to the internal list of available PCS.

This is difficult to work with for existing drivers. Many of them have
non-standard ways of looking up their PCS that they need to support for
backwards-compatibility. And some of them create the PCS themselves
(such as if they are PCI devices with internal MDIO busses). It's much
easier for the MAC to create or look up the PCS itself and then hand it
off to phylink.

> I hope we can pick the best of all the suggested implementations, and
> together come up with something even better.

Sure. And I think we were starting from a clean slate then this would be
the obvious way to do things. But we must support existing drivers and
provide an upgrade path for them. This is why I favor an incremental
approach.

--Sean