[net-next PATCH v4 03/11] net: phylink: introduce internal phylink PCS handling

Sean Anderson sean.anderson at linux.dev
Mon May 19 16:28:34 PDT 2025


On 5/19/25 14:10, Sean Anderson wrote:
> On 5/13/25 23:00, Daniel Golle wrote:
>> On Tue, May 13, 2025 at 03:23:32PM -0400, Sean Anderson wrote:
>>> On 5/13/25 15:03, Daniel Golle wrote:
>>> > just instead of having many
>>> > more or less identical implementations of .mac_select_pcs, this
>>> > functionality is moved into phylink. As a nice side-effect that also
>>> > makes managing the life-cycle of the PCS more easy, so we won't need all
>>> > the wrappers for all the PCS OPs.
>>> 
>>> I think the wrapper approach is very obviously correct. This way has me
>>> worried about exciting new concurrency bugs.
>> 
>> You may not be surprised to read that this was also our starting point 2
>> months ago, I had implemented support for standalone PCS very similar to
>> the approach you have published now, using refcnt'ed instances and
>> locked wrapper functions for all OPs. My approach, like yours, was to
>> create a new subsystem for standalone PCS drivers which is orthogonal to
>> phylink and only requires very few very small changes to phylink itself.
>> It was a draft and not as complete and well-documented like your series
>> now, of course.
>> 
>> I've then shared that implementation with Christian and some other
>> experienced OpenWrt developers and we concluded that having phylink handle
>> the PCS lifecycle and PCS selection would be the better and more elegant
>> approach for multiple reasons:
>>  - The lifetime management of the wrapper instances becomes tricky:
>>    We would either have to live with them being allocated by the
>>    MAC-driver (imagine test-case doing unbind and then bind in a loop
>>    for a while -- we would end up oom). Or we need some kind of garbage
>>    collecting mechanism which frees the wrapper once refcnt is zero --
>>    and as .select_pcs would 'get' the PCS (ie. bump refcnt) we'd need a
>>    'put' equivalent (eg. a .pcs_destroy() OP) in phylink.
>> 
>>    Russell repeatedly pointed me to the possibility of a PCS
>>    "disappearing" (and potentially "reappearing" some time later), and
>>    in this case it is unclear who would then ever call pcs_put(), or
>>    even notify the Ethernet driver or phylink about the PCS now being
>>    available (again). Using device_link_add(), like it is done in
>>    pcs-rzn1-miic.c, prevents the worst (ie. use-after-free), but also
>>    impacts all other netdevs exposed by the same Ethernet driver
>>    instance, and has a few other rather ugly implications.
> 
> SRCU neatly solves the lifetime management issues. The wrapper lives as
> long as anyone (provider or user) holds a reference. A PCS can disappear
> at any point and everything still works (although the link goes down).
> Device links are only an optimization; they cannot be relied on for
> correctness.
> 
>>  - phylink currently expects .mac_select_pcs to never fail. But we may
>>    need a mechanism similar to probe deferral in case the PCS is not
>>    yet available.
> 
> Which is why you grab the PCS in probe. If you want to be more dynamic,
> you can do it in netdev open like is done for PHYs.
> 
>>    Your series partially solves this in patch 11/11 "of: property: Add
>>    device link support for PCS", but also that still won't make the link
>>    come back in case of a PCS showing up late to the party, eg. due to
>>    constraints such as phy drivers (drivers/phy, not drivers/net/phy)
>>    waiting for nvmem providers, or PCS instances "going away" and
>>    "coming back" later.
> 
> This all works correctly due to device links. The only case that doesn't
> work automatically is something like
> 
> MAC built-in
>   MDIO built-in
>     PCS module
> 
> where the PCS module gets loaded late. In that case you have to manually
> re-probe the MAC. I think the best way to address this would be to grab
> the PCS in netdev open so that the MAC can probe without the PCS.
> 
>>  - removal of a PCS instance (eg. via sysfs unbind) would still
>>    require changes to phylink. there is no phylink function to
>>    impair the link in this case, and using dev_close() is a bit ugly,
>>    and also won't bring the link back up once the PCS (re-)appears.
> 
> This works just fine. There are two cases:
> 
> - If the PCS has an IRQ, we notify phylink and then it polls the PCS
>   (see below).
> - If the PCS is polled, phylink will call pcs_get_state and see that the
>   link is down.
> 
> Either way, the link goes down. But bringing the link back up is pretty
> unusual anyway. Unlike PHYs (which theoretically can be on removable
> busses) PCSs are generally permanently attached to their MACs. The only
> removable scenario I can think of is if the PCS is on an FPGA and the
> MAC is not.
> 
> So if the PCS goes away, the MAC is likely to follow shortly after
> (since the whole thing is on a removable bus). Or someone has manually
> removed the PCS, in which case I think it's reasonable to have them
> manually remove the MAC as well. If you really want to support this,
> then just grab the PCS in netdev open.

So I had a closer look at this and unfortunately it isn't as easy as
just grabbing the PCS in ndo_open. The problem is that we need to
know the supported interfaces before phylink_create. The interfaces are
validated and are visible to userspace as soon as the netdev is
registered. And we can't just defer phylink_create to ndo_open because a
lot of the ethtool ops are implemented with phylink. So this would
probably need something like phylink_update_supported_interfaces().

But TBH I don't think this use case is very relevant. As I said above,
it only affects FPGA reconfiguration and people manually unbinding
drivers. Either way I think they are savvy enough to reprobe the netdev.

--Sean

>>  - phylink anyway is the only user of PCS drivers, and will very likely
>>    always be. So why create another subsystem?
> 
> To avoid adding overhead for the majority of PCSs where the PCS is built
> into the MAC and literally can't be removed. We only pay the price for
> dynamicism on the drivers where it matters.
> 
>> All that being said I also see potential problems with Christians
>> current implementation as it doesn't prevent the Ethernet driver to
>> still store a pointer to struct phylink_pcs (returned eg. from
>> fwnode_pcs_get()).
>> 
>> Hence I would like to see an even more tight integration with phylink,
>> in the sense that pointers to 'struct phylink_pcs' should never be
>> exposed to the MAC driver, as only in that way we can be sure that
>> phylink, and only phylink, is responsible for reacting to a PCS "going
>> away".
> 
> OK, but then how does the MAC select the PCS? If there are multiple PCSs
> then ultimately someone has to configure a mux somewhere.
> 
>> Ie. instead of fwnode_phylink_pcs_parse() handing pointers to struct
>> phylink_pcs to the Ethernet driver, so it can use it to populate struct
>> phylink_config available_pcs member, this should be the responsibility
>> of phylink alltogether, directly populating the list of available PCS in
>> phylink's private structure.
>> 
>> Similarly, there should not be fwnode_pcs_get() but rather phylink
>> providing a function fwnode_phylink_pcs_register(phylink, fwnode) which
>> directly adds the PCS referenced to the internal list of available PCS.
> 
> This is difficult to work with for existing drivers. Many of them have
> non-standard ways of looking up their PCS that they need to support for
> backwards-compatibility. And some of them create the PCS themselves
> (such as if they are PCI devices with internal MDIO busses). It's much
> easier for the MAC to create or look up the PCS itself and then hand it
> off to phylink.
> 
>> I hope we can pick the best of all the suggested implementations, and
>> together come up with something even better.
> 
> Sure. And I think we were starting from a clean slate then this would be
> the obvious way to do things. But we must support existing drivers and
> provide an upgrade path for them. This is why I favor an incremental
> approach.
> 
> --Sean



More information about the Linux-mediatek mailing list