[PATCH RFC net-next 5/5] net: dsa: always use phylink for CPU and DSA ports

Thu Jul 7 09:38:31 PDT 2022

On Thu, Jul 07, 2022 at 04:48:12PM +0100, Russell King (Oracle) wrote:
> On Thu, Jul 07, 2022 at 06:27:27PM +0300, Vladimir Oltean wrote:
> > On Thu, Jul 07, 2022 at 11:09:43AM +0100, Russell King (Oracle) wrote:
> > > On Wed, Jul 06, 2022 at 05:24:09PM +0100, Russell King (Oracle) wrote:
> > > > On Wed, Jul 06, 2022 at 01:26:21PM +0300, Vladimir Oltean wrote:
> > > > > Can we please limit phylink_set_max_link_speed() to just the CPU ports
> > > > > where a fixed-link property is also missing, not just a phy-handle?
> > > > > Although to be entirely correct, we can also have MLO_AN_INBAND, which
> > > > > wouldn't be covered by these 2 checks and would still represent a valid
> > > > > DT binding.
> > > > 
> > > > phylink_set_max_fixed_link() already excludes itself:
> > > > 
> > > >         if (pl->cfg_link_an_mode != MLO_AN_PHY || pl->phydev || pl->sfp_bus)
> >                                                       ~~~~~~~~~~
> > 
> > If not NULL, this is an SFP PHY, right? In other words, it's supposed to protect from
> > phylink_sfp_connect_phy() - code involuntarily triggered by phylink_create() ->
> > phylink_register_sfp() - and not from calls to phylink_{,fwnode_}connect_phy()
> > that were initiated by the phylink user between phylink_create() and
> > phylink_set_max_fixed_link(), correct? Those are specified as invalid in the
> > kerneldoc and that's about it - that's not what the checking is for, correct?
> 
> No, it's not to do with sfps at all, but to do with enforcing the
> pre-conditions for the function - that entire line is checking that
> (a) we are in a sane state to be called, and (b) there is no
> configuration initialisation beyond the default done by
> phylink_create() - in other words, there is no in-band or fixed-link
> specified.
> 
> Let's go through this step by step.
> 
> 1. pl->cfg_link_an_mode != MLO_AN_PHY
>    The default value for cfg_link_an_mode is MLO_AN_PHY. If it's
>    anything other than that, then a fixed-link or in-band mode has
>    been specified, and we don't want to override that. So this call
>    needs to fail.

Thanks for the explanation.

Yes, I noticed that phylink_set_max_fixed_link() relies on the fact that
pl->cfg_link_an_mode has the unset value of 0, which coincidentally is
MLO_AN_PHY.

> 2. pl->phydev
>    If a PHY has been attached, then the pre-condition for calling this
>    function immediately after phylink_create() has been violated,
>    because the only way it can be non-NULL is if someone's called one of
>    the phylink functions that connects a PHY. Note: SFPs will not set
>    their PHY here, because, for them to discover that there's a PHY, the
>    network interface needs to be up, and it will never be up here... but
>    in any case...

Ok, so this does check for a precondition that the caller did something
correctly. But it doesn't (and can't) check that all preconditions and
postconditions are satisfied. That's one of my irks, why bother checking
the easy to satisfy precondition (which depends on the code organization,
static information, easy to check), and give up on the hard one (which
depends on the device tree blob, dynamic information, not so easy).

> > So this is what I don't understand. If we've called phylink_set_max_fixed_link()
> > we've changed pl->cfg_link_an_mode to MLO_AN_FIXED and this will
> > silently break future calls to phylink_{,fwnode_}connect_phy(), so DSA
> > predicts if it's going to call either of those connect_phy() functions,
> > and calls phylink_set_max_fixed_link() only if it won't. Right?
> > 
> > You've structured the checks in this "distributed" way because phylink
> > can't really predict whether phylink_{,fwnode_}connect_phy() will be
> > called after phylink_set_max_fixed_link(), right? I mean, it can
> > probably predict the fwnode_ variant, but not phylink_connect_phy, and
> > this is why it is up to the caller to decide when to call and when not to.
> 
> phylink has no idea whether phylink_fwnode_connect_phy() will be called
> with the same fwnode as phylink_create(), so it really can't make any
> assumptions about whether there will be a PHY or not.

This is interesting. Is there a use case for passing a different
fwnode_handle to the 2 functions?

> > It should maybe also
> > say that this function shouldn't be called if phylink_{,fwnode_}connect_phy()
> > is going to be called later.
> 
> It's already a precondition that phylink_{,fwnode_}connect_phy() fail if
> we're in fixed-link mode (because PHYs have never been supported when in
> fixed-link mode - if one remembers, the old fixed-link code used to
> provide its own emulation of a PHY to make fixed-links work.) So PHYs
> and fixed-links have always been mutually exclusive before phylink, and
> continue to be so with phylink.

Define "fail" exactly, because if I look in phylink_fwnode_phy_connect(), I see:

	/* Fixed links and 802.3z are handled without needing a PHY */
	if (pl->cfg_link_an_mode == MLO_AN_FIXED ||
	    (pl->cfg_link_an_mode == MLO_AN_INBAND &&
	     phy_interface_mode_is_8023z(pl->link_interface)))
		return 0; <- does this count as failure?

This is why dsa_port_phylink_register() calls phylink_of_phy_connect()
without checking whether it has a fixed-link or a PHY, because it
doesn't fail even if it doesn't do anything.

In fact I've wanted to make a correction to my previous phrasing that
"this function shouldn't be called if phylink_{,fwnode_}connect_phy() is
going to be called later". The correction is "... with a phy-handle".

> > Can phylink absorb all this logic, and automatically call phylink_set_max_fixed_link()
> > based on the following?
> > 
> > (1) struct phylink_config gets extended with a bool fallback_max_fixed_link.
> > (2) DSA CPU and DSA ports set this to true in dsa_port_phylink_register().
> > (3) phylink_set_max_fixed_link() is hooked into this -ENODEV error
> >     condition from phylink_fwnode_phy_connect():
> > 
> > 	phy_fwnode = fwnode_get_phy_node(fwnode);
> > 	if (IS_ERR(phy_fwnode)) {
> > 		if (pl->cfg_link_an_mode == MLO_AN_PHY)
> > 			return -ENODEV; <- here
> > 		return 0;
> > 	}
> 
> My question in response would be - why should this DSA specific behaviour
> be handled completely internally within phylink, when it's a DSA
> specific behaviour? Why do we need boolean flags for this?

Because the end result will be simpler if we respect the separation of
concerns that continues to exist, and it's still phylink's business to
say what is and isn't valid. DSA still isn't aware of the bindings
required by phylink, it just passes its fwnode to it. Practically
speaking, I wouldn't be scratching my head as to why we're checking for
half the prerequisites of phylink_set_max_fixed_link() in one place and
for the other half in another.

True, through this patch set DSA is creating its own context specific
extension of phylink bindings, but arguably those existed before DSA was
even integrated with phylink, and we're just fixing something now we
didn't realize at the time we'd need to do.

I can reverse the question, why would phylink even want to be involved
in how the max fixed link parameters are deduced, and it doesn't just
require that a fixed-link software node is constructed somehow
(irrelevant to phylink how), and phylink is just modified to find and
work with that if it exists? Isn't it for the exact same reason,
separation of concerns, that it's easiest for phylink to figure out what
is the most appropriate maximum fixed-link configuration?