[PATCH net-next 3/6] net: dsa: add support for retrieving the interface mode
Vladimir Oltean
olteanv at gmail.com
Sat Jul 16 05:36:08 PDT 2022
On Sat, Jul 16, 2022 at 12:13:55PM +0100, Russell King (Oracle) wrote:
> > > and is exactly what I'm trying to get rid of, so we have _consistency_
> > > in the implementation, to prevent fuckups like I've created by
> > > converting many DSA drivers to use phylink_pcs. Any DSA driver that
> > > used a PCS for the DSA or CPu port and has been converted to
> > > phylink_pcs support has been broken in the last few kernel cycles. I'm
> > > trying to address that breakage before converting the Marvell DSA
> > > driver - which is the driver that highlighted the problem.
> >
> > You are essentially saying that it's of no use to keep in DSA the
> > fallback logic of not registering with phylink, because the phylink_pcs
> > conversions have broken the defaulting functionality already in all
> > other drivers.
>
> Correct, and I don't want these exceptions precisely because it creates
> stupid mistakes like the one I've highlighted. If we have one way of
> doing something, then development becomes much easier. When we have
> multiple different ways, mistakes happen, stuff breaks.
Agree that when we have multiple different ways, mistakes happen and
stuff breaks. This is also why I want to avoid the proliferation of the
default_interface reporting when most of the drivers were just fine
without it. It needs to be opt-in, and checking for the presence of a
dedicated function is an easy way to check for that opt-in.
> > I may have missed something, but this is new information to me.
> > Specifically, before you've said that it is *this* patch set which would
> > risk introducing breakage (by forcing a link down + a phylink creation).
> > https://lore.kernel.org/netdev/YsCqFM8qM1h1MKu%2F@shell.armlinux.org.uk/
> > What you're saying now directly contradicts that.
>
> No, it is not contradictory at all.
>
> There is previous breakage caused by converting DSA drivers to
> phylink_pcs - because the PCS code will *not* be called where a driver
> makes use of the "default-to-fastest-speed" mechanism, so is likely
> broken. Some drivers work around this by doing things manually.
Marvell conversion to phylink_pcs doesn't count as "previous" breakage
if it's not in net-next.
> This series *also* risks breakage, because it means phylink gets used
> in more situations which could confuse drivers - such as those which
> have manually brought stuff up and are not expecting phylink to also
> do it. Or those which do not fill in the mac_capabilities but do
> default to this "fastest-speed" mechanism. Or some other unforseen
> scenario.
I don't exactly understand the mechanics through which a phylink_pcs
conversion would break a driver that used defaulting behavior,
but I can only imagine it involves moving code that didn't depend on
phylink, to phylink callbacks. It's hard to say whether that happened
for any of the phylink_pcs conversions in net-next.
But drivers could also have their CPU port working simply because those
are internal to an SoC and don't need any software configuration to pass
traffic. In their case, there is no breakage caused by the phylink_pcs
conversion, but breakage caused by sudden registration of phylink is
plausible, if phylink doesn't get the link parameters right.
And that breakage is preventable. Gradually more drivers could be
converted to create a fixed-link software node by printing a warning
that they should, and still keep the logic to avoid phylink registration
and putting the respective port down. Driver authors might not be very
responsive to RFC patch sets, but they do look at new warnings in dmesg
and try to see what they're about.
What I'm missing is the proof that the phylink_pcs conversion has broken
those kinds of switches, and that it's therefore pointless to keep the
existing logic for them. Sorry, but you didn't provide it.
> So there's known breakage - which we know because we've tripped over
> the issue with mv88e6xxx pcs conversion which isn't in net-next yet,
> but illustrates the issue, and there's unknown breakage through
> applying this patch and not having had test feedback from all the
> DSA driver authors.
>
> There are no contradiction there what so ever.
>
> > Do you have concrete evidence that there is actually any regression of
> > this kind introduced by prior phylink_pcs conversions? Because if there
> > is, I retract the proposal to keep the fallback logic.
>
> Since we have no idea which DSA drivers make use of this "default-to-
> fastest-speed", and Andrew who has been promoting this doesn't seem to
> know which drivers make use of this, I do not, but we know that the
> breakage does occur if someone does make use of this with one of the
> converted DSA drivers through the behaviour of mv88e6xxx. Just because
> no one has reported it yet does not mean it doesn't exist. Not everyone
> updates their kernels each time Linus releases a final kernel version,
> especially in the embedded world. Sometimes it can take until a LTS
> release until people move forward, which we think will be 5.21.
More information about the Linux-mediatek
mailing list