[PATCH net-next 3/6] net: dsa: add support for retrieving the interface mode

Vladimir Oltean olteanv at gmail.com
Wed Jul 20 15:44:47 PDT 2022


On Mon, Jul 18, 2022 at 09:48:51AM +0100, Russell King (Oracle) wrote:
> > But drivers could also have their CPU port working simply because those
> > are internal to an SoC and don't need any software configuration to pass
> > traffic. In their case, there is no breakage caused by the phylink_pcs
> > conversion, but breakage caused by sudden registration of phylink is
> > plausible, if phylink doesn't get the link parameters right.
> > 
> > And that breakage is preventable. Gradually more drivers could be
> > converted to create a fixed-link software node by printing a warning
> > that they should, and still keep the logic to avoid phylink registration
> > and putting the respective port down. Driver authors might not be very
> > responsive to RFC patch sets, but they do look at new warnings in dmesg
> > and try to see what they're about.
> 
> Are you going to do that conversion then? Good luck trying to find all
> the drivers, sending out series of patches, trying to get people to test
> the changes.

Not sure if that's a rhetorical question and if it is, what it's trying
to prove. Of course I'm not going to do any conversion, that's literally
my whole point, I'd rather err on the side of letting sleeping dogs cry
than force-converting everyone at once.

If there is a breakage report and the driver maintainer won't respond,
then yeah, maybe I'll consider looking at that particular issue and
converting if it helps, but that's kind of why I'm here.

Otherwise, we may end up pushing phylink to drivers which have some
wacky link speeds on the CPU port (like 2000, see b53_force_port_config),
which the software node auto-creation logic absolutely won't get right.
I know Florian said that "we won't see a regression since we do not use
the NATP accelerator which would be the reason to run the port at
2Gbits/sec", but frankly I'm not entirely sure what that even means or
what Florian counts as "regression". Lower overall termination throughput
counts or not? Still not worth risking if this isn't the only instance
of a non-standard speed.

> > What I'm missing is the proof that the phylink_pcs conversion has broken
> > those kinds of switches, and that it's therefore pointless to keep the
> > existing logic for them. Sorry, but you didn't provide it.
> 
> I don't have evidence that existing drivers have broken because I don't
> have the hardware to test. I only have Marvell DSA switch hardware and
> that is it.
> 
> Everything else has to be based on theory because no one bothers to
> respond to my patches, so for 99% of the DSA drivers I'm working in the
> dark - and that makes development an utter shitpile of poo in hell.
> 
> As I've said many times, we have NO CLUE which DSA drivers make use of
> this defaulting behaviour - the only one I'm aware of that does is
> mv88e6xxx. It all depends on the firmware description, driver behaviour
> and hardware behaviour. There is not enough information in the kernel 
> to be able to derive this.
> 
> If there was a reported regression, then I would be looking to get this
> into the NET tree not the NET-NEXT tree.

I think there are authors who weren't even aware they were opting into
this interesting DSA feature when they wrote their driver, but didn't
have the inspiration at the time to validate strict DT bindings either.

We have no proof that they don't have DT blobs using the defaulting
feature somewhere out there, but we don't have any proof that they do,
either. The whole problem could become more manageable if we would let
maintainers say to a user "hey, for my driver this feature was never
intended to work, so sorry if by accident it did, it was a marginal and
undocumented condition and now it's broken, so please use something that
was documented".

The ocelot driver is in this exact state, in fact. I really wish there
was a ready-made helper for validating phylink's OF node; I mentioned this
already, it needs to cater for all of fixed-link/phy-handle/managed/sfp.
The more drivers would be calling this (I think the vast majority of
post-phylink drivers would do it), the more we'd minimize the spectrum
of unforeseen breakage. In turn this would make the issue more tractable.



More information about the linux-arm-kernel mailing list