[PATCH RFC] Providing a helper for PCS inband negotiation

Vladimir Oltean olteanv at gmail.com
Tue May 16 02:00:09 PDT 2023


On Tue, May 16, 2023 at 12:36:33AM +0100, Russell King (Oracle) wrote:
> > Clause 73 negotiates the actual use of 10GBase-KR as a SERDES protocol
> > through the copper backplane in favor of other "Base-K*" alternative
> > link modes, so it's not quite proper to say that 10GBase-KR is a clause
> > 73 using protocol.
> 
> I believe it is correct to state it as such, because:
> 
> 72.1: Table 72–1—Physical Layer clauses associated with the 10GBASE-KR PMD
> 
> 	73—Auto-Negotiation for Backplane Ethernet              Required
> 
> Essentially, 802.3 doesn't permit 10GBASE-KR without hardware support
> for Clause 73 AN (but that AN doesn't have to be enabled by management
> software.)

Just like clause 40 (PCS and PMA for 1000BASE-T) requires clause 28 AN
to be supported. But, when the autoneg process begins, the use of
10GBase-KR as a protocol over the backplane link hasn't even been yet
established, so I find it unnatural to speak of clause 73 autoneg as
something that 10GBase-KR has.

The reason why I'm insisting on this is because to me, to treat clause
73 as an in-band autoneg process of 10GBase-KR sounds like a reversal of
causality. The clause 73 link codeword has a Technology Ability field
through which 10GBase-KR, 1GBase-KX etc are advertised as supported
protocols. If C73 is an inband protocol of 10GBase-KR, what should the
local PCS advertise for its Technology Ability? Only 10GBase-KR, because
this is what is implied by treating it as an attribute of 10GBase-KR, no?
But that would be a denatured way of negotiating - advertise a single
link mode, take it or leave it. And what other inband autoneg protocols
permit, say, starting from SGMII and ending in 1000Base-X? Clause 73
can't be directly compared to what we currently mean by managed =
"in-band-status".

Not only is C37 autoneg not directly comparable to C73, but they are not
mutually exclusive, either. I would say they are more or less orthogonal.
More below.

I don't believe that toggling clause 73 autoneg based on phylink_pcs_neg_mode()
makes much sense.

> > To me, the goals of clause 73 autoneg are much more similar to those of
> > the twisted pair autoneg process - clause 28, which similarly selects
> > between different media side protocols in the PHY, using a priority
> > resolution function. For those, we use phylib and the phy_device
> > structure. What are the merits of using phylink_pcs for copper backplanes
> > and not phylib?
> 
> I agree with you on that, because not only does that fit better with
> our ethernet PHY model, but it also means PHY_INTERFACE_MODE_XLGMII
> makes sense.
> 
> However, by that same token, 1000BASE-X should never have been a
> PHY_INTERFACE_MODE_xxx, and this should also have been treated purely
> as a PHY.
> 
> Taking that still further, this means SGMII, which is 1000BASE-X but
> modified for Cisco's purposes, would effectively be a media converting
> PHY sat between the MAC/RS and the "real" ethernet PHY. In this case,
> PHY_INTERFACE_MODE_SGMII might make sense because the "real" ethernet
> PHY needs to know that.
> 
> Then there's 1000BASE-X used to connect a "real" ethernet PHY to the
> MAC/RS, which means 1000BASE-X can't really be any different from
> SGMII.
> 
> This all makes the whole thing extremely muddy, but this deviates away
> from the original topic, because we're now into a "what should we call
> a PCS" vs "what should we call a PHY" discussion. Then we'll get into
> a discussion about phylib, difficulties with net_device only being
> able to have one phylib device, stacked PHYs, and phylib not being
> able to cope with non-MDIO based devices that we find on embedded
> platforms (some which don't even offer anything that approximates the
> 802.3 register set, so could never be a phylib driver.)
> 
> It even impacts on the DT description, since what does "managed =
> "in-band-status";" mean if we start considering 1000base-X the same
> way as 1000base-T and the "PHY" protocol for 1000base-X becomes GMII.
> A GMII link has no inband AN, so "managed = "in-band-status";" at
> that point makes no sense.
> 
> That is definitely a can of worms I do *not* want to open with this
> discussion - and much of the above has a long history and considerably
> pre-dates phylink.

I don't have much of a problem accepting that not everything we have
in-tree is consistent/correct. But if you think it's too big of a can of
worms to open, okay.

> 
> My original question was more around: how do we decide what we
> currently have as a PCS should use inband negotiation.
> 
> For SGMII and close relatives, and 1000BASE-X it's been obvious. For
> 2500BASE-X less so (due to vendors coming up with it before its been
> standardised.)
> 
> We have implementations using this for other protocols, so it's
> a question that needs answering for these other protocols.
> 
> 
> However, if we did want to extend this topic, then there are a number
> of questions that really need to be asked is about the XPCS driver.
> Such as - what does 1000BASE-KX, 10000BASE-KX4, 10000BASE-KR and
> 2500BASE-X have to do with USXGMII, and why are there no copper
> ethtool modes listed when a USXGMII link can definitely support
> being connected to a copper PHY? (See xpcs_usxgmii_features[]).
> 
> Why does XPCS think that USXGMII uses Clause 73 AN? (see the first
> entry in synopsys_xpcs_compat[].)

First, in principle USXGMII and clause 73 are not mutually exclusive.

It is possible to use clause 73 to advertise 10GBase-KR as a link mode,
and that will give you link training for proper 3-tap electrical
equalization over the copper backplane.

Then, once C73 AN/LT ended and 10GBase-KR has been established, is
possible to configure the 10GBase-R PCS to enable C37 USXGMII to select
the actual data rate via symbol replication, while the SERDES lane
remains at 10GBaud. At least, the XPCS seems to permit enabling symbol
replication in conjunction with 10GBase-KR.

Cross-checking what the XPCS driver does with the XPCS databook, I think
it's this:

It doesn't use C37 autoneg, but it still uses the USXGMII replicator.
It uses C73 (standard or perhaps a vendor-specific form) to advertise
the 1000baseKX_Full, 2500baseX_Full, 10000baseKX4_Full, 10000baseKR_Full
link modes. Each of these corresponds to a bit in the SR_AN_ADV3
register.

Note that 2500baseX_Full doesn't have a "K", and for good reason -
because 802.3 Table 73–4—"Technology Ability Field encoding" doesn't
specify 2500Base-KX as a copper backplane mode that exists. But XPCS
does! To quote the databook:

| When DATA[32] is set to 1, it indicates that the local device
| supports 2.5GBASE-KX speed mode.

And since it advertises this through a code word visible to the link
partner, I would suggest this means that its C73 hardware implementation
is non-standard.

Then, there's the entire issue of what the software does.

When 1GBase-KX is established through C73 autoneg, the XPCS driver
will leave the lane in 10GBase-R mode, and enable 1:10 symbol replication
in the USXGMII block. Sure, this is a 1Gbit data rate, but it uses the
64b/66b encoding (BASE-R), and not the advertised 8b/10b encoding
(BASE-X)! So it will likely only work between 2 XPCS devices!!!

Then, there's the entire issue that the code, as it was originally
introduced, is not the same as it is now. For example, this bit in
xpcs_do_config():

	switch (compat->an_mode) {
	case DW_AN_C73:
		if (phylink_autoneg_inband(mode)) {
			ret = xpcs_config_aneg_c73(xpcs, compat);
			if (ret)
				return ret;
		}
		break;

used to look at state->an_enabled rather than phylink_autoneg_inband().
Through my idiocy, I inadvertently converted that in commit 11059740e616
("net: pcs: xpcs: convert to phylink_pcs_ops").

> 
> xpcs_sgmii_features[] only mentions copper linkmodes, but as we know,
> it's common for copper PHYs to also support fibre with an auto-
> selection mechanism. So, 1000BASE-X is definitely possible if SGMII
> is supported, so why isn't it listed.

Most likely explanation is that XPCS has never been paired up until now
to such a PHY.

> As previously said, 1000BASE-X can be connected to a PHY that does
> 1000BASE-T, so why does xpcs_1000basex_features[] not mention
> 1000baseT_Full... there's probably more here as well.
> 
> Interestingly, xpcs_2500basex_features[] _does_ mention both
> 2500BASE-X and 2500BASE-T, but I think that only does because I
> happened to comment on it during a review.
> 
> I think xpcs is another can of worms, but is an easier can of worms
> to solve than trying to sort out that "what's an ethernet PHY"
> question we seem to be heading towards (which I think would be a
> mammoth task, even back when phylink didn't exist, to sort out.)

I wasn't necessarily going to go all the way into "what's a PHY?".
I just want to clarify some terms such that we can agree what is correct
and what is not. I believe that much of what's currently in XPCS w.r.t.
C73 is not correct, partly through initial intention and partly through
blind conversions such as mine.



More information about the Linux-mediatek mailing list