Multi-PHYs and multiple-ports bonding support

Mon Oct 17 03:03:18 PDT 2022

Hi Maxime,

On Mon, Oct 17, 2022 at 10:51:00AM +0200, Maxime Chevallier wrote:
> Hello everyone,
> 
> I'm reaching out to discuss a PHY topic that we would like to see
> upstreamed, to support multiple ports attached to a MAC.
> 
> The end-goal is to achieve some redundancy in case of a physical link
> interruption, in a transparent manner, but using only one network
> interface (1 MAC).
> 
> We've been made aware that some products in the wild propose this
> feature-set, using 2 PHYs connected to the same MAC, using some custom
> logic to switch back and forth between the 2 PHYs, and that's the main
> use-case we'd like to see supported :
> 
>                             +-------+
>                      /----- |  PHY  | --- BaseT port
>  +-------+           |      +-------+
>  |  MAC  |-- RGMII --|
>  +-------+           |      +-------+
>                      \----- |  PHY  | --- BaseT port
>                             +-------+
> 

I can add more cases:

- case 1:
	Similar HW can be found in combination with AX88772B:
	https://cms.nacsemi.com/content/AuthDatasheets/ASIXS00048-1.pdf
	Page 6

	Current ASIX driver only takes care to power down internal PHY if
	external is present:
	https://elixir.bootlin.com/linux/latest/source/drivers/net/usb/asix_devices.c#L659
	But I can image some one wants to implement hot switching between
	internal PHY and external PHY or direct RMII connection too.

- case 2:
	A $CUSTOMER of us has a system where the RGMII from the MAC is routed
	via a analog multiplexer to a PHY or to an optional external
	board where the RGMII connects to the host port of a switch chip
	supported by DSA.

> This configuration comes with quite a lot of challenges since we bend
> the existing standards in numerous ways :
> 
>  - We have 2 PHYs on the same xMII bus, and they can't be active on that
>    bus at the same time. To solve that, we have 2 strategies: 
>  
>    - Put the PHY in isolate mode when not in use, they can perform link
>      detection and reporting, but wont communicate on the MII bus.
>      This can have side effects if both links are connected to the same
>      network, which can be addressed through the use of gratuitous ARPs
>      to make sure the right link gets known by the spanning-tree.

Can we "announce" topology change/reseting by switch the link state?
Usually, switches should drop forwarding entry for a port with down
state. But the problem with get complicated if there are multiple
bridges... :/

>    - Put PHY down entirely when not is use, select an active PHY, and
>    when the link goes down on that PHY, switch to the other. This was
>    used on products that had PHYs were the isolate mode is broken.

This is probably better way to go. I assume the use cases where this
kind of redundancy is used, it is preferable to to reduce weight, cost
and power consumption.

> Upstream, we have one device that does something a bit similar, which is
> the macchiatobin, using the 88x3310 PHY. This PHY exports both an SFP
> interface as long as a copper BaseT interface. These 2 interfaces are
> connected to the same MAC and are mutually exclusive.
> 
> It looks like this :
> 
>  +-------+              +---------+   |---- Copper BaseT
>  |  MAC  | -- xxxMII -- |   PHY   |---|
>  +-------+              +---------+   |---- SFP
>  
> We don't have any way to control which port gets used, the first that
> has the link gets the link.
> 
> Ideally we would like to be able to configure every aspects of these
> 2 cases, like :
>  - Which link do we use
>  - Do we switch automatically from one to the other
>  - What are the links available
> 
> I see 4 different aspects of this that would need to be added for this
> whole mechanism to work :
> 
> 1) DT representation
> 
> To support that, we would need a way to give knowledge to the kernel
> about the numer of physical ports that are connected to a given MAC.
> In the dual-phy mode, it's pretty straightforward, since we would
> "just" need to pass multiple phy handles to the mac node. In the MCBin
> case, it's a bit more complex, since we don't have a clear view on the
> number of ports connected to a given phy.
> 
> The assumption is that we have only one port per phy, and it's nature is
> derived from the presence of an sfp=<> phandle in the DT, plus the
> driver itself specifying the phydev->port field (which to my knowledge
> isn't used that much ?)
> 
> The subject of describing the ports a PHY exposes in a sensible way that
> doesn't require changing all DTs out-there has been discussed in the
> past here :
> https://lore.kernel.org/netdev/20201119152246.085514e1@bootlin.com/
> 
> If we only focus on the dual-phy use-case - and not the single-phy
> dual-port - we might not have to deal with extensive DT changes at all.
> 
> 2) Changes in Phylink
> 
> This might be the tricky part, as we need to track several ports,
> possibly connected to different PHYs, to get their state. For now, I
> haven't prototyped any of this yet.
> 
> The goal would be to allow either automatic switching, as is already
> done by the 3310 driver, but at a higher level. Phylink might not be the
> right place to do that, so maybe we just want to expose an API to get
> the possible ports on a given interface, their repective state, and a
> way to select one
> 
> My idea would be to introduce a notion of a struct phy_port, that would
> describe a physical port. They would be controlled by a PHY (or a MAC,
> if the mac outputs 1000BaseX for example), one phy can
> possibly control multiple ports.
> 
> The whole link redundancy would then be done manipulating ports, giving
> a layer of abstraction on the hardware topology itself.
> 
> We would therefore abstract the logic by having :
>                          +--------+
>                      /---|  Port  |
>  +-------------+     |   +--------+
>  |  netdevice  | ----|
>  +-------------+     |
>                      |   +---------+
>                      \---|  Port   |
>                          +---------+
> 
> This is the representation the userspace would know about, without
> necessarily having to worry about the phys inbetween.
> 
> I don't see that as a breaking change, since as of today, most systems
> only have one port per netdevice. We would need to add a way to deal
> with multiple ports per netdevice.
> 
> 3) Adding a L2 bonding driver
> 
> If the link switching logic is deported outside of phylink, we might
> want a generic way of bonding ports on an interface, configuring the
> policy to use for the switching (automatic, manual selection, maybe
> more like trying to elect the link with the highest speed ?). This is
> where we would handle sending the gratuitous ARPs upon link switching
> too.
> 
> 3) UAPI
> 
> From userspace, we would need ways to list the ports, their state, and
> possibly to configure the bonding parameters. for now in ethtool, we
> don't have the notion of port at all, we just have 1 netdevice == 1
> port. Should we therefore create one netdevice per port ? or stick to
> that one interface and refer to its ports with some ethtool parameters ?
> 
> All of these are open questions, as this topic spans quite a lot of
> aspects in the stack. Any input, idea, comment, are very very welcome.

What about this use case:

MAC with > 1 PHYs. One PHY is active, you want to do cable testing
and/or to check the signal quality with SQI. Both are triggered
currently via ethtool on an interface.

Regards,
Oleksij
-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |