[PATCH v2 net 2/5] net: dsa: be compatible with masters which unregister on shutdown

Sverdlin, Alexander alexander.sverdlin at siemens.com
Wed Sep 4 01:31:13 PDT 2024


Hi Vladimir!

On Fri, 2021-09-17 at 16:34 +0300, Vladimir Oltean wrote:
> Lino reports that on his system with bcmgenet as DSA master and KSZ9897
> as a switch, rebooting or shutting down never works properly.
> 
> What does the bcmgenet driver have special to trigger this, that other
> DSA masters do not? It has an implementation of ->shutdown which simply
> calls its ->remove implementation. Otherwise said, it unregisters its
> network interface on shutdown.
> 
> This message can be seen in a loop, and it hangs the reboot process there:
> 
> unregister_netdevice: waiting for eth0 to become free. Usage count = 3
> 
> So why 3?
> 
> A usage count of 1 is normal for a registered network interface, and any
> virtual interface which links itself as an upper of that will increment
> it via dev_hold. In the case of DSA, this is the call path:
> 
> dsa_slave_create
> -> netdev_upper_dev_link
>    -> __netdev_upper_dev_link
>       -> __netdev_adjacent_dev_insert
>          -> dev_hold
> 
> So a DSA switch with 3 interfaces will result in a usage count elevated
> by two, and netdev_wait_allrefs will wait until they have gone away.
> 
> Other stacked interfaces, like VLAN, watch NETDEV_UNREGISTER events and
> delete themselves, but DSA cannot just vanish and go poof, at most it
> can unbind itself from the switch devices, but that must happen strictly
> earlier compared to when the DSA master unregisters its net_device, so
> reacting on the NETDEV_UNREGISTER event is way too late.
> 
> It seems that it is a pretty established pattern to have a driver's
> ->shutdown hook redirect to its ->remove hook, so the same code is
> executed regardless of whether the driver is unbound from the device, or
> the system is just shutting down. As Florian puts it, it is quite a big
> hammer for bcmgenet to unregister its net_device during shutdown, but
> having a common code path with the driver unbind helps ensure it is well
> tested.
> 
> So DSA, for better or for worse, has to live with that and engage in an
> arms race of implementing the ->shutdown hook too, from all individual
> drivers, and do something sane when paired with masters that unregister
> their net_device there. The only sane thing to do, of course, is to
> unlink from the master.
> 
> However, complications arise really quickly.
> 
> The pattern of redirecting ->shutdown to ->remove is not unique to
> bcmgenet or even to net_device drivers. In fact, SPI controllers do it
> too (see dspi_shutdown -> dspi_remove), and presumably, I2C controllers
> and MDIO controllers do it too (this is something I have not researched
> too deeply, but even if this is not the case today, it is certainly
> plausible to happen in the future, and must be taken into consideration).
> 
> Since DSA switches might be SPI devices, I2C devices, MDIO devices, the
> insane implication is that for the exact same DSA switch device, we
> might have both ->shutdown and ->remove getting called.
> 
> So we need to do something with that insane environment. The pattern
> I've come up with is "if this, then not that", so if either ->shutdown
> or ->remove gets called, we set the device's drvdata to NULL, and in the
> other hook, we check whether the drvdata is NULL and just do nothing.
> This is probably not necessary for platform devices, just for devices on
> buses, but I would really insist for consistency among drivers, because
> when code is copy-pasted, it is not always copy-pasted from the best
> sources.
> 
> So depending on whether the DSA switch's ->remove or ->shutdown will get
> called first, we cannot really guarantee even for the same driver if
> rebooting will result in the same code path on all platforms. But
> nonetheless, we need to do something minimally reasonable on ->shutdown
> too to fix the bug. Of course, the ->remove will do more (a full
> teardown of the tree, with all data structures freed, and this is why
> the bug was not caught for so long). The new ->shutdown method is kept
> separate from dsa_unregister_switch not because we couldn't have
> unregistered the switch, but simply in the interest of doing something
> quick and to the point.
> 
> The big question is: does the DSA switch's ->shutdown get called earlier
> than the DSA master's ->shutdown? If not, there is still a risk that we
> might still trigger the WARN_ON in unregister_netdevice that says we are
> attempting to unregister a net_device which has uppers. That's no good.
> Although the reference to the master net_device won't physically go away
> even if DSA's ->shutdown comes afterwards, remember we have a dev_hold
> on it.
> 
> The answer to that question lies in this comment above device_link_add:
> 
>  * A side effect of the link creation is re-ordering of dpm_list and the
>  * devices_kset list by moving the consumer device and all devices depending
>  * on it to the ends of these lists (that does not happen to devices that have
>  * not been registered when this function is called).
> 
> so the fact that DSA uses device_link_add towards its master is not
> exactly for nothing. device_shutdown() walks devices_kset from the back,
> so this is our guarantee that DSA's shutdown happens before the master's
> shutdown.
> 
> Fixes: 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings")
> Link: https://lore.kernel.org/netdev/20210909095324.12978-1-LinoSanfilippo@gmx.de/
> Reported-by: Lino Sanfilippo <LinoSanfilippo at gmx.de>
> Signed-off-by: Vladimir Oltean <vladimir.oltean at nxp.com>
> Tested-by: Andrew Lunn <andrew at lunn.ch>
> ---
>  drivers/net/dsa/b53/b53_mdio.c             | 21 ++++++++-
>  drivers/net/dsa/b53/b53_mmap.c             | 13 ++++++
>  drivers/net/dsa/b53/b53_priv.h             |  5 +++
>  drivers/net/dsa/b53/b53_spi.c              | 13 ++++++
>  drivers/net/dsa/b53/b53_srab.c             | 21 ++++++++-
>  drivers/net/dsa/bcm_sf2.c                  | 12 ++++++
>  drivers/net/dsa/dsa_loop.c                 | 22 +++++++++-
>  drivers/net/dsa/lan9303-core.c             |  6 +++
>  drivers/net/dsa/lan9303.h                  |  1 +
>  drivers/net/dsa/lan9303_i2c.c              | 24 +++++++++--
>  drivers/net/dsa/lan9303_mdio.c             | 15 +++++++
>  drivers/net/dsa/lantiq_gswip.c             | 18 ++++++++
>  drivers/net/dsa/microchip/ksz8795_spi.c    | 11 ++++-
>  drivers/net/dsa/microchip/ksz9477_i2c.c    | 14 +++++-
>  drivers/net/dsa/microchip/ksz9477_spi.c    |  8 +++-
>  drivers/net/dsa/mt7530.c                   | 18 ++++++++
>  drivers/net/dsa/mv88e6060.c                | 18 ++++++++
>  drivers/net/dsa/mv88e6xxx/chip.c           | 22 +++++++++-
>  drivers/net/dsa/ocelot/felix_vsc9959.c     | 20 ++++++++-
>  drivers/net/dsa/ocelot/seville_vsc9953.c   | 20 ++++++++-
>  drivers/net/dsa/qca/ar9331.c               | 18 ++++++++
>  drivers/net/dsa/qca8k.c                    | 18 ++++++++
>  drivers/net/dsa/realtek-smi-core.c         | 20 ++++++++-
>  drivers/net/dsa/sja1105/sja1105_main.c     | 21 ++++++++-
>  drivers/net/dsa/vitesse-vsc73xx-core.c     |  6 +++
>  drivers/net/dsa/vitesse-vsc73xx-platform.c | 22 +++++++++-
>  drivers/net/dsa/vitesse-vsc73xx-spi.c      | 22 +++++++++-
>  drivers/net/dsa/vitesse-vsc73xx.h          |  1 +
>  include/net/dsa.h                          |  1 +
>  net/dsa/dsa2.c                             | 50 ++++++++++++++++++++++
>  30 files changed, 457 insertions(+), 24 deletions(-)

[...]

> diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
> index d7ce281570b5..89f920289ae2 100644
> --- a/drivers/net/dsa/lan9303-core.c
> +++ b/drivers/net/dsa/lan9303-core.c
> @@ -1379,6 +1379,12 @@ int lan9303_remove(struct lan9303 *chip)
>  }
>  EXPORT_SYMBOL(lan9303_remove);
>  
> +void lan9303_shutdown(struct lan9303 *chip)
> +{
> +	dsa_switch_shutdown(chip->ds);
> +}
> +EXPORT_SYMBOL(lan9303_shutdown);
> +
>  MODULE_AUTHOR("Juergen Borleis <kernel at pengutronix.de>");
>  MODULE_DESCRIPTION("Core driver for SMSC/Microchip LAN9303 three port ethernet switch");
>  MODULE_LICENSE("GPL v2");
> diff --git a/drivers/net/dsa/lan9303.h b/drivers/net/dsa/lan9303.h
> index 11f590b64701..c7f73efa50f0 100644
> --- a/drivers/net/dsa/lan9303.h
> +++ b/drivers/net/dsa/lan9303.h
> @@ -10,3 +10,4 @@ extern const struct lan9303_phy_ops lan9303_indirect_phy_ops;
>  
>  int lan9303_probe(struct lan9303 *chip, struct device_node *np);
>  int lan9303_remove(struct lan9303 *chip);
> +void lan9303_shutdown(struct lan9303 *chip);
> diff --git a/drivers/net/dsa/lan9303_i2c.c b/drivers/net/dsa/lan9303_i2c.c
> index 9bffaef65a04..8ca4713310fa 100644
> --- a/drivers/net/dsa/lan9303_i2c.c
> +++ b/drivers/net/dsa/lan9303_i2c.c
> @@ -67,13 +67,28 @@ static int lan9303_i2c_probe(struct i2c_client *client,
>  
>  static int lan9303_i2c_remove(struct i2c_client *client)
>  {
> -	struct lan9303_i2c *sw_dev;
> +	struct lan9303_i2c *sw_dev = i2c_get_clientdata(client);
>  
> -	sw_dev = i2c_get_clientdata(client);
>  	if (!sw_dev)
> -		return -ENODEV;
> +		return 0;
> +
> +	lan9303_remove(&sw_dev->chip);
> +
> +	i2c_set_clientdata(client, NULL);
> +
> +	return 0;
> +}
> +
> +static void lan9303_i2c_shutdown(struct i2c_client *client)
> +{
> +	struct lan9303_i2c *sw_dev = i2c_get_clientdata(client);
> +
> +	if (!sw_dev)
> +		return;
> +
> +	lan9303_shutdown(&sw_dev->chip);
>  
> -	return lan9303_remove(&sw_dev->chip);
> +	i2c_set_clientdata(client, NULL);
>  }
>  
>  /*-------------------------------------------------------------------------*/
> @@ -97,6 +112,7 @@ static struct i2c_driver lan9303_i2c_driver = {
>  	},
>  	.probe = lan9303_i2c_probe,
>  	.remove = lan9303_i2c_remove,
> +	.shutdown = lan9303_i2c_shutdown,
>  	.id_table = lan9303_i2c_id,
>  };
>  module_i2c_driver(lan9303_i2c_driver);
> diff --git a/drivers/net/dsa/lan9303_mdio.c b/drivers/net/dsa/lan9303_mdio.c
> index 9cbe80460b53..bbb7032409ba 100644
> --- a/drivers/net/dsa/lan9303_mdio.c
> +++ b/drivers/net/dsa/lan9303_mdio.c
> @@ -138,6 +138,20 @@ static void lan9303_mdio_remove(struct mdio_device *mdiodev)
>  		return;
>  
>  	lan9303_remove(&sw_dev->chip);
> +
> +	dev_set_drvdata(&mdiodev->dev, NULL);
> +}
> +
> +static void lan9303_mdio_shutdown(struct mdio_device *mdiodev)
> +{
> +	struct lan9303_mdio *sw_dev = dev_get_drvdata(&mdiodev->dev);
> +
> +	if (!sw_dev)
> +		return;
> +
> +	lan9303_shutdown(&sw_dev->chip);
> +
> +	dev_set_drvdata(&mdiodev->dev, NULL);
>  }

This unfortunately didn't work well with LAN9303 and probably will not work
with others:

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G           O       6.1.99+gitb7793b7d9b35 #1
Workqueue: events_power_efficient phy_state_machine
pc : lan9303_mdio_phy_read+0x1c/0x34
lr : lan9303_phy_read+0x50/0x100
Call trace:
 lan9303_mdio_phy_read+0x1c/0x34
 lan9303_phy_read+0x50/0x100
 dsa_slave_phy_read+0x40/0x50
 __mdiobus_read+0x34/0x130
 mdiobus_read+0x44/0x70
 genphy_update_link+0x2c/0x104
 genphy_read_status+0x2c/0x120
 phy_check_link_status+0xb8/0xcc
 phy_state_machine+0x198/0x27c
 process_one_work+0x1dc/0x450
 worker_thread+0x154/0x450

As long as the ports are not down (and dsa_switch_shutdown() doesn't ensure
that), we cannot just zero drvdata, because PHY polling will eventually call

static int lan9303_mdio_phy_read(struct lan9303 *chip, int addr, int reg)
{
        struct lan9303_mdio *sw_dev = dev_get_drvdata(chip->dev);

        /* sw_dev is NULL here once ->shutdown has cleared drvdata */
        return mdiobus_read_nested(sw_dev->device->bus, addr, reg);
}
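
To illustrate the failure mode, here is a minimal userspace sketch, not kernel
code. All model_* names are hypothetical stand-ins for the real structures. It
shows a read path that checks drvdata before using it, which is only one
possible defensive option; without additional synchronization such a check
would presumably still be racy in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures, not the real definitions. */
struct model_bus { int last_reg; };
struct model_mdio { struct model_bus *bus; };
struct model_dev { void *drvdata; };

/* Model of a bus read: records the register and returns a dummy value. */
static int model_bus_read(struct model_bus *bus, int reg)
{
	assert(bus != NULL);
	bus->last_reg = reg;
	return 0x1234;
}

/* Guarded read: returns -ENODEV instead of dereferencing NULL drvdata. */
static int model_phy_read(struct model_dev *dev, int reg)
{
	struct model_mdio *sw_dev = dev->drvdata;

	if (!sw_dev)
		return -ENODEV;

	return model_bus_read(sw_dev->bus, reg);
}

/* Model of the last step of the shutdown hook: drvdata is cleared. */
static void model_shutdown(struct model_dev *dev)
{
	dev->drvdata = NULL;
}
```

In this model, a read issued after model_shutdown() fails cleanly with -ENODEV,
whereas the unguarded lan9303_mdio_phy_read() dereferences the NULL pointer.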

There are, however, multiple other unsafe patterns.
I suppose the current

dsa_switch_shutdown();
dev_set_drvdata(...->dev, NULL);

pattern is broken in many cases...
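
The ordering problem can be modelled in userspace as follows. This is only a
sketch under assumptions: the model_* names are hypothetical, and "stop
polling" stands in for whatever the kernel would need to do to quiesce the
ports before drvdata is cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NUM_PORTS 3

/* Hypothetical model of a switch with per-port PHY polling. */
struct model_switch {
	bool polling[MODEL_NUM_PORTS];
	void *drvdata;
	int unsafe_reads;	/* polls that found drvdata already cleared */
};

/* Start with all ports polling and drvdata pointing at some private data. */
static void model_init(struct model_switch *sw, void *priv)
{
	for (int i = 0; i < MODEL_NUM_PORTS; i++)
		sw->polling[i] = true;
	sw->drvdata = priv;
	sw->unsafe_reads = 0;
}

/* One poll tick: a port that is still polling needs valid drvdata. */
static void model_poll_tick(struct model_switch *sw, int port)
{
	if (!sw->polling[port])
		return;
	if (!sw->drvdata)
		sw->unsafe_reads++;	/* would be a NULL dereference */
}

/* Mirrors the criticized pattern: clear drvdata while ports still poll. */
static void model_shutdown_unordered(struct model_switch *sw)
{
	sw->drvdata = NULL;
}

/* Alternative ordering: quiesce all ports first, then clear drvdata. */
static void model_shutdown_ordered(struct model_switch *sw)
{
	for (int i = 0; i < MODEL_NUM_PORTS; i++)
		sw->polling[i] = false;
	sw->drvdata = NULL;
}
```

With model_shutdown_unordered() a later poll tick hits cleared drvdata, while
model_shutdown_ordered() leaves no poller that could observe it.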

-- 
Alexander Sverdlin
Siemens AG
www.siemens.com

