[PATCH v2 net 2/5] net: dsa: be compatible with masters which unregister on shutdown
Sverdlin, Alexander
alexander.sverdlin at siemens.com
Wed Sep 4 01:31:13 PDT 2024
Hi Vladimir!
On Fri, 2021-09-17 at 16:34 +0300, Vladimir Oltean wrote:
> Lino reports that on his system with bcmgenet as DSA master and KSZ9897
> as a switch, rebooting or shutting down never works properly.
>
> What does the bcmgenet driver have special to trigger this, that other
> DSA masters do not? It has an implementation of ->shutdown which simply
> calls its ->remove implementation. Otherwise said, it unregisters its
> network interface on shutdown.
>
> This message can be seen in a loop, and it hangs the reboot process there:
>
> unregister_netdevice: waiting for eth0 to become free. Usage count = 3
>
> So why 3?
>
> A usage count of 1 is normal for a registered network interface, and any
> virtual interface which links itself as an upper of that will increment
> it via dev_hold. In the case of DSA, this is the call path:
>
> dsa_slave_create
> -> netdev_upper_dev_link
> -> __netdev_upper_dev_link
> -> __netdev_adjacent_dev_insert
> -> dev_hold
>
> So a DSA switch with 3 interfaces will result in a usage count elevated
> by two, and netdev_wait_allrefs will wait until they have gone away.
>
> Other stacked interfaces, like VLAN, watch NETDEV_UNREGISTER events and
> delete themselves, but DSA cannot just vanish and go poof, at most it
> can unbind itself from the switch devices, but that must happen strictly
> earlier compared to when the DSA master unregisters its net_device, so
> reacting on the NETDEV_UNREGISTER event is way too late.
>
> It seems that it is a pretty established pattern to have a driver's
> ->shutdown hook redirect to its ->remove hook, so the same code is
> executed regardless of whether the driver is unbound from the device, or
> the system is just shutting down. As Florian puts it, it is quite a big
> hammer for bcmgenet to unregister its net_device during shutdown, but
> having a common code path with the driver unbind helps ensure it is well
> tested.
>
> So DSA, for better or for worse, has to live with that and engage in an
> arms race of implementing the ->shutdown hook too, from all individual
> drivers, and do something sane when paired with masters that unregister
> their net_device there. The only sane thing to do, of course, is to
> unlink from the master.
>
> However, complications arise really quickly.
>
> The pattern of redirecting ->shutdown to ->remove is not unique to
> bcmgenet or even to net_device drivers. In fact, SPI controllers do it
> too (see dspi_shutdown -> dspi_remove), and presumably, I2C controllers
> and MDIO controllers do it too (this is something I have not researched
> too deeply, but even if this is not the case today, it is certainly
> plausible to happen in the future, and must be taken into consideration).
>
> Since DSA switches might be SPI devices, I2C devices, MDIO devices, the
> insane implication is that for the exact same DSA switch device, we
> might have both ->shutdown and ->remove getting called.
>
> So we need to do something with that insane environment. The pattern
> I've come up with is "if this, then not that", so if either ->shutdown
> or ->remove gets called, we set the device's drvdata to NULL, and in the
> other hook, we check whether the drvdata is NULL and just do nothing.
> This is probably not necessary for platform devices, just for devices on
> buses, but I would really insist for consistency among drivers, because
> when code is copy-pasted, it is not always copy-pasted from the best
> sources.
>
> So depending on whether the DSA switch's ->remove or ->shutdown will get
> called first, we cannot really guarantee even for the same driver if
> rebooting will result in the same code path on all platforms. But
> nonetheless, we need to do something minimally reasonable on ->shutdown
> too to fix the bug. Of course, the ->remove will do more (a full
> teardown of the tree, with all data structures freed, and this is why
> the bug was not caught for so long). The new ->shutdown method is kept
> separate from dsa_unregister_switch not because we couldn't have
> unregistered the switch, but simply in the interest of doing something
> quick and to the point.
>
> The big question is: does the DSA switch's ->shutdown get called earlier
> than the DSA master's ->shutdown? If not, there is still a risk that we
> might still trigger the WARN_ON in unregister_netdevice that says we are
> attempting to unregister a net_device which has uppers. That's no good.
> Although the reference to the master net_device won't physically go away
> even if DSA's ->shutdown comes afterwards, remember we have a dev_hold
> on it.
>
> The answer to that question lies in this comment above device_link_add:
>
> * A side effect of the link creation is re-ordering of dpm_list and the
> * devices_kset list by moving the consumer device and all devices depending
> * on it to the ends of these lists (that does not happen to devices that have
> * not been registered when this function is called).
>
> so the fact that DSA uses device_link_add towards its master is not
> exactly for nothing. device_shutdown() walks devices_kset from the back,
> so this is our guarantee that DSA's shutdown happens before the master's
> shutdown.
>
> Fixes: 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings")
> Link: https://lore.kernel.org/netdev/20210909095324.12978-1-LinoSanfilippo@gmx.de/
> Reported-by: Lino Sanfilippo <LinoSanfilippo at gmx.de>
> Signed-off-by: Vladimir Oltean <vladimir.oltean at nxp.com>
> Tested-by: Andrew Lunn <andrew at lunn.ch>
> ---
> drivers/net/dsa/b53/b53_mdio.c | 21 ++++++++-
> drivers/net/dsa/b53/b53_mmap.c | 13 ++++++
> drivers/net/dsa/b53/b53_priv.h | 5 +++
> drivers/net/dsa/b53/b53_spi.c | 13 ++++++
> drivers/net/dsa/b53/b53_srab.c | 21 ++++++++-
> drivers/net/dsa/bcm_sf2.c | 12 ++++++
> drivers/net/dsa/dsa_loop.c | 22 +++++++++-
> drivers/net/dsa/lan9303-core.c | 6 +++
> drivers/net/dsa/lan9303.h | 1 +
> drivers/net/dsa/lan9303_i2c.c | 24 +++++++++--
> drivers/net/dsa/lan9303_mdio.c | 15 +++++++
> drivers/net/dsa/lantiq_gswip.c | 18 ++++++++
> drivers/net/dsa/microchip/ksz8795_spi.c | 11 ++++-
> drivers/net/dsa/microchip/ksz9477_i2c.c | 14 +++++-
> drivers/net/dsa/microchip/ksz9477_spi.c | 8 +++-
> drivers/net/dsa/mt7530.c | 18 ++++++++
> drivers/net/dsa/mv88e6060.c | 18 ++++++++
> drivers/net/dsa/mv88e6xxx/chip.c | 22 +++++++++-
> drivers/net/dsa/ocelot/felix_vsc9959.c | 20 ++++++++-
> drivers/net/dsa/ocelot/seville_vsc9953.c | 20 ++++++++-
> drivers/net/dsa/qca/ar9331.c | 18 ++++++++
> drivers/net/dsa/qca8k.c | 18 ++++++++
> drivers/net/dsa/realtek-smi-core.c | 20 ++++++++-
> drivers/net/dsa/sja1105/sja1105_main.c | 21 ++++++++-
> drivers/net/dsa/vitesse-vsc73xx-core.c | 6 +++
> drivers/net/dsa/vitesse-vsc73xx-platform.c | 22 +++++++++-
> drivers/net/dsa/vitesse-vsc73xx-spi.c | 22 +++++++++-
> drivers/net/dsa/vitesse-vsc73xx.h | 1 +
> include/net/dsa.h | 1 +
> net/dsa/dsa2.c | 50 ++++++++++++++++++++++
> 30 files changed, 457 insertions(+), 24 deletions(-)
[]
> diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
> index d7ce281570b5..89f920289ae2 100644
> --- a/drivers/net/dsa/lan9303-core.c
> +++ b/drivers/net/dsa/lan9303-core.c
> @@ -1379,6 +1379,12 @@ int lan9303_remove(struct lan9303 *chip)
> }
> EXPORT_SYMBOL(lan9303_remove);
>
> +void lan9303_shutdown(struct lan9303 *chip)
> +{
> + dsa_switch_shutdown(chip->ds);
> +}
> +EXPORT_SYMBOL(lan9303_shutdown);
> +
> MODULE_AUTHOR("Juergen Borleis <kernel at pengutronix.de>");
> MODULE_DESCRIPTION("Core driver for SMSC/Microchip LAN9303 three port ethernet switch");
> MODULE_LICENSE("GPL v2");
> diff --git a/drivers/net/dsa/lan9303.h b/drivers/net/dsa/lan9303.h
> index 11f590b64701..c7f73efa50f0 100644
> --- a/drivers/net/dsa/lan9303.h
> +++ b/drivers/net/dsa/lan9303.h
> @@ -10,3 +10,4 @@ extern const struct lan9303_phy_ops lan9303_indirect_phy_ops;
>
> int lan9303_probe(struct lan9303 *chip, struct device_node *np);
> int lan9303_remove(struct lan9303 *chip);
> +void lan9303_shutdown(struct lan9303 *chip);
> diff --git a/drivers/net/dsa/lan9303_i2c.c b/drivers/net/dsa/lan9303_i2c.c
> index 9bffaef65a04..8ca4713310fa 100644
> --- a/drivers/net/dsa/lan9303_i2c.c
> +++ b/drivers/net/dsa/lan9303_i2c.c
> @@ -67,13 +67,28 @@ static int lan9303_i2c_probe(struct i2c_client *client,
>
> static int lan9303_i2c_remove(struct i2c_client *client)
> {
> - struct lan9303_i2c *sw_dev;
> + struct lan9303_i2c *sw_dev = i2c_get_clientdata(client);
>
> - sw_dev = i2c_get_clientdata(client);
> if (!sw_dev)
> - return -ENODEV;
> + return 0;
> +
> + lan9303_remove(&sw_dev->chip);
> +
> + i2c_set_clientdata(client, NULL);
> +
> + return 0;
> +}
> +
> +static void lan9303_i2c_shutdown(struct i2c_client *client)
> +{
> + struct lan9303_i2c *sw_dev = i2c_get_clientdata(client);
> +
> + if (!sw_dev)
> + return;
> +
> + lan9303_shutdown(&sw_dev->chip);
>
> - return lan9303_remove(&sw_dev->chip);
> + i2c_set_clientdata(client, NULL);
> }
>
> /*-------------------------------------------------------------------------*/
> @@ -97,6 +112,7 @@ static struct i2c_driver lan9303_i2c_driver = {
> },
> .probe = lan9303_i2c_probe,
> .remove = lan9303_i2c_remove,
> + .shutdown = lan9303_i2c_shutdown,
> .id_table = lan9303_i2c_id,
> };
> module_i2c_driver(lan9303_i2c_driver);
> diff --git a/drivers/net/dsa/lan9303_mdio.c b/drivers/net/dsa/lan9303_mdio.c
> index 9cbe80460b53..bbb7032409ba 100644
> --- a/drivers/net/dsa/lan9303_mdio.c
> +++ b/drivers/net/dsa/lan9303_mdio.c
> @@ -138,6 +138,20 @@ static void lan9303_mdio_remove(struct mdio_device *mdiodev)
> return;
>
> lan9303_remove(&sw_dev->chip);
> +
> + dev_set_drvdata(&mdiodev->dev, NULL);
> +}
> +
> +static void lan9303_mdio_shutdown(struct mdio_device *mdiodev)
> +{
> + struct lan9303_mdio *sw_dev = dev_get_drvdata(&mdiodev->dev);
> +
> + if (!sw_dev)
> + return;
> +
> + lan9303_shutdown(&sw_dev->chip);
> +
> + dev_set_drvdata(&mdiodev->dev, NULL);
> }
This unfortunately didn't work well with LAN9303 and probably will not work
with others:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G O 6.1.99+gitb7793b7d9b35 #1
Workqueue: events_power_efficient phy_state_machine
pc : lan9303_mdio_phy_read+0x1c/0x34
lr : lan9303_phy_read+0x50/0x100
Call trace:
lan9303_mdio_phy_read+0x1c/0x34
lan9303_phy_read+0x50/0x100
dsa_slave_phy_read+0x40/0x50
__mdiobus_read+0x34/0x130
mdiobus_read+0x44/0x70
genphy_update_link+0x2c/0x104
genphy_read_status+0x2c/0x120
phy_check_link_status+0xb8/0xcc
phy_state_machine+0x198/0x27c
process_one_work+0x1dc/0x450
worker_thread+0x154/0x450
as long as the ports are not down (and dsa_switch_shutdown() doesn't ensure it),
we cannot just zero drvdata, because PHY polling will eventually call
static int lan9303_mdio_phy_read(struct lan9303 *chip, int addr, int reg)
{
struct lan9303_mdio *sw_dev = dev_get_drvdata(chip->dev);
return mdiobus_read_nested(sw_dev->device->bus, addr, reg);
There are however multiple other unsafe patterns.
I suppose current
dsa_switch_shutdown();
dev_set_drvdata(...->dev, NULL);
pattern is broken in many cases...
--
Alexander Sverdlin
Siemens AG
www.siemens.com
More information about the Linux-mediatek
mailing list