BUG: Circular locking dependency on netdev led trigger on NanoPi R5S

Diederik de Haas didi.debian at cknow.org
Fri Jul 25 10:48:03 PDT 2025


Hi,

I have a FriendlyELEC NanoPi R5S (with rk3568 SoC) and in commit
1631cbdb8089 ("arm64: dts: rockchip: Improve LED config for NanoPi R5S")
I tried to improve its LED configuration, which included
``linux,default-trigger = "netdev"``.

Problem: sometimes I got a 'hung task' error which resulted in the WAN
port not coming up (that's the only one I use), and logging in via
serial also didn't work, so pulling the plug was the only remedy.

Robin Murphy quickly identified that it likely had to do with LED
triggers, and removing those netdev triggers made the problem go away [1].
To find out what actually caused it, I built a kernel with PROVE_LOCKING
and PRINTK_CALLER enabled, which, after adding a patch that fixed an
Oops [2], showed the underlying problem:

   ======================================================
   WARNING: possible circular locking dependency detected
   6.16-rc7+unreleased-arm64-cknow #1 Not tainted
   ------------------------------------------------------
   modprobe/936 is trying to acquire lock:
   ffffc943e0edc3b0 (pernet_ops_rwsem){++++}-{4:4}, at: register_netdevice_notifier+0x38/0x148

   but task is already holding lock:
   ffff0001f2762248 (&led_cdev->trigger_lock){+.+.}-{4:4}, at: led_trigger_register+0x14c/0x1e0

   which lock already depends on the new lock.


   the existing dependency chain (in reverse order) is:

   -> #3 (&led_cdev->trigger_lock){+.+.}-{4:4}:
          lock_acquire+0x1cc/0x348
          down_write+0x40/0xd8
          led_trigger_set_default+0x5c/0x170
          led_classdev_register_ext+0x340/0x488
          __sdhci_add_host+0x190/0x368 [sdhci]
          dwcmshc_probe+0x2b8/0x6b0 [sdhci_of_dwcmshc]
          platform_probe+0x70/0xe8
          really_probe+0xc8/0x3a0
          __driver_probe_device+0x84/0x160
          driver_probe_device+0x44/0x128
          __device_attach_driver+0xc4/0x170
          bus_for_each_drv+0x90/0xf8
          __device_attach_async_helper+0xc0/0x120
          async_run_entry_fn+0x40/0x180
          process_one_work+0x23c/0x640
          worker_thread+0x1b4/0x360
          kthread+0x150/0x250
          ret_from_fork+0x10/0x20

   -> #2 (triggers_list_lock){++++}-{4:4}:
          lock_acquire+0x1cc/0x348
          down_write+0x40/0xd8
          led_trigger_register+0x58/0x1e0
          phy_led_triggers_register+0xf4/0x258 [libphy]
          phy_attach_direct+0x328/0x3a8 [libphy]
          phylink_fwnode_phy_connect+0xb0/0x138 [phylink]
          __stmmac_open+0xec/0x520 [stmmac]
          stmmac_open+0x4c/0xe8 [stmmac]
          __dev_open+0x13c/0x310
          __dev_change_flags+0x1d4/0x260
          netif_change_flags+0x2c/0x80
          dev_change_flags+0x90/0xd0
          devinet_ioctl+0x55c/0x730
          inet_ioctl+0x1e4/0x200
          sock_do_ioctl+0x6c/0x140
          sock_ioctl+0x328/0x3c0
          __arm64_sys_ioctl+0xb4/0x118
          invoke_syscall+0x6c/0x100
          el0_svc_common.constprop.0+0x48/0xf0
          do_el0_svc+0x24/0x38
          el0_svc+0x54/0x1e0
          el0t_64_sync_handler+0x10c/0x140
          el0t_64_sync+0x198/0x1a0

   -> #1 (rtnl_mutex){+.+.}-{4:4}:
          lock_acquire+0x1cc/0x348
          __mutex_lock+0xac/0x590
          mutex_lock_nested+0x2c/0x40
          rtnl_lock+0x24/0x38
          register_netdevice_notifier+0x40/0x148
          rtnetlink_init+0x40/0x68
          netlink_proto_init+0x120/0x158
          do_one_initcall+0x88/0x3b8
          kernel_init_freeable+0x2d0/0x340
          kernel_init+0x28/0x160
          ret_from_fork+0x10/0x20

   -> #0 (pernet_ops_rwsem){++++}-{4:4}:
          check_prev_add+0x114/0xcb8
          __lock_acquire+0x12e8/0x15f0
          lock_acquire+0x1cc/0x348
          down_write+0x40/0xd8
          register_netdevice_notifier+0x38/0x148
          netdev_trig_activate+0x18c/0x1e8 [ledtrig_netdev]
          led_trigger_set+0x1d4/0x328
          led_trigger_register+0x194/0x1e0
          netdev_led_trigger_init+0x20/0xff8 [ledtrig_netdev]
          do_one_initcall+0x88/0x3b8
          do_init_module+0x5c/0x270
          load_module+0x1ed8/0x2608
          init_module_from_file+0x94/0x100
          idempotent_init_module+0x1e8/0x2f0
          __arm64_sys_finit_module+0x70/0xe8
          invoke_syscall+0x6c/0x100
          el0_svc_common.constprop.0+0x48/0xf0
          do_el0_svc+0x24/0x38
          el0_svc+0x54/0x1e0
          el0t_64_sync_handler+0x10c/0x140
          el0t_64_sync+0x198/0x1a0

   other info that might help us debug this:

   Chain exists of:
     pernet_ops_rwsem --> triggers_list_lock --> &led_cdev->trigger_lock

    Possible unsafe locking scenario:

          CPU0                    CPU1
          ----                    ----
     lock(&led_cdev->trigger_lock);
                                  lock(triggers_list_lock);
                                  lock(&led_cdev->trigger_lock);
     lock(pernet_ops_rwsem);

    *** DEADLOCK ***

   2 locks held by modprobe/936:
    #0: ffffc943e0d2baa8 (leds_list_lock){++++}-{4:4}, at: led_trigger_register+0x10c/0x1e0
    #1: ffff0001f2762248 (&led_cdev->trigger_lock){+.+.}-{4:4}, at: led_trigger_register+0x14c/0x1e0

   stack backtrace:
   CPU: 0 UID: 0 PID: 936 Comm: modprobe Not tainted 6.16-rc7+unreleased-arm64-cknow #1 PREEMPTLAZY  Debian 6.16~rc7-2~exp1
   Hardware name: FriendlyElec NanoPi R5S (DT)
   Call trace:
    show_stack+0x34/0xa0 (C)
    dump_stack_lvl+0x70/0x98
    dump_stack+0x18/0x24
    print_circular_bug+0x230/0x280
    check_noncircular+0x174/0x188
    check_prev_add+0x114/0xcb8
    __lock_acquire+0x12e8/0x15f0
    lock_acquire+0x1cc/0x348
    down_write+0x40/0xd8
    register_netdevice_notifier+0x38/0x148
    netdev_trig_activate+0x18c/0x1e8 [ledtrig_netdev]
    led_trigger_set+0x1d4/0x328
    led_trigger_register+0x194/0x1e0
    netdev_led_trigger_init+0x20/0xff8 [ledtrig_netdev]
    do_one_initcall+0x88/0x3b8
    do_init_module+0x5c/0x270
    load_module+0x1ed8/0x2608
    init_module_from_file+0x94/0x100
    idempotent_init_module+0x1e8/0x2f0
    __arm64_sys_finit_module+0x70/0xe8
    invoke_syscall+0x6c/0x100
    el0_svc_common.constprop.0+0x48/0xf0
    do_el0_svc+0x24/0x38
    el0_svc+0x54/0x1e0
    el0t_64_sync_handler+0x10c/0x140
    el0t_64_sync+0x198/0x1a0
   leds-gpio gpio-leds: bus: 'platform': really_probe: bound device to driver leds-gpio
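
To make the chain above easier to follow, here is a minimal sketch that
models the reported lock ordering as a directed graph and searches it for a
cycle. It is only a toy illustration of the idea, not how lockdep is
implemented. The edge list is my reading of entries #0 to #3 in the report,
and ``EDGES`` and ``find_cycle`` are hypothetical names used for
illustration only.

```python
# Lock-ordering edges as (lock held, lock acquired next), read off the
# lockdep report. This is an interpretation of the splat, not kernel code.
EDGES = [
    # 1: register_netdevice_notifier takes pernet_ops_rwsem, then rtnl_lock
    ("pernet_ops_rwsem", "rtnl_mutex"),
    # 2: device open path (under rtnl) reaches led_trigger_register
    ("rtnl_mutex", "triggers_list_lock"),
    # 3: led_trigger_set_default takes the per-LED trigger_lock
    ("triggers_list_lock", "trigger_lock"),
    # 0: netdev_trig_activate (trigger_lock held) registers a notifier
    ("trigger_lock", "pernet_ops_rwsem"),
]


def find_cycle(edges):
    """Return one cycle as a list of lock names, or None if acyclic."""
    graph = {}
    for held, acquired in edges:
        graph.setdefault(held, []).append(acquired)
        graph.setdefault(acquired, [])

    state = {}  # lock name -> "visiting" or "done"
    path = []   # locks on the current depth-first search path

    def visit(node):
        state[node] = "visiting"
        path.append(node)
        for nxt in graph[node]:
            if state.get(nxt) == "visiting":
                # Back edge: the path from nxt to here closes a cycle.
                return path[path.index(nxt):] + [nxt]
            if nxt not in state:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[node] = "done"
        return None

    for start in graph:
        if start not in state:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    cycle = find_cycle(EDGES)
    print(" -> ".join(cycle) if cycle else "no cycle")
```

Dropping the last edge (the acquisition attempted by modprobe) leaves the
graph acyclic in this model, which matches removing the netdev trigger
making the problem go away.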

The full serial log can be found at [3]. It is quite verbose and the boot
took way longer than normal, as the following was added to the cmdline:
``dyndbg="file dd.c func really_probe +p" maxcpus=1``

Feel free to ask for additional info and/or to run tests.

Cheers,
  Diederik

[1] https://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git/commit/?h=arm/fixes&id=912b1f2a796ec73530a709b11821cb0c249fb23e
[2] https://lore.kernel.org/linux-rockchip/f81b88df-9959-4968-a60a-b7efd3d5ea24@arm.com/
[3] https://paste.sr.ht/~diederik/142e92bfb29bbb58bca18a74cdffc5e0ba79081c