[BUG] soc: fsl: qbman: lockdep invalid wait context with qman_update_cgr_smp_call

Steffen Trumtrar s.trumtrar at pengutronix.de
Thu Dec 28 02:19:06 PST 2023


Hi,

I noticed that lockdep reports a BUG in the qman driver since commit

    914f8b228ede709274b8c80514b352248ec9da00
    Author:     Sean Anderson <sean.anderson at seco.com>
    AuthorDate: Fri Sep 2 17:57:35 2022 -0400
    Commit:     David S. Miller <davem at davemloft.net>
    CommitDate: Mon Sep 5 14:27:39 2022 +0100

    soc: fsl: qbman: Add CGR update function

    This adds a function to update a CGR with new parameters. qman_create_cgr
    can almost be used for this (with flags=0), but it's not suitable because
    it also registers the callback function. The _safe variant was modeled off
    of qman_cgr_delete_safe. However, we handle multiple arguments and a return
    value.

The stack trace looks something like:

    [   20.192060] =============================
    [   20.196067] [ BUG: Invalid wait context ]
    [   20.200073] 6.7.0-rc6 #73 Not tainted
    [   20.203733] -----------------------------
    [   20.207738] systemd-journal/114 is trying to lock:
    [   20.212528] ffff000973403860 (&portal->cgr_lock){+.+.}-{3:3}, at: qman_update_cgr_smp_call+0x40/0xb0
    [   20.221688] other info that might help us debug this:
    [   20.226736] context-{2:2}
    [   20.229350] 1 lock held by systemd-journal/114:
    [   20.233878]  #0: ffff0008001a0208 (&root->kernfs_iattr_rwsem){++++}-{4:4}, at: kernfs_iop_permission+0x48/0xa0
    [   20.243902] stack backtrace:
    [   20.246779] CPU: 2 PID: 114 Comm: systemd-journal Not tainted 6.7.0-rc6 #73
    [   20.253743] Hardware name: TQ TQMLS1046A SoM on Arkona AT1130 (AT300) board (DT)
    [   20.261144] Call trace:
    [   20.261147]  dump_backtrace+0xa0/0x128
    [   20.261154]  show_stack+0x20/0x38
    [   20.261158]  dump_stack_lvl+0x74/0xd8
    [   20.274303]  dump_stack+0x18/0x28
    [   20.279004]  __lock_acquire+0x920/0x1b58
    [   20.284309]  lock_acquire+0x1fc/0x348
    [   20.289354]  _raw_spin_lock_irqsave+0x6c/0xd0
    [   20.294748]  qman_update_cgr_smp_call+0x40/0xb0
    [   20.299278]  __flush_smp_call_function_queue+0x1d0/0x3e0
    [   20.304593]  generic_smp_call_function_single_interrupt+0x1c/0x30
    [   20.310689]  ipi_handler+0x250/0x290
    [   20.314263]  handle_percpu_devid_irq+0xb0/0x170
    [   20.318793]  generic_handle_domain_irq+0x34/0x58
    [   20.323411]  gic_handle_irq+0x4c/0xd8
    [   20.327070]  call_on_irq_stack+0x24/0x58
    [   20.330991]  do_interrupt_handler+0xdc/0xe8
    [   20.335173]  el1_interrupt+0x34/0x68
    [   20.338747]  el1h_64_irq_handler+0x18/0x28
    [   20.342843]  el1h_64_irq+0x64/0x68
    [   20.346240]  lock_acquired+0x198/0x448
    [   20.349988]  down_read+0x98/0x1c0
    [   20.353300]  kernfs_iop_permission+0x48/0xa0
    [   20.357569]  inode_permission+0x118/0x190
    [   20.361578]  link_path_walk.part.0.constprop.0+0x2b0/0x398
    [   20.367065]  path_lookupat+0x44/0x1b8
    [   20.370726]  filename_lookup+0x9c/0x170
    [   20.374561]  user_path_at_empty+0x54/0x88
    [   20.378571]  do_faccessat+0x88/0x308
    [   20.382144]  __arm64_sys_access+0x2c/0x40
    [   20.386152]  invoke_syscall+0x50/0x120
    [   20.389901]  el0_svc_common.constprop.0+0xc8/0xf0
    [   20.394606]  do_el0_svc_compat+0x24/0x40
    [   20.398528]  el0_svc_compat+0x4c/0x148
    [   20.402275]  el0t_32_sync_handler+0xb0/0x138
    [   20.406545]  el0t_32_sync+0x194/0x198
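
For reading the report: the pair after the lock name, {3:3}, and the context-{2:2} line are lockdep wait types. The following is a simplified userspace sketch of that comparison, not the kernel's code. The numbering assumes CONFIG_PROVE_RAW_LOCK_NESTING is enabled, and the names WaitType and wait_context_ok are made up for this illustration.

```python
from enum import IntEnum


class WaitType(IntEnum):
    """Wait types as numbered in the report (assumed: PROVE_RAW_LOCK_NESTING=y)."""
    INV = 0     # not checked
    FREE = 1    # wait-free, e.g. RCU
    SPIN = 2    # raw_spinlock_t; also what a hard interrupt context allows
    CONFIG = 3  # spinlock_t, which may sleep on PREEMPT_RT
    SLEEP = 4   # sleeping locks such as mutex and rwsem


def wait_context_ok(context: WaitType, lock: WaitType) -> bool:
    """Simplified rule: a lock's wait type must not exceed the context's."""
    return lock <= context


# Values read from the report above: context-{2:2} and cgr_lock {3:3}.
hardirq = WaitType.SPIN
cgr_lock = WaitType.CONFIG

print(wait_context_ok(hardirq, cgr_lock))       # False: spinlock_t in the IPI handler
print(wait_context_ok(hardirq, WaitType.SPIN))  # True: a raw_spinlock_t would pass
```

Under this model the kernfs_iattr_rwsem listed as held ({4:4}) is not the problem: it was taken in task context before the IPI arrived. The complaint is only about taking cgr_lock from the interrupt.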

The task named in
    [   20.207738] systemd-journal/114 is trying to lock:
can be any process; it does not have to be systemd-journal. For example, when barebox-state triggers the stack trace, the function calls look like this:

#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
         systemd-1       [002] ...2.     6.871198: qm_modify_cgr <-qman_init_cgr_all
         (...)
     kworker/2:1-38      [002] ...1.    19.070335: qman_update_cgr_safe <-dpaa_eth_cgr_set_speed
   barebox-state-211     [001] d.h1.    19.070344: qman_update_cgr_smp_call <-__flush_smp_call_function_queue
   barebox-state-211     [001] d.h3.    19.260311: qm_modify_cgr <-qman_update_cgr_smp_call
     kworker/2:1-38      [002] ...1.    19.305517: qman_update_cgr_safe <-dpaa_eth_cgr_set_speed
          <idle>-0       [001] d.h2.    19.305524: qman_update_cgr_smp_call <-__flush_smp_call_function_queue
          <idle>-0       [001] d.h4.    19.305526: qm_modify_cgr <-qman_update_cgr_smp_call
     kworker/3:1-40      [003] ...1.    19.354259: qman_update_cgr_safe <-dpaa_eth_cgr_set_speed
          <idle>-0       [001] d.h2.    19.354265: qman_update_cgr_smp_call <-__flush_smp_call_function_queue
          <idle>-0       [001] d.h4.    19.354267: qm_modify_cgr <-qman_update_cgr_smp_call
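
The five flag characters in the trace can be decoded mechanically, which makes it easier to see that qman_update_cgr_safe runs in task context on one CPU while qman_update_cgr_smp_call runs in hard interrupt context on CPU 1. Below is a minimal sketch of such a decoder, assuming the column layout given in the header above. The helper name decode and the dictionary keys are hypothetical, and only the flag letters relevant here are handled.

```python
import re

# TASK-PID, [CPU], five flag characters, timestamp, "function <-caller".
LINE = re.compile(
    r"^\s*(?P<task>\S+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?P<flags>\S{5})\s+(?P<ts>[\d.]+):\s+(?P<func>\S+)\s+<-(?P<caller>\S+)"
)


def decode(line: str) -> dict:
    """Split one function-trace line and interpret its flag column."""
    m = LINE.match(line)
    if m is None:
        raise ValueError("not a function-trace line")
    # Column order from the header: irqs-off, need-resched,
    # hardirq/softirq, preempt-depth, migrate-disable.
    irqs, _resched, irq_ctx, depth, _migrate = m["flags"]
    return {
        "task": m["task"],
        "cpu": int(m["cpu"]),
        "irqs_off": irqs in ("d", "D"),
        "hardirq": irq_ctx in ("h", "H"),
        "preempt_depth": 0 if depth == "." else int(depth, 16),
        "func": m["func"],
    }


caller = decode("     kworker/2:1-38      [002] ...1.    19.070335: "
                "qman_update_cgr_safe <-dpaa_eth_cgr_set_speed")
callee = decode("   barebox-state-211     [001] d.h1.    19.070344: "
                "qman_update_cgr_smp_call <-__flush_smp_call_function_queue")

print(caller["cpu"], caller["hardirq"])                      # 2 False
print(callee["cpu"], callee["hardirq"], callee["irqs_off"])  # 1 True True
```

The interrupted task (barebox-state or <idle>) is simply whatever was running on CPU 1 when the IPI arrived, which is why the task name in the lockdep report varies.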

I'm not sure why the CPU# detection in the patch is necessary, but maybe you have an idea what is happening here.


Best regards,
Steffen

--
Pengutronix e.K.                | Dipl.-Inform. Steffen Trumtrar |
Steuerwalder Str. 21            | https://www.pengutronix.de/    |
31137 Hildesheim, Germany       | Phone: +49-5121-206917-0       |
Amtsgericht Hildesheim, HRA 2686| Fax:   +49-5121-206917-5555    |


