Deadlock debugging help.

Ben Greear greearb at candelatech.com
Tue Feb 27 13:42:31 PST 2018


On 02/27/2018 12:49 PM, Ben Greear wrote:
> I notice I can reliably lock up the kernel if I rmmod ath10k while it is under
> heavy tx/rx traffic.  First, this causes the firmware to crash, and then right
> after (or possibly during?) the related kernel threads deadlock.
>
> This is with my hacked driver and hacked firmware.  In particular, the
> ath10k_debug_nop_dwork is something I added, though it is pretty trivial,
> it does take the ar->conf_mutex.  It appears blocked trying to get it.
>
> It appears something is holding the ar->conf_mutex, but it is not clear to
> me from the lockdep output what process actually holds it.
> Anyone see a clue they could share?

Changing how I start/stop the nop_dwork stuff seems to have made the
problem go away, so I guess maybe that was the issue.

Thanks,
Ben

>
> Thanks,
> Ben
>
>
> [ 1618.914057] sysrq: SysRq : Show Regs
> [ 1618.916617] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G        W  O    4.13.16+ #2
> [ 1618.916618] Hardware name: _ _/ , BIOS 5.11 08/26/2016
> [ 1618.916620] task: ffff88016d3ea740 task.stack: ffffc900006c8000
> [ 1618.916624] RIP: 0010:cpuidle_enter_state+0x130/0x3b0
> [ 1618.916625] RSP: 0018:ffffc900006cbe78 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff3e
> [ 1618.916628] RAX: ffff88016d3ea740 RBX: 0000000000000008 RCX: 000000000000001f
> [ 1618.916629] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88016d3ea740
> [ 1618.916630] RBP: ffffc900006cbeb0 R08: 00000000000052ca R09: 0000000000000000
> [ 1618.916631] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000008
> [ 1618.916633] R13: ffff880172d26c00 R14: ffffffff81ec66b8 R15: 00000178ee494b98
> [ 1618.916634] FS:  0000000000000000(0000) GS:ffff880172d00000(0000) knlGS:0000000000000000
> [ 1618.916636] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1618.916637] CR2: 00007ffdda2194b8 CR3: 0000000001e0f000 CR4: 00000000003406e0
> [ 1618.916638] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1618.916639] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 1618.916640] Call Trace:
> [ 1618.916645]  cpuidle_enter+0x12/0x20
> [ 1618.916647]  call_cpuidle+0x1e/0x40
> [ 1618.916649]  do_idle+0x184/0x1e0
> [ 1618.916651]  cpu_startup_entry+0x5f/0x70
> [ 1618.916654]  start_secondary+0x150/0x180
> [ 1618.916657]  secondary_startup_64+0x9f/0x9f
> [ 1618.916662] Code: b7 df 95 ff 48 89 45 d0 0f 1f 44 00 00 31 ff e8 97 6c 97 ff 8b 45 c8 85 c0 0f 85 83 01 00 00 e8 57 5b 98 ff fb 66 0f 1f 44 00 00 <48> 8b 5d
> d0 48 ba cf f7 53 e3 a5 9b c4 20 4c 29 fb 48 89 d8 48
>
> [ 1618.916724] CPU#2: ctrl:       000000070000000f
> [ 1618.916726] CPU#2: status:     0000000000000000
> [ 1618.916727] CPU#2: overflow:   0000000000000000
> [ 1618.916728] CPU#2: fixed:      00000000000000b0
> [ 1618.916729] CPU#2: pebs:       0000000000000000
> [ 1618.916730] CPU#2: debugctl:   0000000000000000
> [ 1618.916731] CPU#2: active:     0000000200000000
> [ 1618.916732] CPU#2:   gen-PMC0 ctrl:  0000000000000000
> [ 1618.916733] CPU#2:   gen-PMC0 count: 0000000000000000
> [ 1618.916734] CPU#2:   gen-PMC0 left:  0000000000000000
> [ 1618.916735] CPU#2:   gen-PMC1 ctrl:  0000000000000000
> [ 1618.916736] CPU#2:   gen-PMC1 count: 0000000000000000
> [ 1618.916737] CPU#2:   gen-PMC1 left:  0000000000000000
> [ 1618.916738] CPU#2:   gen-PMC2 ctrl:  0000000000000000
> [ 1618.916739] CPU#2:   gen-PMC2 count: 0000000000000000
> [ 1618.916740] CPU#2:   gen-PMC2 left:  0000000000000000
> [ 1618.916742] CPU#2:   gen-PMC3 ctrl:  0000000000000000
> [ 1618.916743] CPU#2:   gen-PMC3 count: 0000000000000000
> [ 1618.916744] CPU#2:   gen-PMC3 left:  0000000000000000
> [ 1618.916745] CPU#2: fixed-PMC0 count: 0000000000000000
> [ 1618.916746] CPU#2: fixed-PMC1 count: 0000fffed3945191
> [ 1618.916747] CPU#2: fixed-PMC2 count: 0000000000000000
> [ 1623.035449] INFO: task kworker/u8:1:273 blocked for more than 180 seconds.
> [ 1623.041155]       Tainted: G        W  O    4.13.16+ #2
> [ 1623.045173] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1623.051966] kworker/u8:1    D    0   273      2 0x00000000
> [ 1623.052010] Workqueue: cfg80211 cfg80211_dfs_channels_update_work [cfg80211]
> [ 1623.052026] Call Trace:
> [ 1623.052032]  __schedule+0x407/0xb70
> [ 1623.052037]  schedule+0x38/0x90
> [ 1623.052039]  schedule_preempt_disabled+0x10/0x20
> [ 1623.052041]  __mutex_lock+0x3d3/0x930
> [ 1623.052044]  ? rtnl_lock+0x12/0x20
> [ 1623.052050]  mutex_lock_nested+0x16/0x20
> [ 1623.052052]  ? mutex_lock_nested+0x16/0x20
> [ 1623.052054]  rtnl_lock+0x12/0x20
> [ 1623.052066]  cfg80211_dfs_channels_update_work+0x3c/0x1c0 [cfg80211]
> [ 1623.052069]  ? process_one_work+0x14f/0x6a0
> [ 1623.052074]  process_one_work+0x1ce/0x6a0
> [ 1623.052078]  worker_thread+0x46/0x440
> [ 1623.052083]  kthread+0x10f/0x150
> [ 1623.052084]  ? process_one_work+0x6a0/0x6a0
> [ 1623.052086]  ? kthread_create_on_node+0x40/0x40
> [ 1623.052088]  ? kthread_create_on_node+0x40/0x40
> [ 1623.052091]  ret_from_fork+0x2a/0x40
> [ 1623.052098] INFO: task kworker/u8:2:281 blocked for more than 180 seconds.
> [ 1623.057700]       Tainted: G        W  O    4.13.16+ #2
> [ 1623.061659] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1623.068270] kworker/u8:2    D    0   281      2 0x00000000
> [ 1623.068282] Workqueue: ath10k_wq ath10k_debug_nop_dwork [ath10k_core]
> [ 1623.068284] Call Trace:
> [ 1623.068288]  __schedule+0x407/0xb70
> [ 1623.068292]  schedule+0x38/0x90
> [ 1623.068308]  schedule_preempt_disabled+0x10/0x20
> [ 1623.068310]  __mutex_lock+0x3d3/0x930
> [ 1623.068315]  ? ath10k_debug_nop_dwork+0x24/0x80 [ath10k_core]
> [ 1623.068322]  ? lock_acquire+0xac/0x200
> [ 1623.068325]  mutex_lock_nested+0x16/0x20
> [ 1623.068326]  ? mutex_lock_nested+0x16/0x20
> [ 1623.068331]  ath10k_debug_nop_dwork+0x24/0x80 [ath10k_core]
> [ 1623.068340]  process_one_work+0x1ce/0x6a0
> [ 1623.068345]  worker_thread+0x46/0x440
> [ 1623.068349]  kthread+0x10f/0x150
> [ 1623.068350]  ? process_one_work+0x6a0/0x6a0
> [ 1623.068352]  ? kthread_create_on_node+0x40/0x40
> [ 1623.068354]  ? kthread_create_on_node+0x40/0x40
> [ 1623.068356]  ret_from_fork+0x2a/0x40
> [ 1623.068363] INFO: task kworker/0:3:462 blocked for more than 180 seconds.
> [ 1623.073888]       Tainted: G        W  O    4.13.16+ #2
> [ 1623.077882] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1623.084475] kworker/0:3     D    0   462      2 0x00000080
> [ 1623.084480] Workqueue: events linkwatch_event
> [ 1623.084482] Call Trace:
> [ 1623.084485]  __schedule+0x407/0xb70
> [ 1623.084489]  schedule+0x38/0x90
> [ 1623.084491]  schedule_preempt_disabled+0x10/0x20
> [ 1623.084493]  __mutex_lock+0x3d3/0x930
> [ 1623.084495]  ? rtnl_lock+0x12/0x20
> [ 1623.084501]  ? lock_acquire+0xac/0x200
> [ 1623.084504]  mutex_lock_nested+0x16/0x20
> [ 1623.084507]  ? mutex_lock_nested+0x16/0x20
> [ 1623.084509]  rtnl_lock+0x12/0x20
> [ 1623.084511]  linkwatch_event+0x9/0x30
> [ 1623.084513]  process_one_work+0x1ce/0x6a0
> [ 1623.084517]  worker_thread+0x46/0x440
> [ 1623.084521]  kthread+0x10f/0x150
> [ 1623.084523]  ? process_one_work+0x6a0/0x6a0
> [ 1623.084525]  ? kthread_create_on_node+0x40/0x40
> [ 1623.084528]  ? do_syscall_64+0x64/0x140
> [ 1623.084531]  ret_from_fork+0x2a/0x40
> [ 1623.084537] INFO: task kworker/1:2:562 blocked for more than 180 seconds.
> [ 1623.090051]       Tainted: G        W  O    4.13.16+ #2
> [ 1623.094000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1623.100574] kworker/1:2     D    0   562      2 0x00000080
> [ 1623.100615] Workqueue: events_freezable ieee80211_restart_work [mac80211]
> [ 1623.100617] Call Trace:
> [ 1623.100620]  __schedule+0x407/0xb70
> [ 1623.100625]  schedule+0x38/0x90
> [ 1623.100627]  schedule_preempt_disabled+0x10/0x20
> [ 1623.100629]  __mutex_lock+0x3d3/0x930
> [ 1623.100631]  ? rtnl_lock+0x12/0x20
> [ 1623.100648]  ? flush_work+0x298/0x2d0
> [ 1623.100654]  mutex_lock_nested+0x16/0x20
> [ 1623.100655]  ? mutex_lock_nested+0x16/0x20
> [ 1623.100658]  rtnl_lock+0x12/0x20
> [ 1623.100667]  ieee80211_restart_work+0x50/0xc0 [mac80211]
> [ 1623.100671]  process_one_work+0x1ce/0x6a0
> [ 1623.100675]  worker_thread+0x46/0x440
> [ 1623.100679]  kthread+0x10f/0x150
> [ 1623.100681]  ? process_one_work+0x6a0/0x6a0
> [ 1623.100683]  ? kthread_create_on_node+0x40/0x40
> [ 1623.100685]  ? do_syscall_64+0x64/0x140
> [ 1623.100688]  ret_from_fork+0x2a/0x40
> [ 1623.100714] INFO: task btserver:3871 blocked for more than 180 seconds.
> [ 1623.106094]       Tainted: G        W  O    4.13.16+ #2
> [ 1623.110042] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1623.116600] btserver        D    0  3871   1958 0x00000080
> [ 1623.116604] Call Trace:
> [ 1623.116627]  __schedule+0x407/0xb70
> [ 1623.116633]  schedule+0x38/0x90
> [ 1623.116636]  schedule_preempt_disabled+0x10/0x20
> [ 1623.116638]  __mutex_lock+0x3d3/0x930
> [ 1623.116641]  ? rtnetlink_rcv+0x16/0x30
> [ 1623.116648]  mutex_lock_nested+0x16/0x20
> [ 1623.116649]  ? mutex_lock_nested+0x16/0x20
> [ 1623.116651]  rtnetlink_rcv+0x16/0x30
> [ 1623.116653]  netlink_unicast+0x1cb/0x2e0
> [ 1623.116656]  netlink_sendmsg+0x2c6/0x3a0
> [ 1623.116661]  sock_sendmsg+0x33/0x40
> [ 1623.116664]  SYSC_sendto+0x101/0x190
> [ 1623.116670]  ? __getnstimeofday64+0x9a/0x150
> [ 1623.116673]  ? getnstimeofday64+0x9/0x20
> [ 1623.116675]  ? __audit_syscall_entry+0xb2/0x100
> [ 1623.116678]  ? syscall_trace_enter+0x1bc/0x3b0
> [ 1623.116681]  ? __do_page_fault+0x28b/0x520
> [ 1623.116684]  ? trace_hardirqs_on_caller+0x11f/0x190
> [ 1623.116688]  SyS_sendto+0x9/0x10
> [ 1623.116690]  do_syscall_64+0x64/0x140
> [ 1623.116693]  entry_SYSCALL64_slow_path+0x25/0x25
> [ 1623.116695] RIP: 0033:0x7f7951ec71b3
> [ 1623.116696] RSP: 002b:00007fffcf807788 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
> [ 1623.116699] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f7951ec71b3
> [ 1623.116700] RDX: 0000000000000020 RSI: 00007fffcf8077c0 RDI: 000000000000000d
> [ 1623.116716] RBP: 00007fffcf807a20 R08: 00007fffcf8077b0 R09: 000000000000000c
> [ 1623.116717] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [ 1623.116718] R13: 000000000084b5a5 R14: 000000000084b5ad R15: 000000000084ba53
> [ 1623.116728]
>                Showing all locks held in the system:
> [ 1623.116733] 1 lock held by khungtaskd/38:
> [ 1623.116738]  #0:  (tasklist_lock){.+.+..}, at: [<ffffffff8117ba8d>] debug_show_all_locks+0x3d/0x1a0
> [ 1623.116747] 3 locks held by kworker/u8:1/273:
> [ 1623.116749]  #0:  ("cfg80211"){.+.+.+}, at: [<ffffffff8113e14f>] process_one_work+0x14f/0x6a0
> [ 1623.116755]  #1:  ((&(&rdev->dfs_update_channels_wk)->work)){+.+...}, at: [<ffffffff8113e14f>] process_one_work+0x14f/0x6a0
> [ 1623.116761]  #2:  (rtnl_mutex){+.+.+.}, at: [<ffffffff818a2b72>] rtnl_lock+0x12/0x20
> [ 1623.116768] 3 locks held by kworker/u8:2/281:
> [ 1623.116769]  #0:  ("%s""ath10k_wq"){.+.+.+}, at: [<ffffffff8113e14f>] process_one_work+0x14f/0x6a0
> [ 1623.116775]  #1:  ((&(&ar->debug.nop_dwork)->work)){+.+...}, at: [<ffffffff8113e14f>] process_one_work+0x14f/0x6a0
> [ 1623.116781]  #2:  (&ar->conf_mutex){+.+.+.}, at: [<ffffffffa066d554>] ath10k_debug_nop_dwork+0x24/0x80 [ath10k_core]
> [ 1623.116792] 3 locks held by kworker/0:3/462:
> [ 1623.116793]  #0:  ("events"){.+.+.+}, at: [<ffffffff8113e14f>] process_one_work+0x14f/0x6a0
> [ 1623.116799]  #1:  ((linkwatch_work).work){+.+.+.}, at: [<ffffffff8113e14f>] process_one_work+0x14f/0x6a0
> [ 1623.116805]  #2:  (rtnl_mutex){+.+.+.}, at: [<ffffffff818a2b72>] rtnl_lock+0x12/0x20
> [ 1623.116812] 3 locks held by kworker/1:2/562:
> [ 1623.116813]  #0:  ("events_freezable"){.+.+..}, at: [<ffffffff8113e14f>] process_one_work+0x14f/0x6a0
> [ 1623.116819]  #1:  ((&local->restart_work)){+.+...}, at: [<ffffffff8113e14f>] process_one_work+0x14f/0x6a0
> [ 1623.116825]  #2:  (rtnl_mutex){+.+.+.}, at: [<ffffffff818a2b72>] rtnl_lock+0x12/0x20
> [ 1623.116845] 2 locks held by bash/1896:
> [ 1623.116846]  #0:  (&tty->ldisc_sem){++++.+}, at: [<ffffffff819b67ad>] ldsem_down_read+0x2d/0x40
> [ 1623.116852]  #1:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff815f5ed9>] n_tty_read+0xa9/0x8d0
> [ 1623.116860] 1 lock held by btserver/3871:
> [ 1623.116861]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff818a40f6>] rtnetlink_rcv+0x16/0x30
> [ 1623.116867] 3 locks held by kworker/2:0/4308:
> [ 1623.116868]  #0:  ("%s"("ipv6_addrconf")){.+.+..}, at: [<ffffffff8113e14f>] process_one_work+0x14f/0x6a0
> [ 1623.116874]  #1:  ((addr_chk_work).work){+.+...}, at: [<ffffffff8113e14f>] process_one_work+0x14f/0x6a0
> [ 1623.116880]  #2:  (rtnl_mutex){+.+.+.}, at: [<ffffffff818a2b72>] rtnl_lock+0x12/0x20
> [ 1623.116887] 4 locks held by rmmod/7551:
> [ 1623.116888]  #0:  (&dev->mutex){......}, at: [<ffffffff81656b76>] device_release_driver_internal+0x26/0x210
> [ 1623.116894]  #1:  (&dev->mutex){......}, at: [<ffffffff81656b84>] device_release_driver_internal+0x34/0x210
> [ 1623.116899]  #2:  (rtnl_mutex){+.+.+.}, at: [<ffffffff818a2b72>] rtnl_lock+0x12/0x20
> [ 1623.116905]  #3:  (&ar->conf_mutex){+.+.+.}, at: [<ffffffffa0668eb4>] ath10k_stop+0x24/0x80 [ath10k_core]
> [ 1623.116915] 2 locks held by bash/7559:
> [ 1623.116916]  #0:  (&tty->ldisc_sem){++++.+}, at: [<ffffffff819b67ad>] ldsem_down_read+0x2d/0x40
> [ 1623.116922]  #1:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff815f5ed9>] n_tty_read+0xa9/0x8d0
>
> [ 1623.116929] =============================================
>
>


-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com




More information about the ath10k mailing list