am335x: 5.18.x: system stalling

Yegor Yefremov yegorslists at googlemail.com
Tue May 31 01:36:20 PDT 2022


On Mon, May 30, 2022 at 5:15 PM Ard Biesheuvel <ardb at kernel.org> wrote:
>
> On Mon, 30 May 2022 at 15:54, Arnd Bergmann <arnd at arndb.de> wrote:
> >
> > On Sat, May 28, 2022 at 9:28 PM Yegor Yefremov
> > <yegorslists at googlemail.com> wrote:
> > >
> > > On Sat, May 28, 2022 at 3:14 PM Arnd Bergmann <arnd at arndb.de> wrote:
> > > >
> > > > On Sat, May 28, 2022 at 3:01 PM Yegor Yefremov
> > > > <yegorslists at googlemail.com> wrote:
> > > > > On Sat, May 28, 2022 at 11:07 AM Ard Biesheuvel <ardb at kernel.org> wrote:
> > > > > In file included from ./include/linux/irqflags.h:17,
> > > > >                  from ./arch/arm/include/asm/bitops.h:28,
> > > > >                  from ./include/linux/bitops.h:33,
> > > > >                  from ./include/linux/log2.h:12,
> > > > >                  from kernel/bounds.c:13:
> > > > > ./arch/arm/include/asm/percpu.h: In function ‘__my_cpu_offset’:
> > > > > ./arch/arm/include/asm/percpu.h:32:9: error: ‘__per_cpu_offset’
> > > > > undeclared (first use in this function); did you mean
> > > > > ‘__my_cpu_offset’?
> > > > >    32 |  return __per_cpu_offset[0];
> > > > >       |         ^~~~~~~~~~~~~~~~
> > > > >       |         __my_cpu_offset
> > > > > ./arch/arm/include/asm/percpu.h:32:9: note: each undeclared identifier
> > > > > is reported only once for each function it appears in
> > > >
> > > > I think you just missed the line in my patch that adds the
> > > > "extern unsigned long __per_cpu_offset[];" variable declaration.
> > >
> > > So, I tried both variants and both led to stalls.
> >
> > I'm running out of ideas here.  Going back to the original bisection,
> > I rebased Ard's patches in a way that you should be able to build the
> > config for each patch, and I split up the "ARM: implement
> > THREAD_INFO_IN_TASK for uniprocessor systems" commit in yet
> > another way, hoping to get something left over that points to the
> > bug. Can you try bisecting through the top commits of
> >
> > https://kernel.org/pub/scm/linux/kernel/git/soc/soc.git am335x-stall-test
> >
> > starting maybe with "52d240871760 irqchip: nvic: Use
> > GENERIC_IRQ_MULTI_HANDLER" as the patch that is almost certainly
> > going to be ok?
> >
> > At some point I fear we may have to give up and just mark the v6+SMP
> > configuration as broken, which is something we have considered in the
> > past but ended up always keeping around for the purpose of testing
> > omap2plus_defconfig and imx_v6_v7_defconfig. Note that on production
> > systems you probably don't want to use that config anyway, and should
> > either stick to a uniprocessor build, or disable the ARMv6 support.
> >
>
> Yeah, I am also running out of ideas. One question, though: does the
> RCU detected stall always occur in the same place? I.e., how similar
> are the backtraces of the stalls between different occurrences?
> Perhaps we could narrow down where in the code we are stalling, and
> gain some more understanding of the root cause.

I have attached 4 crash logs and will start bisecting Arnd's branch.
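
To make the comparison between occurrences easier, here is a minimal
sketch (not part of any kernel tooling) that lists the functions
__irq_svc returned into, i.e. the code that was running when the
interrupt hit. The helper name interrupted_functions and the regex are
my own assumptions about the "A from B+0xoff/0xsize" unwinder format
seen in the logs; the sample lines are copied from the first log.

```python
import re
from collections import Counter

# Matches ARM unwinder lines such as
# "[  220.128675]  __irq_svc from _omap3_noncore_dpll_lock+0x14/0xc4".
# The pattern is an assumption based on the attached logs only.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\S+) from (\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+\s*$"
)


def interrupted_functions(log_text):
    """Return the functions that __irq_svc returned into, in log order."""
    hits = []
    for line in log_text.splitlines():
        m = FRAME_RE.match(line)
        if m and m.group(1) == "__irq_svc":
            hits.append(m.group(2))
    return hits


# Four lines taken verbatim from the first attached log.
sample = """\
[  220.069580]  __irq_svc from __do_softirq+0xa0/0x5fc
[  220.075370]  __do_softirq from __irq_exit_rcu+0x138/0x178
[  220.128675]  __irq_svc from _omap3_noncore_dpll_lock+0x14/0xc4
[  220.135601]  _omap3_noncore_dpll_lock from omap3_noncore_dpll_program+0x14c/0x5e4
"""

counts = Counter(interrupted_functions(sample))
for name, n in counts.most_common():
    print(n, name)
```

Running it over each attachment and comparing the counts should show
whether the stalls always interrupt the same part of the DPLL
reprogramming path.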

Yegor
-------------- next part --------------
[  219.721096] rcu: INFO: rcu_sched self-detected stall on CPU
[  219.727845] rcu:     0-...!: (2600 ticks this GP) idle=e7d/1/0x40000004 softirq=3592/3592 fqs=0
[  219.737376]  (t=2600 jiffies g=5525 q=21)
[  219.742051] rcu: rcu_sched kthread timer wakeup didn't happen for 2599 jiffies! g5525 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[  219.753979] rcu:     Possible timer handling issue on cpu=0 timer-softirq=2867
[  219.761534] rcu: rcu_sched kthread starved for 2600 jiffies! g5525 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[  219.772512] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[  219.782043] rcu: RCU grace-period kthread stack dump:
[  219.787605] task:rcu_sched       state:I stack:    0 pid:   11 ppid:     2 flags:0x00000000
[  219.797138]  __schedule from schedule+0x58/0xcc
[  219.802763]  schedule from schedule_timeout+0x78/0xf8
[  219.808847]  schedule_timeout from rcu_gp_fqs_loop+0x108/0x3d0
[  219.815741]  rcu_gp_fqs_loop from rcu_gp_kthread+0xa8/0x134
[  219.822273]  rcu_gp_kthread from kthread+0xe4/0x104
[  219.828121]  kthread from ret_from_fork+0x14/0x28
[  219.833664] Exception stack(0xd0041fb0 to 0xd0041ff8)
[  219.839459] 1fa0:                                     00000000 00000000 00000000 00000000
[  219.848426] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  219.857325] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  219.864572] rcu: Stack dump where RCU GP kthread last ran:
[  219.870636] NMI backtrace for cpu 0
[  219.874702] CPU: 0 PID: 58 Comm: kworker/0:8 Tainted: G        W         5.18.0-rc7 #14
[  219.883491] Hardware name: Generic AM33XX (Flattened Device Tree)
[  219.890214] Workqueue: events dbs_work_handler
[  219.895659]  unwind_backtrace from show_stack+0x10/0x14
[  219.901897]  show_stack from dump_stack_lvl+0x58/0x70
[  219.908005]  dump_stack_lvl from nmi_cpu_backtrace+0xe0/0x128
[  219.914814]  nmi_cpu_backtrace from nmi_trigger_cpumask_backtrace+0xec/0x184
[  219.922875]  nmi_trigger_cpumask_backtrace from trigger_single_cpu_backtrace+0x20/0x2c
[  219.931828]  trigger_single_cpu_backtrace from rcu_check_gp_kthread_starvation+0xf4/0x148
[  219.941030]  rcu_check_gp_kthread_starvation from rcu_sched_clock_irq+0xa98/0xf8c
[  219.949573]  rcu_sched_clock_irq from update_process_times+0x88/0xc0
[  219.957001]  update_process_times from tick_sched_handle+0x48/0x54
[  219.964167]  tick_sched_handle from tick_sched_timer+0x48/0xac
[  219.970891]  tick_sched_timer from __hrtimer_run_queues+0x250/0x4e4
[  219.978144]  __hrtimer_run_queues from hrtimer_interrupt+0x128/0x2c8
[  219.985517]  hrtimer_interrupt from dmtimer_clockevent_interrupt+0x24/0x2c
[  219.993529]  dmtimer_clockevent_interrupt from __handle_irq_event_percpu+0x98/0x334
[  220.002289]  __handle_irq_event_percpu from handle_irq_event+0x38/0xc0
[  220.009770]  handle_irq_event from handle_level_irq+0xb4/0x1a8
[  220.016630]  handle_level_irq from handle_irq_desc+0x1c/0x2c
[  220.023279]  handle_irq_desc from generic_handle_arch_irq+0x2c/0x64
[  220.030501]  generic_handle_arch_irq from __irq_svc+0x90/0xbc
[  220.037105] Exception stack(0xd0001f58 to 0xd0001fa0)
[  220.042841] 1f40:                                                       c01015c8 00000000
[  220.051805] 1f60: 0eaec000 00000000 fffffe00 600f0013 ffffffff d0385d5c 00000000 c3744a80
[  220.060765] 1f80: 00000200 c3744a80 c208dcd8 d0001fa8 c01015c8 c01015d0 600f0113 ffffffff
[  220.069580]  __irq_svc from __do_softirq+0xa0/0x5fc
[  220.075370]  __do_softirq from __irq_exit_rcu+0x138/0x178
[  220.081788]  __irq_exit_rcu from irq_exit+0x8/0x28
[  220.087557]  irq_exit from call_with_stack+0x18/0x20
[  220.093503]  call_with_stack from __irq_svc+0x9c/0xbc
[  220.099402] Exception stack(0xd0385d28 to 0xd0385d70)
[  220.105218] 5d20:                   c208dd04 f9e00488 c2006940 c191a2fc c208dcc0 c208a680
[  220.114175] 5d40: c208dcc0 c191a2fc 00000000 c208dcc0 00000005 c208dcd8 fffffff9 d0385d78
[  220.123033] 5d60: c06d5e5c c06d5c60 600f0013 ffffffff
[  220.128675]  __irq_svc from _omap3_noncore_dpll_lock+0x14/0xc4
[  220.135601]  _omap3_noncore_dpll_lock from omap3_noncore_dpll_program+0x14c/0x5e4
[  220.144176]  omap3_noncore_dpll_program from clk_change_rate+0x238/0x4f8
[  220.151871]  clk_change_rate from clk_core_set_rate_nolock+0x1b0/0x29c
[  220.159248]  clk_core_set_rate_nolock from clk_set_rate+0x30/0x64
[  220.166215]  clk_set_rate from _set_opp+0x214/0x528
[  220.171991]  _set_opp from dev_pm_opp_set_rate+0xec/0x228
[  220.178264]  dev_pm_opp_set_rate from __cpufreq_driver_target+0x580/0x6fc
[  220.186075]  __cpufreq_driver_target from od_dbs_update+0xb4/0x168
[  220.193319]  od_dbs_update from dbs_work_handler+0x2c/0x60
[  220.199733]  dbs_work_handler from process_one_work+0x284/0x72c
[  220.206617]  process_one_work from worker_thread+0x28/0x4b0
[  220.213147]  worker_thread from kthread+0xe4/0x104
[  220.218844]  kthread from ret_from_fork+0x14/0x28
[  220.224350] Exception stack(0xd0385fb0 to 0xd0385ff8)
[  220.230085] 5fa0:                                     00000000 00000000 00000000 00000000
[  220.239020] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  220.247910] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  220.255832] NMI backtrace for cpu 0
[  220.260006] CPU: 0 PID: 58 Comm: kworker/0:8 Tainted: G        W         5.18.0-rc7 #14
[  220.268798] Hardware name: Generic AM33XX (Flattened Device Tree)
[  220.275513] Workqueue: events dbs_work_handler
[  220.280953]  unwind_backtrace from show_stack+0x10/0x14
[  220.287156]  show_stack from dump_stack_lvl+0x58/0x70
[  220.293215]  dump_stack_lvl from nmi_cpu_backtrace+0xe0/0x128
[  220.299984]  nmi_cpu_backtrace from nmi_trigger_cpumask_backtrace+0xec/0x184
[  220.308037]  nmi_trigger_cpumask_backtrace from trigger_single_cpu_backtrace+0x20/0x2c
[  220.316976]  trigger_single_cpu_backtrace from rcu_dump_cpu_stacks+0xf8/0x1ec
[  220.325082]  rcu_dump_cpu_stacks from rcu_sched_clock_irq+0xab8/0xf8c
[  220.332539]  rcu_sched_clock_irq from update_process_times+0x88/0xc0
[  220.339919]  update_process_times from tick_sched_handle+0x48/0x54
[  220.347059]  tick_sched_handle from tick_sched_timer+0x48/0xac
[  220.353781]  tick_sched_timer from __hrtimer_run_queues+0x250/0x4e4
[  220.361027]  __hrtimer_run_queues from hrtimer_interrupt+0x128/0x2c8
[  220.368390]  hrtimer_interrupt from dmtimer_clockevent_interrupt+0x24/0x2c
[  220.376332]  dmtimer_clockevent_interrupt from __handle_irq_event_percpu+0x98/0x334
[  220.385071]  __handle_irq_event_percpu from handle_irq_event+0x38/0xc0
[  220.392548]  handle_irq_event from handle_level_irq+0xb4/0x1a8
[  220.399389]  handle_level_irq from handle_irq_desc+0x1c/0x2c
[  220.406030]  handle_irq_desc from generic_handle_arch_irq+0x2c/0x64
[  220.413261]  generic_handle_arch_irq from __irq_svc+0x90/0xbc
[  220.419847] Exception stack(0xd0001f58 to 0xd0001fa0)
[  220.425568] 1f40:                                                       c01015c8 00000000
[  220.434531] 1f60: 0eaec000 00000000 fffffe00 600f0013 ffffffff d0385d5c 00000000 c3744a80
[  220.443512] 1f80: 00000200 c3744a80 c208dcd8 d0001fa8 c01015c8 c01015d0 600f0113 ffffffff
[  220.452310]  __irq_svc from __do_softirq+0xa0/0x5fc
[  220.458090]  __do_softirq from __irq_exit_rcu+0x138/0x178
[  220.464449]  __irq_exit_rcu from irq_exit+0x8/0x28
[  220.470214]  irq_exit from call_with_stack+0x18/0x20
[  220.476116]  call_with_stack from __irq_svc+0x9c/0xbc
[  220.482007] Exception stack(0xd0385d28 to 0xd0385d70)
[  220.487816] 5d20:                   c208dd04 f9e00488 c2006940 c191a2fc c208dcc0 c208a680
[  220.496775] 5d40: c208dcc0 c191a2fc 00000000 c208dcc0 00000005 c208dcd8 fffffff9 d0385d78
[  220.505644] 5d60: c06d5e5c c06d5c60 600f0013 ffffffff
[  220.511288]  __irq_svc from _omap3_noncore_dpll_lock+0x14/0xc4
[  220.518136]  _omap3_noncore_dpll_lock from omap3_noncore_dpll_program+0x14c/0x5e4
[  220.526717]  omap3_noncore_dpll_program from clk_change_rate+0x238/0x4f8
[  220.534368]  clk_change_rate from clk_core_set_rate_nolock+0x1b0/0x29c
[  220.541751]  clk_core_set_rate_nolock from clk_set_rate+0x30/0x64
[  220.548696]  clk_set_rate from _set_opp+0x214/0x528
[  220.554436]  _set_opp from dev_pm_opp_set_rate+0xec/0x228
[  220.560702]  dev_pm_opp_set_rate from __cpufreq_driver_target+0x580/0x6fc
[  220.568481]  __cpufreq_driver_target from od_dbs_update+0xb4/0x168
[  220.575706]  od_dbs_update from dbs_work_handler+0x2c/0x60
[  220.582161]  dbs_work_handler from process_one_work+0x284/0x72c
[  220.589012]  process_one_work from worker_thread+0x28/0x4b0
[  220.595530]  worker_thread from kthread+0xe4/0x104
[  220.601208]  kthread from ret_from_fork+0x14/0x28
[  220.606707] Exception stack(0xd0385fb0 to 0xd0385ff8)
[  220.612461] 5fa0:                                     00000000 00000000 00000000 00000000
[  220.621380] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  220.630266] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000

-------------- next part --------------
[   79.751404] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[   79.758633]  (detected by 0, t=2602 jiffies, g=4697, q=16429)
[   79.765139] rcu: All QSes seen, last rcu_sched kthread activity 2602 (-22026--24628), jiffies_till_next_fqs=1, root ->qsmask 0x0
[   79.777563] rcu: rcu_sched kthread starved for 2602 jiffies! g4697 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[   79.788374] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[   79.797901] rcu: RCU grace-period kthread stack dump:
[   79.803469] task:rcu_sched       state:R  running task     stack:    0 pid:   11 ppid:     2 flags:0x00000000
[   79.814789]  __schedule from schedule+0x58/0xcc
[   79.820464]  schedule from schedule_timeout+0x78/0xf8
[   79.826524]  schedule_timeout from rcu_gp_fqs_loop+0x108/0x3d0
[   79.833419]  rcu_gp_fqs_loop from rcu_gp_kthread+0xa8/0x134
[   79.839968]  rcu_gp_kthread from kthread+0xe4/0x104
[   79.845802]  kthread from ret_from_fork+0x14/0x28
[   79.851344] Exception stack(0xd0041fb0 to 0xd0041ff8)
[   79.857137] 1fa0:                                     00000000 00000000 00000000 00000000
[   79.866093] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   79.875005] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   79.882257] rcu: Stack dump where RCU GP kthread last ran:
[   79.888306] NMI backtrace for cpu 0
[   79.892364] CPU: 0 PID: 58 Comm: kworker/0:8 Tainted: G        W         5.18.0-rc7 #14
[   79.901162] Hardware name: Generic AM33XX (Flattened Device Tree)
[   79.907898] Workqueue: events dbs_work_handler
[   79.913341]  unwind_backtrace from show_stack+0x10/0x14
[   79.919588]  show_stack from dump_stack_lvl+0x58/0x70
[   79.925713]  dump_stack_lvl from nmi_cpu_backtrace+0xe0/0x128
[   79.932520]  nmi_cpu_backtrace from nmi_trigger_cpumask_backtrace+0xec/0x184
[   79.940585]  nmi_trigger_cpumask_backtrace from trigger_single_cpu_backtrace+0x20/0x2c
[   79.949523]  trigger_single_cpu_backtrace from rcu_check_gp_kthread_starvation+0xf4/0x148
[   79.958732]  rcu_check_gp_kthread_starvation from rcu_sched_clock_irq+0xe1c/0xf8c
[   79.967278]  rcu_sched_clock_irq from update_process_times+0x88/0xc0
[   79.974688]  update_process_times from tick_sched_handle+0x48/0x54
[   79.981854]  tick_sched_handle from tick_sched_timer+0x48/0xac
[   79.988576]  tick_sched_timer from __hrtimer_run_queues+0x250/0x4e4
[   79.995829]  __hrtimer_run_queues from hrtimer_interrupt+0x128/0x2c8
[   80.003205]  hrtimer_interrupt from dmtimer_clockevent_interrupt+0x24/0x2c
[   80.011215]  dmtimer_clockevent_interrupt from __handle_irq_event_percpu+0x98/0x334
[   80.019978]  __handle_irq_event_percpu from handle_irq_event+0x38/0xc0
[   80.027458]  handle_irq_event from handle_level_irq+0xb4/0x1a8
[   80.034335]  handle_level_irq from handle_irq_desc+0x1c/0x2c
[   80.040971]  handle_irq_desc from generic_handle_arch_irq+0x2c/0x64
[   80.048196]  generic_handle_arch_irq from __irq_svc+0x90/0xbc
[   80.054809] Exception stack(0xd0001f58 to 0xd0001fa0)
[   80.060516] 1f40:                                                       c01015c8 00000000
[   80.069486] 1f60: 0eaec000 00000000 fffffffe 600f0013 ffffffff d0385d64 016e3600 c3744a80
[   80.078437] 1f80: 00000002 c3744a80 ffffffff d0001fa8 c01015c8 c01015d0 600f0113 ffffffff
[   80.087237]  __irq_svc from __do_softirq+0xa0/0x5fc
[   80.093025]  __do_softirq from __irq_exit_rcu+0x138/0x178
[   80.099460]  __irq_exit_rcu from irq_exit+0x8/0x28
[   80.105230]  irq_exit from call_with_stack+0x18/0x20
[   80.111159]  call_with_stack from __irq_svc+0x9c/0xbc
[   80.117055] Exception stack(0xd0385d30 to 0xd0385d78)
[   80.122806] 5d20:                                     00001901 f9e0042c 00000002 f9e00000
[   80.131771] 5d40: c208dcc0 00000000 c208a680 c191a2fc 016e3600 11e1a300 c1109210 00000000
[   80.140683] 5d60: fffffff9 d0385d80 c06d5d8c c06d30d8 600f0013 ffffffff
[   80.147898]  __irq_svc from clk_memmap_readl+0x28/0x90
[   80.154011]  clk_memmap_readl from omap3_noncore_dpll_program+0x7c/0x5e4
[   80.161766]  omap3_noncore_dpll_program from clk_change_rate+0x238/0x4f8
[   80.169435]  clk_change_rate from clk_core_set_rate_nolock+0x1b0/0x29c
[   80.176806]  clk_core_set_rate_nolock from clk_set_rate+0x30/0x64
[   80.183777]  clk_set_rate from _set_opp+0x214/0x528
[   80.189541]  _set_opp from dev_pm_opp_set_rate+0xec/0x228
[   80.195818]  dev_pm_opp_set_rate from __cpufreq_driver_target+0x580/0x6fc
[   80.203634]  __cpufreq_driver_target from od_dbs_update+0xb4/0x168
[   80.210877]  od_dbs_update from dbs_work_handler+0x2c/0x60
[   80.217323]  dbs_work_handler from process_one_work+0x284/0x72c
[   80.224217]  process_one_work from worker_thread+0x28/0x4b0
[   80.230730]  worker_thread from kthread+0xe4/0x104
[   80.236422]  kthread from ret_from_fork+0x14/0x28
[   80.241925] Exception stack(0xd0385fb0 to 0xd0385ff8)
[   80.247670] 5fa0:                                     00000000 00000000 00000000 00000000
[   80.256597] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   80.265480] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000

-------------- next part --------------
[  259.990401] rcu: INFO: rcu_sched self-detected stall on CPU
[  259.997260] rcu:     0-...!: (2600 ticks this GP) idle=5af/1/0x40000004 softirq=7041/7041 fqs=0
[  260.006798]  (t=2600 jiffies g=16825 q=11323)
[  260.011833] rcu: rcu_sched kthread timer wakeup didn't happen for 2599 jiffies! g16825 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[  260.023878] rcu:     Possible timer handling issue on cpu=0 timer-softirq=5692
[  260.031436] rcu: rcu_sched kthread starved for 2600 jiffies! g16825 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[  260.042517] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[  260.052059] rcu: RCU grace-period kthread stack dump:
[  260.057621] task:rcu_sched       state:I stack:    0 pid:   11 ppid:     2 flags:0x00000000
[  260.067142]  __schedule from schedule+0x58/0xcc
[  260.072792]  schedule from schedule_timeout+0x78/0xf8
[  260.078867]  schedule_timeout from rcu_gp_fqs_loop+0x108/0x3d0
[  260.085765]  rcu_gp_fqs_loop from rcu_gp_kthread+0xa8/0x134
[  260.092307]  rcu_gp_kthread from kthread+0xe4/0x104
[  260.098151]  kthread from ret_from_fork+0x14/0x28
[  260.103695] Exception stack(0xd0041fb0 to 0xd0041ff8)
[  260.109490] 1fa0:                                     00000000 00000000 00000000 00000000
[  260.118466] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  260.127383] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  260.134615] rcu: Stack dump where RCU GP kthread last ran:
[  260.140672] NMI backtrace for cpu 0
[  260.144724] CPU: 0 PID: 59 Comm: kworker/0:9 Tainted: G        W         5.18.0-rc7 #14
[  260.153508] Hardware name: Generic AM33XX (Flattened Device Tree)
[  260.160237] Workqueue: events dbs_work_handler
[  260.165684]  unwind_backtrace from show_stack+0x10/0x14
[  260.171933]  show_stack from dump_stack_lvl+0x58/0x70
[  260.178024]  dump_stack_lvl from nmi_cpu_backtrace+0xe0/0x128
[  260.184833]  nmi_cpu_backtrace from nmi_trigger_cpumask_backtrace+0xec/0x184
[  260.192906]  nmi_trigger_cpumask_backtrace from trigger_single_cpu_backtrace+0x20/0x2c
[  260.201852]  trigger_single_cpu_backtrace from rcu_check_gp_kthread_starvation+0xf4/0x148
[  260.211059]  rcu_check_gp_kthread_starvation from rcu_sched_clock_irq+0xa98/0xf8c
[  260.219589]  rcu_sched_clock_irq from update_process_times+0x88/0xc0
[  260.227022]  update_process_times from tick_sched_handle+0x48/0x54
[  260.234180]  tick_sched_handle from tick_sched_timer+0x48/0xac
[  260.240922]  tick_sched_timer from __hrtimer_run_queues+0x250/0x4e4
[  260.248166]  __hrtimer_run_queues from hrtimer_interrupt+0x128/0x2c8
[  260.255530]  hrtimer_interrupt from dmtimer_clockevent_interrupt+0x24/0x2c
[  260.263551]  dmtimer_clockevent_interrupt from __handle_irq_event_percpu+0x98/0x334
[  260.272314]  __handle_irq_event_percpu from handle_irq_event+0x38/0xc0
[  260.279800]  handle_irq_event from handle_level_irq+0xb4/0x1a8
[  260.286673]  handle_level_irq from handle_irq_desc+0x1c/0x2c
[  260.293323]  handle_irq_desc from generic_handle_arch_irq+0x2c/0x64
[  260.300520]  generic_handle_arch_irq from __irq_svc+0x90/0xbc
[  260.307114] Exception stack(0xd0001f58 to 0xd0001fa0)
[  260.312847] 1f40:                                                       c01015c8 00000000
[  260.321798] 1f60: 0eaec000 00000000 fffffff8 60020013 ffffffff d0389d34 00000000 c3742a40
[  260.330763] 1f80: 00000008 c3742a40 ffffffff d0001fa8 c01015c8 c01015d0 60020113 ffffffff
[  260.339564]  __irq_svc from __do_softirq+0xa0/0x5fc
[  260.345366]  __do_softirq from __irq_exit_rcu+0x138/0x178
[  260.351794]  __irq_exit_rcu from irq_exit+0x8/0x28
[  260.357572]  irq_exit from call_with_stack+0x18/0x20
[  260.363513]  call_with_stack from __irq_svc+0x9c/0xbc
[  260.369403] Exception stack(0xd0389d00 to 0xd0389d48)
[  260.375239] 9d00: 00000005 f9e00488 00000002 f9e00000 00000007 c208dcd8 c191a2fc c208dcc0
[  260.384200] 9d20: 00000000 c208dcc0 00000005 c208dcd8 fffffff9 d0389d50 c06d59ec c06d30d8
[  260.393026] 9d40: 60020013 ffffffff
[  260.397059]  __irq_svc from clk_memmap_readl+0x28/0x90
[  260.403180]  clk_memmap_readl from _omap3_dpll_write_clken+0x24/0x58
[  260.410566]  _omap3_dpll_write_clken from _omap3_noncore_dpll_lock+0x94/0xc4
[  260.418690]  _omap3_noncore_dpll_lock from omap3_noncore_dpll_program+0x14c/0x5e4
[  260.427274]  omap3_noncore_dpll_program from clk_change_rate+0x238/0x4f8
[  260.434956]  clk_change_rate from clk_core_set_rate_nolock+0x1b0/0x29c
[  260.442340]  clk_core_set_rate_nolock from clk_set_rate+0x30/0x64
[  260.449272]  clk_set_rate from _set_opp+0x214/0x528
[  260.455062]  _set_opp from dev_pm_opp_set_rate+0xec/0x228
[  260.461336]  dev_pm_opp_set_rate from __cpufreq_driver_target+0x580/0x6fc
[  260.469146]  __cpufreq_driver_target from od_dbs_update+0xb4/0x168
[  260.476378]  od_dbs_update from dbs_work_handler+0x2c/0x60
[  260.482821]  dbs_work_handler from process_one_work+0x284/0x72c
[  260.489705]  process_one_work from worker_thread+0x28/0x4b0
[  260.496232]  worker_thread from kthread+0xe4/0x104
[  260.501922]  kthread from ret_from_fork+0x14/0x28
[  260.507432] Exception stack(0xd0389fb0 to 0xd0389ff8)
[  260.513179] 9fa0:                                     00000000 00000000 00000000 00000000
[  260.522110] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  260.531008] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  260.539014] NMI backtrace for cpu 0
[  260.543179] CPU: 0 PID: 59 Comm: kworker/0:9 Tainted: G        W         5.18.0-rc7 #14
[  260.551963] Hardware name: Generic AM33XX (Flattened Device Tree)
[  260.558674] Workqueue: events dbs_work_handler
[  260.564131]  unwind_backtrace from show_stack+0x10/0x14
[  260.570357]  show_stack from dump_stack_lvl+0x58/0x70
[  260.576398]  dump_stack_lvl from nmi_cpu_backtrace+0xe0/0x128
[  260.583157]  nmi_cpu_backtrace from nmi_trigger_cpumask_backtrace+0xec/0x184
[  260.591223]  nmi_trigger_cpumask_backtrace from trigger_single_cpu_backtrace+0x20/0x2c
[  260.600156]  trigger_single_cpu_backtrace from rcu_dump_cpu_stacks+0xf8/0x1ec
[  260.608284]  rcu_dump_cpu_stacks from rcu_sched_clock_irq+0xab8/0xf8c
[  260.615746]  rcu_sched_clock_irq from update_process_times+0x88/0xc0
[  260.623134]  update_process_times from tick_sched_handle+0x48/0x54
[  260.630280]  tick_sched_handle from tick_sched_timer+0x48/0xac
[  260.636991]  tick_sched_timer from __hrtimer_run_queues+0x250/0x4e4
[  260.644235]  __hrtimer_run_queues from hrtimer_interrupt+0x128/0x2c8
[  260.651601]  hrtimer_interrupt from dmtimer_clockevent_interrupt+0x24/0x2c
[  260.659558]  dmtimer_clockevent_interrupt from __handle_irq_event_percpu+0x98/0x334
[  260.668288]  __handle_irq_event_percpu from handle_irq_event+0x38/0xc0
[  260.675782]  handle_irq_event from handle_level_irq+0xb4/0x1a8
[  260.682627]  handle_level_irq from handle_irq_desc+0x1c/0x2c
[  260.689264]  handle_irq_desc from generic_handle_arch_irq+0x2c/0x64
[  260.696486]  generic_handle_arch_irq from __irq_svc+0x90/0xbc
[  260.703067] Exception stack(0xd0001f58 to 0xd0001fa0)
[  260.708780] 1f40:                                                       c01015c8 00000000
[  260.717753] 1f60: 0eaec000 00000000 fffffff8 60020013 ffffffff d0389d34 00000000 c3742a40
[  260.726702] 1f80: 00000008 c3742a40 ffffffff d0001fa8 c01015c8 c01015d0 60020113 ffffffff
[  260.735511]  __irq_svc from __do_softirq+0xa0/0x5fc
[  260.741288]  __do_softirq from __irq_exit_rcu+0x138/0x178
[  260.747659]  __irq_exit_rcu from irq_exit+0x8/0x28
[  260.753441]  irq_exit from call_with_stack+0x18/0x20
[  260.759337]  call_with_stack from __irq_svc+0x9c/0xbc
[  260.765228] Exception stack(0xd0389d00 to 0xd0389d48)
[  260.771072] 9d00: 00000005 f9e00488 00000002 f9e00000 00000007 c208dcd8 c191a2fc c208dcc0
[  260.780031] 9d20: 00000000 c208dcc0 00000005 c208dcd8 fffffff9 d0389d50 c06d59ec c06d30d8
[  260.788844] 9d40: 60020013 ffffffff
[  260.792883]  __irq_svc from clk_memmap_readl+0x28/0x90
[  260.798946]  clk_memmap_readl from _omap3_dpll_write_clken+0x24/0x58
[  260.806301]  _omap3_dpll_write_clken from _omap3_noncore_dpll_lock+0x94/0xc4
[  260.814421]  _omap3_noncore_dpll_lock from omap3_noncore_dpll_program+0x14c/0x5e4
[  260.823015]  omap3_noncore_dpll_program from clk_change_rate+0x238/0x4f8
[  260.830702]  clk_change_rate from clk_core_set_rate_nolock+0x1b0/0x29c
[  260.838091]  clk_core_set_rate_nolock from clk_set_rate+0x30/0x64
[  260.845045]  clk_set_rate from _set_opp+0x214/0x528
[  260.850797]  _set_opp from dev_pm_opp_set_rate+0xec/0x228
[  260.857071]  dev_pm_opp_set_rate from __cpufreq_driver_target+0x580/0x6fc
[  260.864863]  __cpufreq_driver_target from od_dbs_update+0xb4/0x168
[  260.872082]  od_dbs_update from dbs_work_handler+0x2c/0x60
[  260.878522]  dbs_work_handler from process_one_work+0x284/0x72c
[  260.885357]  process_one_work from worker_thread+0x28/0x4b0
[  260.891875]  worker_thread from kthread+0xe4/0x104
[  260.897568]  kthread from ret_from_fork+0x14/0x28
[  260.903073] Exception stack(0xd0389fb0 to 0xd0389ff8)
[  260.908814] 9fa0:                                     00000000 00000000 00000000 00000000
[  260.917745] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  260.926641] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000

-------------- next part --------------
[  112.951462] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  112.958658]  (detected by 0, t=2602 jiffies, g=4733, q=11060)
[  112.965167] rcu: All QSes seen, last rcu_sched kthread activity 2602 (-18706--21308), jiffies_till_next_fqs=1, root ->qsmask 0x0
[  112.977570] rcu: rcu_sched kthread starved for 2602 jiffies! g4733 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[  112.988383] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[  112.997921] rcu: RCU grace-period kthread stack dump:
[  113.003480] task:rcu_sched       state:R  running task     stack:    0 pid:   11 ppid:     2 flags:0x00000000
[  113.014832]  __schedule from schedule+0x58/0xcc
[  113.020467]  schedule from schedule_timeout+0x78/0xf8
[  113.026535]  schedule_timeout from rcu_gp_fqs_loop+0x108/0x3d0
[  113.033431]  rcu_gp_fqs_loop from rcu_gp_kthread+0xa8/0x134
[  113.039978]  rcu_gp_kthread from kthread+0xe4/0x104
[  113.045824]  kthread from ret_from_fork+0x14/0x28
[  113.051356] Exception stack(0xd0041fb0 to 0xd0041ff8)
[  113.057147] 1fa0:                                     00000000 00000000 00000000 00000000
[  113.066104] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  113.075023] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  113.082279] rcu: Stack dump where RCU GP kthread last ran:
[  113.088341] NMI backtrace for cpu 0
[  113.092378] CPU: 0 PID: 62 Comm: kworker/0:12 Not tainted 5.18.0-rc7 #14
[  113.099831] Hardware name: Generic AM33XX (Flattened Device Tree)
[  113.106561] Workqueue: events dbs_work_handler
[  113.112021]  unwind_backtrace from show_stack+0x10/0x14
[  113.118272]  show_stack from dump_stack_lvl+0x58/0x70
[  113.124379]  dump_stack_lvl from nmi_cpu_backtrace+0xe0/0x128
[  113.131208]  nmi_cpu_backtrace from nmi_trigger_cpumask_backtrace+0xec/0x184
[  113.139264]  nmi_trigger_cpumask_backtrace from trigger_single_cpu_backtrace+0x20/0x2c
[  113.148200]  trigger_single_cpu_backtrace from rcu_check_gp_kthread_starvation+0xf4/0x148
[  113.157383]  rcu_check_gp_kthread_starvation from rcu_sched_clock_irq+0xe1c/0xf8c
[  113.165916]  rcu_sched_clock_irq from update_process_times+0x88/0xc0
[  113.173339]  update_process_times from tick_sched_handle+0x48/0x54
[  113.180505]  tick_sched_handle from tick_sched_timer+0x48/0xac
[  113.187240]  tick_sched_timer from __hrtimer_run_queues+0x250/0x4e4
[  113.194469]  __hrtimer_run_queues from hrtimer_interrupt+0x128/0x2c8
[  113.201827]  hrtimer_interrupt from dmtimer_clockevent_interrupt+0x24/0x2c
[  113.209839]  dmtimer_clockevent_interrupt from __handle_irq_event_percpu+0x98/0x334
[  113.218603]  __handle_irq_event_percpu from handle_irq_event+0x38/0xc0
[  113.226084]  handle_irq_event from handle_level_irq+0xb4/0x1a8
[  113.232972]  handle_level_irq from handle_irq_desc+0x1c/0x2c
[  113.239613]  handle_irq_desc from generic_handle_arch_irq+0x2c/0x64
[  113.246842]  generic_handle_arch_irq from call_with_stack+0x18/0x20
[  113.254073]  call_with_stack from __irq_svc+0x9c/0xbc
[  113.259973] Exception stack(0xd0395d40 to 0xd0395d88)
[  113.265824] 5d40: 00000005 f9e00488 00000000 00000000 c208dcc0 00001901 c208a680 c191a2fc
[  113.274781] 5d60: 00000000 c208dcc0 c1109210 c208dcd8 fffffff9 d0395d90 c06d5ef0 c06d6104
[  113.283594] 5d80: 60070013 ffffffff
[  113.287629]  __irq_svc from omap3_noncore_dpll_program+0x3f4/0x5e4
[  113.294907]  omap3_noncore_dpll_program from clk_change_rate+0x238/0x4f8
[  113.302572]  clk_change_rate from clk_core_set_rate_nolock+0x1b0/0x29c
[  113.309950]  clk_core_set_rate_nolock from clk_set_rate+0x30/0x64
[  113.316908]  clk_set_rate from _set_opp+0x260/0x528
[  113.322680]  _set_opp from dev_pm_opp_set_rate+0xec/0x228
[  113.328969]  dev_pm_opp_set_rate from __cpufreq_driver_target+0x580/0x6fc
[  113.336784]  __cpufreq_driver_target from od_dbs_update+0xb4/0x168
[  113.344024]  od_dbs_update from dbs_work_handler+0x2c/0x60
[  113.350466]  dbs_work_handler from process_one_work+0x284/0x72c
[  113.357326]  process_one_work from worker_thread+0x28/0x4b0
[  113.363841]  worker_thread from kthread+0xe4/0x104
[  113.369532]  kthread from ret_from_fork+0x14/0x28
[  113.375038] Exception stack(0xd0395fb0 to 0xd0395ff8)
[  113.380793] 5fa0:                                     00000000 00000000 00000000 00000000
[  113.389727] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  113.398614] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000


