rcu self-detected stall messages on OMAP3, 4 boards

Fri Sep 21 15:11:14 EDT 2012

On Fri, 21 Sep 2012, Paul E. McKenney wrote:

> On Fri, Sep 21, 2012 at 06:08:59PM +0000, Paul Walmsley wrote:
> 
> > As far as I know, our only idle entry point is in 
> > arch/arm/common/process.c:cpu_idle().
> 
> In mainline, this is arch/arm/kernel/process.c, correct?

Indeed; sorry about that, mistyped.

> > Looking at the x86 idle entry, they call rcu_idle_{enter,exit}() inside 
> > {stop,start}_critical_timings().  Making that change here didn't help.
> 
> The reason x86 does this is that they have idle notifiers deeper in the
> idle loop that use RCU read-side critical sections.  So this was an
> expected result.

OK

> > Also tried commenting out the code from the stop_critical_timings() call 
> > to the WARN_ON(irqs_disabled()), and adding a local_irq_enable().  That 
> > also didn't help, which suggests that the problem is not caused by the 
> > OMAP-specific PM idle code.
> 
> I must admit that you make a convincing case here.  Though it does leave
> me wondering what is different about Panda (and MX28, IIRC).

Given the dependency on CONFIG_NO_HZ, the stalls are probably dependent on 
the userspace in use.  The userspaces here are quite minimal and so allow 
the system to stay idle for relatively long periods of time.

> I may take your advice of remote access to a Panda board, though that
> is likely to take a bit of time due to timezones.  Regardless of the
> underlying issue here, I clearly need to make the stall-warning messages
> do a better job of printing out needed information.

If you've got a patch in mind for that, I'll boot it here.

One other observation.  omap2plus_defconfig sets CONFIG_NO_HZ=y but 
doesn't set CONFIG_RCU_FAST_NO_HZ.  The stall warning messages still 
appear when CONFIG_RCU_FAST_NO_HZ=y.  One of them is attached below (with 
CONFIG_RCU_CPU_STALL_INFO set as well, obviously).

As an aside, in the CONFIG_RCU_FAST_NO_HZ=y build, I dropped a printk() 
into rcu_idle_gp_timer_func() and it doesn't look like it ever executed.

- Paul

[  305.832000] INFO: rcu_sched self-detected stall on CPU
[  305.834838]  1: (2 GPs behind) idle=5b1/1/0 drain=0 . timer=4294967295
[  305.838378]   (t=17463 jiffies)
[  305.840118] [<c001be10>] (unwind_backtrace+0x0/0xf0) from [<c00ad65c>] (rcu_pending+0xd0/0x540)
[  305.844848] [<c00ad65c>] (rcu_pending+0xd0/0x540) from [<c00ae5cc>] (rcu_check_callbacks+0x110/0x198)
[  305.849884] [<c00ae5cc>] (rcu_check_callbacks+0x110/0x198) from [<c0053800>] (update_process_times+0x38/0x68)
[  305.855285] [<c0053800>] (update_process_times+0x38/0x68) from [<c008cf40>] (tick_sched_timer+0x80/0xec)
[  305.860473] [<c008cf40>] (tick_sched_timer+0x80/0xec) from [<c006942c>] (__run_hrtimer+0x7c/0x1e0)
[  305.865356] [<c006942c>] (__run_hrtimer+0x7c/0x1e0) from [<c006a210>] (hrtimer_interrupt+0x11c/0x2d0)
[  305.870361] [<c006a210>] (hrtimer_interrupt+0x11c/0x2d0) from [<c001a54c>] (twd_handler+0x30/0x44)
[  305.875274] [<c001a54c>] (twd_handler+0x30/0x44) from [<c00a8128>] (handle_percpu_devid_irq+0x90/0x13c)
[  305.880371] [<c00a8128>] (handle_percpu_devid_irq+0x90/0x13c) from [<c00a489c>] (generic_handle_irq+0x30/0x48)
[  305.885833] [<c00a489c>] (generic_handle_irq+0x30/0x48) from [<c0014fb8>] (handle_IRQ+0x4c/0xac)
[  305.890624] [<c0014fb8>] (handle_IRQ+0x4c/0xac) from [<c000864c>] (gic_handle_irq+0x28/0x5c)
[  305.895233] [<c000864c>] (gic_handle_irq+0x28/0x5c) from [<c04fbc64>] (__irq_svc+0x44/0x5c)
[  305.899780] Exception stack(0xde86ff88 to 0xde86ffd0)
[  305.902526] ff80:                   0004c062 00000001 00000000 de8660c0 de86e000 c07c42c8
[  305.906982] ffa0: c05075a0 c074bdd0 00000000 411fc092 c074bff0 00000000 00000001 de86ffd0
[  305.911437] ffc0: 0004c063 c00152b0 20000113 ffffffff
[  305.914184] [<c04fbc64>] (__irq_svc+0x44/0x5c) from [<c00152b0>] (default_idle+0x20/0x44)
[  305.918640] [<c00152b0>] (default_idle+0x20/0x44) from [<c00154dc>] (cpu_idle+0x9c/0x114)
[  305.923126] [<c00154dc>] (cpu_idle+0x9c/0x114) from [<804f4a34>] (0x804f4a34)