cpuidle on i.MX8MQ

Abel Vesa abel.vesa at nxp.com
Fri Dec 3 00:38:31 PST 2021


On 21-11-29 14:40:04, Martin Kepplinger wrote:
> Am Donnerstag, dem 04.11.2021 um 13:04 +0200 schrieb Abel Vesa:
> > On 21-11-03 13:09:15, Martin Kepplinger wrote:
> > > Am Dienstag, dem 02.11.2021 um 11:55 +0100 schrieb Alexander Stein:
> > > > Hello,
> > > > 
> > > > I was hit by the errata e11171 on imx8mq on our custom board. I
> > > > found
> > > > [1] from over 2 years ago, and the even older patchset [2].
> > > > Is there some final conclusion or fix regarding this errata? From
> > > > what
> > > > I understand the proposed change is apparently not acceptable in
> > > > mainline for several reasons. I'm wondering what's the current
> > > > status.
> > 
> > Unfortunately, there is not going to be an upstream solution for this
> > errata. Long story short, the SoC is missing the wakeup lines from the
> > GIC to the GPC, which means IPIs are affected. So, knowing all that,
> > in order to wake up a core you need to write a bit in a GPC register.
> > The SW workaround (non-upstreamable) I provided does exactly that: it
> > hijacks the gic_raise_softirq __smp_cross_call handler and registers a
> > wrapper over it which also calls into ATF (via a SIP call) and wakes
> > up that specific core by writing into the GPC register.
> > 
> > There is no other possible way to wake up a core on 8MQ.
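
To make that a bit more concrete, the wrapper boils down to something
like the sketch below. The SIP function ID and sub-command are only
placeholders here (the real values are in the workaround patch), and
how the __smp_cross_call hook gets installed depends on the kernel
version, since that internal API has changed in the meantime:

#include <linux/arm-smccc.h>
#include <linux/cpumask.h>

/* Placeholder SIP IDs, not the actual ones used by the workaround */
#define IMX8MQ_SIP_GPC			0xc2000000UL
#define IMX8MQ_SIP_GPC_CORE_WAKE	0x05UL

/* original gic_raise_softirq(), saved when the wrapper is installed */
static void (*gic_raise_softirq_orig)(const struct cpumask *mask,
				      unsigned int irqnr);

static void imx8mq_gic_raise_softirq(const struct cpumask *mask,
				     unsigned int irqnr)
{
	struct arm_smccc_res res;
	int cpu;

	/* Let the GIC signal the SGI as usual */
	gic_raise_softirq_orig(mask, irqnr);

	/*
	 * e11171: the GIC wakeup lines to the GPC are missing, so ask
	 * ATF to set the wakeup bit in the GPC for every target core,
	 * otherwise a core sitting in cpuidle never sees the IPI.
	 */
	for_each_cpu(cpu, mask)
		arm_smccc_smc(IMX8MQ_SIP_GPC, IMX8MQ_SIP_GPC_CORE_WAKE,
			      cpu, 0, 0, 0, 0, 0, &res);
}
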
> > 
> > > > As suggested at that time, the only solution (right now) is to
> > > > disable
> > > > cpuidle on imx8mq?
> > > > 
> > 
> > Yes, the vendor actually suggests that, but you can use the mentioned
> > hack.
> > 
> > > > Best regards,
> > > > Alexander
> > > > 
> > > > [1] https://lkml.org/lkml/2019/6/10/350
> > > > [2] https://lkml.org/lkml/2019/3/27/542
> > > > 
> > > 
> > > Hi Alexander, hi Abel,
> > > 
> > > At this point my understanding is basically the same. We have carried
> > > (a slight variation of) the above in our tree ever since, in order to
> > > have the cpu-sleep sleep state. Not using it is not acceptable to us :)
> > > 
> > > So far there's one internal API change we need to revert (bring
> > > back) in order for this to work. For reference, this is our current
> > > implementation:
> > > 
> > > https://source.puri.sm/martin.kepplinger/linux-next/-/compare/0b90c3622755e0155632d8cc25edd4eb7f875968...ce4803745a180adc8d87891d4ff8dff1c7bd5464
> > > 
> > > Abel, can you still say that, in case this solution doesn't apply
> > > anymore in the future, you would be available to create an update?
> > > 
> > 
> > I'll try to find a workaround soon, based on the same general idea
> > behind the current one you guys are using. I'll do this in my own
> > time
> > since the company does not allocate resources for 8MQ cpuidle support
> > anymore.
> > 
> > > Can you even imagine a possibly acceptable solution for mainline to
> > > this? Nothing is completely set in stone with Linux :)
> > 
> > I believe Marc was pretty clear about not accepting such a workaround
> > (and, TBH, it makes perfect sense not to).
> > 
> > Since I don't think there is any other way that avoids the GIC
> > driver, I believe this has hit a dead end when it comes to upstream
> > support.
> > 
> > Sorry about that.
> > 
> > I'm open to any suggestions though.
> > 
> > 
> 
> Hi Abel, since the link to the workaround implementation is here, I'd
> like to show you a bug that appears when transitioning to s2idle. I
> don't see it when removing all of these cpu-sleep additions (linked
> above). (I might send it as a separate bug report later.)
> 
> Can you see how the workaround could cause this RCU stall? It looks
> like a problem with a timer...
> 

Looks to me like one of the cores is not waking up.

You can start by hacking the irqchip driver to check whether the ATF
call is still being made after s2idle is triggered. If it is, then make
sure ATF actually writes the GPC register to wake up that specific core.

That's usually what's going wrong with this workaround.
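
Something as simple as a print right before the SMC in the wrapper is
enough to tell whether the call still happens once s2idle is entered,
e.g. (just a sketch, drop it wherever the workaround does the SIP call):

	/*
	 * Goes right before the arm_smccc_smc() call in the wrapper;
	 * if nothing shows up on the console once s2idle is entered,
	 * the ATF call is no longer being made.
	 */
	pr_info("e11171 wake: SGI %u -> CPUs %*pbl\n",
		irqnr, cpumask_pr_args(mask));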

> 
> [ 65.476456] rcu: INFO: rcu_preempt self-detected stall on CPU
> [ 65.476615] rcu: 0-...!: (1 ticks this GP)
> idle=42f/1/0x4000000000000004 softirq=9151/9151 fqs=0 
> [ 65.476676] (t=8974 jiffies g=11565 q=2)
> [ 65.476703] rcu: rcu_preempt kthread timer wakeup didn't happen for
> 8973 jiffies! g11565 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
> [ 65.476715] rcu: Possible timer handling issue on cpu=0 timer-
> softirq=2032
> [ 65.476730] rcu: rcu_preempt kthread starved for 8974 jiffies! g11565
> f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
> [ 65.476742] rcu: Unless rcu_preempt kthread gets sufficient CPU time,
> OOM is now expected behavior.
> [ 65.476749] rcu: RCU grace-period kthread stack dump:
> [ 65.476764] task:rcu_preempt state:I stack: 0 pid: 13 ppid: 2
> flags:0x00000008
> [ 65.476814] Call trace:
> [ 65.476825] __switch_to+0x138/0x190
> [ 65.476975] __schedule+0x288/0x6ec
> [ 65.477044] schedule+0x7c/0x110
> [ 65.477059] schedule_timeout+0xa4/0x1c4
> [ 65.477085] rcu_gp_fqs_loop+0x13c/0x51c
> [ 65.477126] rcu_gp_kthread+0x1a4/0x264
> [ 65.477136] kthread+0x15c/0x170
> [ 65.477167] ret_from_fork+0x10/0x20
> [ 65.477186] rcu: Stack dump where RCU GP kthread last ran:
> [ 65.477194] Task dump for CPU 0:
> [ 65.477202] task:swapper/0 state:R running task stack: 0 pid: 0 ppid:
> 0 flags:0x0000000a
> [ 65.477223] Call trace:
> [ 65.477226] dump_backtrace+0x0/0x1e4
> [ 65.477246] show_stack+0x24/0x30
> [ 65.477256] sched_show_task+0x15c/0x180
> [ 65.477293] dump_cpu_task+0x50/0x60
> [ 65.477327] rcu_check_gp_kthread_starvation+0x128/0x148
> [ 65.477335] rcu_sched_clock_irq+0xb74/0xf04
> [ 65.477348] update_process_times+0xa8/0xf4
> [ 65.477388] tick_sched_handle+0x3c/0x60
> [ 65.477409] tick_sched_timer+0x58/0xb0
> [ 65.477416] __hrtimer_run_queues+0x18c/0x370
> [ 65.477428] hrtimer_interrupt+0xf4/0x250
> [ 65.477437] arch_timer_handler_phys+0x40/0x50
> [ 65.477477] handle_percpu_devid_irq+0x94/0x250
> [ 65.477505] handle_domain_irq+0x6c/0xa0
> [ 65.477516] gic_handle_irq+0xc4/0x144
> [ 65.477527] call_on_irq_stack+0x2c/0x54
> [ 65.477534] do_interrupt_handler+0x5c/0x70
> [ 65.477544] el1_interrupt+0x30/0x80
> [ 65.477556] el1h_64_irq_handler+0x18/0x24
> [ 65.477567] el1h_64_irq+0x78/0x7c
> [ 65.477575] cpuidle_enter_s2idle+0x14c/0x1ac
> [ 65.477617] do_idle+0x25c/0x2a0
> [ 65.477644] cpu_startup_entry+0x30/0x80
> [ 65.477656] rest_init+0xec/0x100
> [ 65.477666] arch_call_rest_init+0x1c/0x28
> [ 65.477700] start_kernel+0x6e0/0x720
> [ 65.477709] __primary_switched+0xc0/0xc8
> [ 65.477751] Task dump for CPU 0:
> [ 65.477757] task:swapper/0 state:R running task stack: 0 pid: 0 ppid:
> 0 flags:0x0000000a
> [ 65.477770] Call trace:
> [ 65.477773] dump_backtrace+0x0/0x1e4
> [ 65.477788] show_stack+0x24/0x30
> [ 65.477796] sched_show_task+0x15c/0x180
> [ 65.477804] dump_cpu_task+0x50/0x60
> [ 65.477812] rcu_dump_cpu_stacks+0xf4/0x138
> [ 65.477820] rcu_sched_clock_irq+0xb78/0xf04
> [ 65.477829] update_process_times+0xa8/0xf4
> [ 65.477838] tick_sched_handle+0x3c/0x60
> [ 65.477845] tick_sched_timer+0x58/0xb0
> [ 65.477854] __hrtimer_run_queues+0x18c/0x370
> [ 65.477863] hrtimer_interrupt+0xf4/0x250
> [ 65.477873] arch_timer_handler_phys+0x40/0x50
> [ 65.477880] handle_percpu_devid_irq+0x94/0x250
> [ 65.477888] handle_domain_irq+0x6c/0xa0
> [ 65.477897] gic_handle_irq+0xc4/0x144
> [ 65.477903] call_on_irq_stack+0x2c/0x54
> [ 65.477910] do_interrupt_handler+0x5c/0x70
> [ 65.477921] el1_interrupt+0x30/0x80
> [ 65.477929] el1h_64_irq_handler+0x18/0x24
> [ 65.477937] el1h_64_irq+0x78/0x7c
> [ 65.477944] cpuidle_enter_s2idle+0x14c/0x1ac
> [ 65.477952] do_idle+0x25c/0x2a0
> [ 65.477959] cpu_startup_entry+0x30/0x80
> [ 65.477970] rest_init+0xec/0x100
> [ 65.477977] arch_call_rest_init+0x1c/0x28
> [ 65.477988] start_kernel+0x6e0/0x720
> [ 65.477995] __primary_switched+0xc0/0xc8
> 
>


