[PATCH 04/36] cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE

Paul E. McKenney paulmck at kernel.org
Fri Jul 29 08:28:19 PDT 2022


Or better yet, try the patch that Rafael proposed.  ;-)

							Thanx, Paul

On Fri, Jul 29, 2022 at 08:26:22AM -0700, Paul E. McKenney wrote:
> On Fri, Jul 29, 2022 at 03:24:58AM -0700, Michel Lespinasse wrote:
> > On Thu, Jul 28, 2022 at 10:20:53AM -0700, Paul E. McKenney wrote:
> > > On Mon, Jul 25, 2022 at 12:43:06PM -0700, Michel Lespinasse wrote:
> > > > On Wed, Jun 08, 2022 at 04:27:27PM +0200, Peter Zijlstra wrote:
> > > > > Commit c227233ad64c ("intel_idle: enable interrupts before C1 on
> > > > > Xeons") wrecked intel_idle in two ways:
> > > > > 
> > > > >  - must not have tracing in idle functions
> > > > >  - must return with IRQs disabled
> > > > > 
> > > > > Additionally, it added a branch for no good reason.
> > > > > 
> > > > > Fixes: c227233ad64c ("intel_idle: enable interrupts before C1 on Xeons")
> > > > > Signed-off-by: Peter Zijlstra (Intel) <peterz at infradead.org>
> > > > 
> > > > After this change was introduced, I am seeing "WARNING: suspicious RCU
> > > > usage" when booting a kernel with debug options compiled in. Please
> > > > see the attached dmesg output. The issue starts with commit 32d4fd5751ea
> > > > and is still present in v5.19-rc8.
> > > > 
> > > > I'm not sure, is this too late to fix or revert in v5.19 final ?
> > > 
> > > I finally got a chance to take a quick look at this.
> > > 
> > > The rcu_eqs_exit() function is making a lockdep complaint about
> > > being invoked with interrupts enabled.  This function is called from
> > > rcu_idle_exit(), which is an expected code path from cpuidle_enter_state()
> > > via its call to rcu_idle_exit().  Except that rcu_idle_exit() disables
> > > interrupts before invoking rcu_eqs_exit().
> > > 
> > > The only other call to rcu_idle_exit() does not disable interrupts,
> > > but it is via rcu_user_exit(), which would be a very odd choice for
> > > cpuidle_enter_state().
> > > 
> > > It seems unlikely, but it might be that it is the use of local_irq_save()
> > > instead of raw_local_irq_save() within rcu_idle_exit() that is causing
> > > the trouble.  If this is the case, then the commit shown below would
> > > help.  Note that this commit removes the warning from lockdep, so it
> > > is necessary to build the kernel with CONFIG_RCU_EQS_DEBUG=y to enable
> > > equivalent debugging.
> > > 
> > > Could you please try your test with the -rce commit shown below applied?
> > 
> > Thanks for looking into it.
> 
> And thank you for trying this shot in the dark!
> 
> > After checking out Peter's commit 32d4fd5751ea,
> > cherry picking your commit ed4ae5eff4b3,
> > and setting CONFIG_RCU_EQS_DEBUG=y in addition of my usual debug config,
> > I am now seeing this a few seconds into the boot:
> > 
> > [    3.010650] ------------[ cut here ]------------
> > [    3.010651] WARNING: CPU: 0 PID: 0 at kernel/sched/clock.c:397 sched_clock_tick+0x27/0x60
> 
> And this is again a complaint about interrupts not being disabled.
> 
> But it does appear that the problem was the lockdep complaint, and
> eliminating that did take care of part of the problem.  But lockdep
> remained enabled, and you therefore hit the next complaint.
> 
> > [    3.010657] Modules linked in:
> > [    3.010660] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc1-test-00005-g1be22fea0611 #1
> > [    3.010662] Hardware name: LENOVO 30BFS44D00/1036, BIOS S03KT51A 01/17/2022
> > [    3.010663] RIP: 0010:sched_clock_tick+0x27/0x60
> 
> The most straightforward way to get to sched_clock_tick() from
> cpuidle_enter_state() is via the call to sched_clock_idle_wakeup_event().
> 
> Except that it disables interrupts before invoking sched_clock_tick().
> 
> > [    3.010665] Code: 1f 40 00 53 eb 02 5b c3 66 90 8b 05 2f c3 40 01 85 c0 74 18 65 8b 05 60 88 8f 4e 85 c0 75 0d 65 8b 05 a9 85 8f 4e 85 c0 74 02 <0f> 0b e8 e2 6c 89 00 48 c7 c3 40 d5 02 00
> >  89 c0 48 03 1c c5 c0 98
> > [    3.010667] RSP: 0000:ffffffffb2803e28 EFLAGS: 00010002
> > [    3.010670] RAX: 0000000000000001 RBX: ffffc8ce7fa07060 RCX: 0000000000000001
> > [    3.010671] RDX: 0000000000000000 RSI: ffffffffb268dd21 RDI: ffffffffb269ab13
> > [    3.010673] RBP: 0000000000000001 R08: ffffffffffc300d5 R09: 000000000002be80
> > [    3.010674] R10: 000003625b53183a R11: ffffa012b802b7a4 R12: ffffffffb2aa9e80
> > [    3.010675] R13: ffffffffb2aa9e00 R14: 0000000000000001 R15: 0000000000000000
> > [    3.010677] FS:  0000000000000000(0000) GS:ffffa012b8000000(0000) knlGS:0000000000000000
> > [    3.010678] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    3.010680] CR2: ffffa012f81ff000 CR3: 0000000c99612001 CR4: 00000000003706f0
> > [    3.010681] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [    3.010682] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [    3.010683] Call Trace:
> > [    3.010685]  <TASK>
> > [    3.010688]  cpuidle_enter_state+0xb7/0x4b0
> > [    3.010694]  cpuidle_enter+0x29/0x40
> > [    3.010697]  do_idle+0x1d4/0x210
> > [    3.010702]  cpu_startup_entry+0x19/0x20
> > [    3.010704]  rest_init+0x117/0x1a0
> > [    3.010708]  arch_call_rest_init+0xa/0x10
> > [    3.010711]  start_kernel+0x6d8/0x6ff
> > [    3.010716]  secondary_startup_64_no_verify+0xce/0xdb
> > [    3.010728]  </TASK>
> > [    3.010729] irq event stamp: 44179
> > [    3.010730] hardirqs last  enabled at (44179): [<ffffffffb2000ccb>] asm_sysvec_apic_timer_interrupt+0x1b/0x20
> > [    3.010734] hardirqs last disabled at (44177): [<ffffffffb22003f0>] __do_softirq+0x3f0/0x498
> > [    3.010736] softirqs last  enabled at (44178): [<ffffffffb2200332>] __do_softirq+0x332/0x498
> > [    3.010738] softirqs last disabled at (44171): [<ffffffffb16c760b>] irq_exit_rcu+0xab/0xf0
> > [    3.010741] ---[ end trace 0000000000000000 ]---
> 
> Would you be willing to try another shot in the dark, but untested
> this time?  I freely admit that this is getting strange.
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> index e374c0c923dae..279f557bf60bb 100644
> --- a/kernel/sched/clock.c
> +++ b/kernel/sched/clock.c
> @@ -394,7 +394,7 @@ notrace void sched_clock_tick(void)
>  	if (!static_branch_likely(&sched_clock_running))
>  		return;
>  
> -	lockdep_assert_irqs_disabled();
> +	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !raw_irqs_disabled());
>  
>  	scd = this_scd();
>  	__scd_stamp(scd);



More information about the linux-um mailing list