[PATCH] sched: Prevent rescheduling when interrupts are disabled
David Woodhouse
dwmw2 at infradead.org
Mon Dec 16 09:41:27 PST 2024
On 16 December 2024 13:20:56 GMT, Thomas Gleixner <tglx at linutronix.de> wrote:
>David reported a warning observed while loop testing kexec jump:
>
> Interrupts enabled after irqrouter_resume+0x0/0x50
> WARNING: CPU: 0 PID: 560 at drivers/base/syscore.c:103 syscore_resume+0x18a/0x220
> kernel_kexec+0xf6/0x180
> __do_sys_reboot+0x206/0x250
> do_syscall_64+0x95/0x180
>
>The corresponding interrupt flag trace:
>
> hardirqs last enabled at (15573): [<ffffffffa8281b8e>] __up_console_sem+0x7e/0x90
> hardirqs last disabled at (15580): [<ffffffffa8281b73>] __up_console_sem+0x63/0x90
>
>That means __up_console_sem() was invoked with interrupts enabled. Further
>instrumentation revealed that in the interrupt disabled section of kexec
>jump one of the syscore_suspend() callbacks woke up a task, which set the
>NEED_RESCHED flag. A later callback in the resume path invoked
>cond_resched() which in turn led to the invocation of the scheduler:
>
> __cond_resched+0x21/0x60
> down_timeout+0x18/0x60
> acpi_os_wait_semaphore+0x4c/0x80
> acpi_ut_acquire_mutex+0x3d/0x100
> acpi_ns_get_node+0x27/0x60
> acpi_ns_evaluate+0x1cb/0x2d0
> acpi_rs_set_srs_method_data+0x156/0x190
> acpi_pci_link_set+0x11c/0x290
> irqrouter_resume+0x54/0x60
> syscore_resume+0x6a/0x200
> kernel_kexec+0x145/0x1c0
> __do_sys_reboot+0xeb/0x240
> do_syscall_64+0x95/0x180
>
>This is a long standing problem, which probably got more visible with
>the recent printk changes. Something does a task wakeup and the
>scheduler sets the NEED_RESCHED flag. cond_resched() sees it set and
>invokes schedule() from a completely bogus context. The scheduler
>enables interrupts after context switching, which causes the above
>warning at the end.
>
>Quite some of the code paths in syscore_suspend()/resume() can result in
>triggering a wakeup with the exactly same consequences. They might not
>have done so yet, but as they share a lot of code with normal operations
>it's just a question of time.
>
>The problem only affects the PREEMPT_NONE and PREEMPT_VOLUNTARY scheduling
>models. Full preemption is not affected as cond_resched() is disabled and
>the preemption check preemptible() takes the interrupt disabled flag into
>account.
>
>Cure the problem by adding a corresponding check into cond_resched().
>
>Reported-by: David Woodhouse <dwmw2 at infradead.org>
>Signed-off-by: Thomas Gleixner <tglx at linutronix.de>
>Tested-by: David Woodhouse <dwmw2 at infradead.org>
>Cc: stable at vger.kernel.org
>Closes: https://lore.kernel.org/all/7717fe2ac0ce5f0a2c43fdab8b11f4483d54a2a4.camel@infradead.org
>---
> kernel/sched/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>--- a/kernel/sched/core.c
>+++ b/kernel/sched/core.c
>@@ -7276,7 +7276,7 @@ void rt_mutex_setprio(struct task_struct
> #if !defined(CONFIG_PREEMPTION) || defined(CONFIG_PREEMPT_DYNAMIC)
> int __sched __cond_resched(void)
> {
>- if (should_resched(0)) {
>+ if (should_resched(0) && !irqs_disabled()) {
> preempt_schedule_common();
> return 1;
> }
Thank you. Slight preference for dwmw at amazon.co.uk as the Reported-by and Tested-by addresses if it's not too late. I'm assuming you will handle this and don't want me to round it up with the kexec bits.
More information about the kexec
mailing list