Lockdep warnings on kexec (virtio_blk, hrtimers)
Thomas Gleixner
tglx at linutronix.de
Thu Dec 12 05:34:48 PST 2024
CC+ printk folks
On Thu, Dec 12 2024 at 11:07, David Woodhouse wrote:
> On Wed, 2024-12-11 at 07:42 -0500, Stefan Hajnoczi wrote:
>> On Tue, Dec 10, 2024 at 09:56:43AM +0800, Jason Wang wrote:
>> > Adding more virtio-blk people here.
>>
>> Please try Ming Lei's recent fix in Jens' tree:
>>
>> virtio-blk: don't keep queue frozen during system suspend
>> commit: 7678abee0867e6b7fb89aa40f6e9f575f755fb37
>>
>> https://git.kernel.dk/cgit/linux/commit/?h=block-6.13&id=7678abee0867e6b7fb89aa40f6e9f575f755fb37
>
> Thanks. That does make those warnings go away. I do still get this one
> occasionally though. It seems to go away without 'no_console_suspend'
> on the command line, but I'm not sure that makes it OK.
Not really.
> [ 23.665790] Interrupts enabled after irqrouter_resume+0x0/0x50
The resume callback irqrouter_resume() returns with interrupts enabled,
but it's absolutely unclear where this happens. The lockdep tracking is
not really helpful:
> [ 23.697043] hardirqs last enabled at (15573): [<ffffffffa8281b8e>] __up_console_sem+0x7e/0x90
> [ 23.697855] hardirqs last disabled at (15580): [<ffffffffa8281b73>] __up_console_sem+0x63/0x90
__up_console_sem()
{
printk_safe_enter_irqsave(flags); // Assuming this is __up_console_sem+0x63/0x90
// Saves state in @flags and disables interrupts
up(&console_sem);
printk_safe_exit_irqrestore(flags); // Assuming this is __up_console_sem+0x7e/0x90
// Restores the interrupt state from @flags
}
Though the events are in reverse order:
last enabled at 15573
last disabled at 15580
At event #15573 printk_safe_exit_irqrestore(flags) enabled interrupts,
which means the preceeding printk_safe_enter_irqsave(flags) was invoked
with interrupts enabled. But that enable event wiped the real culprit,
which enabled interrupts before __up_console_sem() was invoked.
At event #15580 printk_safe_enter_irqsave(flags); disables interrupts
again, which is probably at the point where printk() dumps the bug, but
I might be misreading this.
Now David's observation that the problem "goes away" when he adds
"no_console_suspend" on the command line is definitely interesting, but
does not really help in figuring out the root cause.
> [ 23.698673] softirqs last enabled at (14798): [<ffffffffa81c6c12>] __irq_exit_rcu+0xe2/0x100
> [ 23.699481] softirqs last disabled at (14777): [<ffffffffa81c6c12>] __irq_exit_rcu+0xe2/0x100
> [ 23.700284] ---[ end trace 0000000000000000 ]---
> [ 23.702460] ------------[ cut here ]------------
> [ 23.702963] WARNING: CPU: 0 PID: 560 at kernel/time/hrtimer.c:995 hrtimers_resume_local+0x29/0x40
This one is just a consequence of the above.
David, can you retest with the debug patch below? That should pin-point
the real culprit.
Thanks,
tglx
---
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -621,6 +621,9 @@ do { \
extern void lockdep_assert_in_softirq_func(void);
+extern void lockdep_suspend_syscore_enter(void);
+extern void lockdep_suspend_syscore_exit(void);
+
#else
# define might_lock(lock) do { } while (0)
# define might_lock_read(lock) do { } while (0)
@@ -635,6 +638,8 @@ extern void lockdep_assert_in_softirq_fu
# define lockdep_assert_preemption_disabled() do { } while (0)
# define lockdep_assert_in_softirq() do { } while (0)
# define lockdep_assert_in_softirq_func() do { } while (0)
+static inline void lockdep_suspend_syscore_enter(void) { }
+static inline void lockdep_suspend_syscore_exit(void) { }
#endif
#ifdef CONFIG_PROVE_RAW_LOCK_NESTING
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1025,6 +1025,7 @@ int kernel_kexec(void)
if (error)
goto Enable_cpus;
local_irq_disable();
+ lockdep_suspend_syscore_enter();
error = syscore_suspend();
if (error)
goto Enable_irqs;
@@ -1054,6 +1055,7 @@ int kernel_kexec(void)
if (kexec_image->preserve_context) {
syscore_resume();
Enable_irqs:
+ lockdep_suspend_syscore_exit();
local_irq_enable();
Enable_cpus:
suspend_enable_secondary_cpus();
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4408,6 +4408,18 @@ void lockdep_hardirqs_on_prepare(void)
}
EXPORT_SYMBOL_GPL(lockdep_hardirqs_on_prepare);
+static bool suspend_syscore_active;
+
+void noinstr lockdep_suspend_syscore_enter(void)
+{
+ suspend_syscore_active = true;
+}
+
+void noinstr lockdep_suspend_syscore_exit(void)
+{
+ suspend_syscore_active = false;
+}
+
void noinstr lockdep_hardirqs_on(unsigned long ip)
{
struct irqtrace_events *trace = ¤t->irqtrace;
@@ -4456,6 +4468,8 @@ void noinstr lockdep_hardirqs_on(unsigne
if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
return;
+ DEBUG_LOCKS_WARN_ON(suspend_syscore_active);
+
/*
* Ensure the lock stack remained unchanged between
* lockdep_hardirqs_on_prepare() and lockdep_hardirqs_on().
More information about the kexec
mailing list