Lockdep warnings on kexec (virtio_blk, hrtimers)
David Woodhouse
dwmw2 at infradead.org
Fri Dec 13 01:31:11 PST 2024
On Fri, 2024-12-13 at 01:14 +0100, Thomas Gleixner wrote:
>
> With that applied the problem goes away, but after a lot of repetitions
> of the reproducer in a tight loop the whole machinery stops dead:
>
> [ 29.913179] Disabling non-boot CPUs ...
> [ 29.930328] smpboot: CPU 1 is now offline
> [ 29.930593] crash hp: kexec_trylock() failed, kdump image may be inaccurate
> [ 29.940588] Enabling non-boot CPUs ...
> [ 29.940856] crash hp: kexec_trylock() failed, kdump image may be inaccurate
> [ 29.941242] smpboot: Booting Node 0 Processor 1 APIC 0x1
> [ 29.942654] CPU1 is up
> [ 29.945856] virtio_blk virtio1: 2/0/0 default/read/poll queues
> [ 29.948556] OOM killer enabled.
> [ 29.948750] Restarting tasks ... done.
> Success
> [ 29.960044] Freezing user space processes
> [ 29.961447] Freezing user space processes completed (elapsed 0.001 seconds)
> [ 29.961861] OOM killer disabled.
> [ 30.102485] ata2: found unknown device (class 0)
> [ 30.107387] Disabling non-boot CPUs ...
>
> That happens without 'no_console_suspend' on the command line as
> well, but that's for tomorrow ...
I think I saw that lockup once last night too. This morning I did not
see it after hundreds of invocations on my kexec-debug tree (based on
tip/x86/boot which is 6.13-rc1).
I switched to master (231825b2e1 still) and saw it after a few
attempts.
[ 34.250006] Freezing user space processes
[ 34.251930] Freezing user space processes completed (elapsed 0.001 seconds)
[ 34.252730] OOM killer disabled.
[ 34.253141] printk: Suspending console(s) (use no_console_suspend to debug)
(gdb) t a a bt
Thread 2 (Thread 1.2 (CPU#1 [halted ])):
#0 0xffffffff8235886f in pv_native_safe_halt () at arch/x86/kernel/paravirt.c:127
#1 0xffffffff8235b699 in arch_safe_halt () at ./arch/x86/include/asm/paravirt.h:175
#2 default_idle () at arch/x86/kernel/process.c:742
#3 0xffffffff8235bb0a in default_idle_call () at kernel/sched/idle.c:117
#4 0xffffffff81243195 in cpuidle_idle_call () at kernel/sched/idle.c:185
#5 do_idle () at kernel/sched/idle.c:325
#6 0xffffffff812434b9 in cpu_startup_entry (state=state@entry=CPUHP_AP_ONLINE_IDLE) at kernel/sched/idle.c:423
#7 0xffffffff8115b572 in start_secondary (unused=<optimized out>) at arch/x86/kernel/smpboot.c:314
#8 0xffffffff8110a38d in secondary_startup_64 () at arch/x86/kernel/head_64.S:414
#9 0x0000000000000000 in ?? ()
Thread 1 (Thread 1.1 (CPU#0 [halted ])):
#0 0xffffffff8235886f in pv_native_safe_halt () at arch/x86/kernel/paravirt.c:127
#1 0xffffffff8235b699 in arch_safe_halt () at ./arch/x86/include/asm/paravirt.h:175
#2 default_idle () at arch/x86/kernel/process.c:742
#3 0xffffffff8235bb0a in default_idle_call () at kernel/sched/idle.c:117
#4 0xffffffff81243195 in cpuidle_idle_call () at kernel/sched/idle.c:185
#5 do_idle () at kernel/sched/idle.c:325
#6 0xffffffff812434b9 in cpu_startup_entry (state=state@entry=CPUHP_ONLINE) at kernel/sched/idle.c:423
#7 0xffffffff8235c9c7 in rest_init () at init/main.c:747
#8 0xffffffff8419a694 in start_kernel () at init/main.c:1102
#9 0xffffffff841ac6a4 in x86_64_start_reservations (real_mode_data=real_mode_data@entry=0x147b0 <exception_stacks+34736> <error: Cannot access memory at address 0x147b0>) at arch/x86/kernel/head64.c:507
#10 0xffffffff841ac7fd in x86_64_start_kernel (real_mode_data=0x147b0 <exception_stacks+34736> <error: Cannot access memory at address 0x147b0>) at arch/x86/kernel/head64.c:488
#11 0xffffffff8110a38d in secondary_startup_64 () at arch/x86/kernel/head_64.S:414
#12 0x0000000000000000 in ?? ()
(gdb)
But I haven't ingested your fix yet, so maybe I can't be entirely
surprised if CPU0 scheduled away and ended up in the idle thread?
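For what it's worth, the "every CPU is parked in the idle loop" check can be done mechanically on a saved copy of the gdb output. This is only a rough sketch: the helper name idle_cpus() and the trimmed SAMPLE text are made up for illustration, and it assumes the thread headers and frame lines look like the ones gdb printed here.

```python
import re

# Matches gdb thread headers such as:
#   Thread 2 (Thread 1.2 (CPU#1 [halted ])):
THREAD_RE = re.compile(r"^Thread \d+ \(Thread [\d.]+ \(CPU#(\d+)")
# Matches frame lines and captures the function name, with or without
# a leading "0x... in " address.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-f]+ in )?(\S+) \(")


def idle_cpus(backtrace_text):
    """Map each CPU number to True if its stack contains do_idle()."""
    result = {}
    cpu = None
    for line in backtrace_text.splitlines():
        line = line.strip()
        m = THREAD_RE.match(line)
        if m:
            cpu = int(m.group(1))
            result[cpu] = False
            continue
        m = FRAME_RE.match(line)
        if m and cpu is not None and m.group(1) == "do_idle":
            result[cpu] = True
    return result


# Hypothetical, trimmed-down copy of a "t a a bt" dump.
SAMPLE = """\
Thread 2 (Thread 1.2 (CPU#1 [halted ])):
#0 0xffffffff8235886f in pv_native_safe_halt () at arch/x86/kernel/paravirt.c:127
#5 do_idle () at kernel/sched/idle.c:325
Thread 1 (Thread 1.1 (CPU#0 [halted ])):
#0 0xffffffff8235886f in pv_native_safe_halt () at arch/x86/kernel/paravirt.c:127
#5 do_idle () at kernel/sched/idle.c:325
"""

if __name__ == "__main__":
    for cpu, idle in sorted(idle_cpus(SAMPLE).items()):
        print(f"CPU{cpu}: {'idle loop' if idle else 'not idle'}")
```

If every CPU comes back as idle, the task of interest is not on a CPU at all, and its stack has to be found some other way than through the per-CPU gdb threads.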
If I were cleverer I'd remember how to make gdb give me a backtrace for
the process which is actually in the kexec sys_reboot() system call,
instead of the boring idle thread.
(gdb) p sysrq_handle_showstate('t')
That didn't work. Maybe if I'd actually had no_console_suspend on this
boot. Will try again.