Lockdep warnings on kexec (virtio_blk, hrtimers)

David Woodhouse dwmw2 at infradead.org
Fri Dec 13 05:17:48 PST 2024


On Fri, 2024-12-13 at 12:31 +0100, Thomas Gleixner wrote:
> On Fri, Dec 13 2024 at 19:09, Ming Lei wrote:
> > On Fri, Dec 13, 2024 at 11:42:59AM +0100, Thomas Gleixner wrote:
> > > That's the control thread on CPU0. The hotplug thread on CPU1 is stuck
> > > here:
> > > 
> > >  task:cpuhp/1         state:D stack:0     pid:24    tgid:24    ppid:2      flags:0x00004000
> > >  Call Trace:
> > >   <TASK>
> > >   __schedule+0x51f/0x1a80
> > >   schedule+0x3a/0x140
> > >   schedule_timeout+0x90/0x110
> > >   msleep+0x2b/0x40
> > >   blk_mq_hctx_notify_offline+0x160/0x3a0
> > >   cpuhp_invoke_callback+0x2a8/0x6c0
> > >   cpuhp_thread_fun+0x1ed/0x270
> > >   smpboot_thread_fn+0xda/0x1d0
> > > 
> > > So something with those blk_mq fixes went sideways.
> > 
> > The cpuhp callback is just waiting for inflight IOs to be completed when
> > the irq is still live.
> > 
> > It looks same with the following report:
> > 
> > https://lore.kernel.org/linux-scsi/F991D40F7D096653+20241203211857.0291ab1b@john-PC/
> > 
> > Still triggered in case of kexec & qemu, which should be one qemu
> > problem.
> 
> I'd rather say, that's a kexec problem. On the same instance a loop test
> of suspend to ram with pm_test=core just works fine. That's equivalent
> to the kexec scenario. It goes down to syscore_suspend() and skips the
> actual suspend low level magic. It then resumes with syscore_resume()
> and brings the machine back up.
> 
> That runs for 2 hours now, while the kexec muck dies within 2
> minutes....
> 
> And if you look at the difference of these implementations, you might
> notice that kexec just implemented some rudimentary version of the
> actual suspend logic. Based on let's hope it works that way.
> 
> This is just insane and should be rewritten to actually reuse the suspend
> mechanism, which is way better tested than this kexec jump muck.

I'm not sure this helps with the linux-scsi issue above, since that's an
*actual* kexec, not 'kexec jump muck'. But for kexec jump, this dirty
proof of concept seems to work:

--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -19,6 +19,7 @@
 #include <linux/gfp.h>
 #include <linux/io.h>
 #include <linux/kernel.h>
+#include <linux/kexec.h>
 #include <linux/list.h>
 #include <linux/mm.h>
 #include <linux/slab.h>
@@ -446,6 +447,9 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
        error = syscore_suspend();
        if (!error) {
                *wakeup = pm_wakeup_pending();
+               if (kexec_image && kexec_image->preserve_context) {
+                       machine_kexec(kexec_image);
+               } else
                if (!(suspend_test(TEST_CORE) || *wakeup)) {
                        trace_suspend_resume(TPS("machine_suspend"),
                                state, true);
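
To make the branch order in that hunk easier to see, here is a rough
userspace model of the decision it sets up. This is only a sketch under
stated assumptions: struct model_state, enum action and
suspend_enter_model() are made-up names for illustration and are not
kernel APIs, and the "platform enter" case merely stands in for whatever
the existing code does inside the if (!(suspend_test(TEST_CORE) || *wakeup))
block.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the modelled code path (not kernel symbols). */
enum action {
	ACT_ERROR,          /* syscore_suspend() returned an error */
	ACT_KEXEC_JUMP,     /* patch: machine_kexec(kexec_image) */
	ACT_PLATFORM_ENTER, /* existing path: low-level suspend entry */
	ACT_SKIP,           /* TEST_CORE or pending wakeup: nothing entered */
};

/* Hypothetical snapshot of the conditions the hunk looks at. */
struct model_state {
	int  syscore_error;    /* models the return value of syscore_suspend() */
	bool image_loaded;     /* models kexec_image != NULL */
	bool preserve_context; /* models kexec_image->preserve_context */
	bool test_core;        /* models suspend_test(TEST_CORE) */
	bool wakeup_pending;   /* models pm_wakeup_pending() */
};

/* Mirrors the if / else-if ordering of the proof-of-concept hunk. */
static enum action suspend_enter_model(const struct model_state *s)
{
	if (s->syscore_error)
		return ACT_ERROR;

	/* New branch: evaluated before the test/wakeup condition. */
	if (s->image_loaded && s->preserve_context)
		return ACT_KEXEC_JUMP;

	/* Unchanged condition from the existing code. */
	if (!(s->test_core || s->wakeup_pending))
		return ACT_PLATFORM_ENTER;

	return ACT_SKIP;
}
```

One thing the model makes visible: because the new check comes first, a
loaded preserve_context image takes the jump even when pm_test=core is
set or a wakeup is pending. That is fine for a proof of concept, but it
would presumably want revisiting in anything more than that.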


[root@localhost ~]# echo mem > /sys/power/state
[   61.854085] PM: suspend entry (deep)
[   61.868380] Filesystems sync: 0.013 seconds
[   61.873692] Freezing user space processes
[   61.876739] Freezing user space processes completed (elapsed 0.002 seconds)
[   61.878175] OOM killer disabled.
[   61.878861] Freezing remaining freezable tasks
[   61.880818] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[   61.889138] ata2.00: Check power mode failed (err_mask=0x1)
[   61.893351] PM: suspend devices took 0.011 seconds
[   61.899373] ACPI: PM: Preparing to enter system sleep state S3
[   61.900802] ACPI: PM: Saving platform NVS memory
[   61.901861] Disabling non-boot CPUs ...
[   61.906841] smpboot: CPU 1 is now offline
Exc:0000000000000003
Err:0000000000000000
rip:0000228e60970000
rax:0000000000000018
rbx:0000000000000000
rcx:0000000000000001
rdx:0000000000000000
rsi:00000000228e6540
rdi:00000000228e4002
r8 :0000000000000000
r9 :0000000022927000
r10:0000000000000000
r11:0000000000000001
r12:0000000000170e70
r13:0000000000170ef0
r14:ffff888006064110
r15:ffff888006f61e20
cr2:00007f408f990098
B[   61.925154] Enabling non-boot CPUs ...
[   61.925987] smpboot: Booting Node 0 Processor 1 APIC 0x1
[   61.929886] CPU1 is up
[   61.930514] ACPI: PM: Waking up from system sleep state S3
[   61.950391] virtio_blk virtio1: 2/0/0 default/read/poll queues
[   61.954844] PM: resume devices took 0.020 seconds
[   61.955968] OOM killer enabled.
[   61.956500] Restarting tasks ... done.
[   61.958890] random: crng reseeded on system resumption
[   61.962280] PM: suspend exit
