Lockdep warnings on kexec (virtio_blk, hrtimers)
David Woodhouse
dwmw2 at infradead.org
Fri Dec 13 05:17:48 PST 2024
On Fri, 2024-12-13 at 12:31 +0100, Thomas Gleixner wrote:
> On Fri, Dec 13 2024 at 19:09, Ming Lei wrote:
> > On Fri, Dec 13, 2024 at 11:42:59AM +0100, Thomas Gleixner wrote:
> > > That's the control thread on CPU0. The hotplug thread on CPU1 is stuck
> > > here:
> > >
> > > task:cpuhp/1 state:D stack:0 pid:24 tgid:24 ppid:2 flags:0x00004000
> > > Call Trace:
> > > <TASK>
> > > __schedule+0x51f/0x1a80
> > > schedule+0x3a/0x140
> > > schedule_timeout+0x90/0x110
> > > msleep+0x2b/0x40
> > > blk_mq_hctx_notify_offline+0x160/0x3a0
> > > cpuhp_invoke_callback+0x2a8/0x6c0
> > > cpuhp_thread_fun+0x1ed/0x270
> > > smpboot_thread_fn+0xda/0x1d0
> > >
> > > So something with those blk_mq fixes went sideways.
> >
> > The cpuhp callback is just waiting for inflight IOs to be completed when
> > the irq is still live.
> >
> > It looks same with the following report:
> >
> > https://lore.kernel.org/linux-scsi/F991D40F7D096653+20241203211857.0291ab1b@john-PC/
> >
> > Still triggered in case of kexec & qemu, which should be one qemu
> > problem.
>
> I'd rather say, that's a kexec problem. On the same instance a loop test
> of suspend to ram with pm_test=core just works fine. That's equivalent
> to the kexec scenario. It goes down to syscore_suspend() and skips the
> actual suspend low level magic. It then resumes with syscore_resume()
> and brings the machine back up.
>
> That runs for 2 hours now, while the kexec muck dies within 2
> minutes....
>
> And if you look at the difference of these implementations, you might
> notice that kexec just implemented some rudimentary version of the
> actual suspend logic. Based on let's hope it works that way.
>
> This is just insane and should be rewritten to actually reuse the suspend
> mechanism, which is way better tested than this kexec jump muck.

Not sure it helps for the above linux-scsi issue since that's an
*actual* kexec, not 'kexec jump muck'. But for the kjump this dirty
proof of concept seems to work:

--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -19,6 +19,7 @@
 #include <linux/gfp.h>
 #include <linux/io.h>
 #include <linux/kernel.h>
+#include <linux/kexec.h>
 #include <linux/list.h>
 #include <linux/mm.h>
 #include <linux/slab.h>
@@ -446,6 +447,9 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
 	error = syscore_suspend();
 	if (!error) {
 		*wakeup = pm_wakeup_pending();
+		if (kexec_image && kexec_image->preserve_context) {
+			machine_kexec(kexec_image);
+		} else
 		if (!(suspend_test(TEST_CORE) || *wakeup)) {
 			trace_suspend_resume(TPS("machine_suspend"),
 				state, true);

[root@localhost ~]# echo mem > /sys/power/state
[ 61.854085] PM: suspend entry (deep)
[ 61.868380] Filesystems sync: 0.013 seconds
[ 61.873692] Freezing user space processes
[ 61.876739] Freezing user space processes completed (elapsed 0.002 seconds)
[ 61.878175] OOM killer disabled.
[ 61.878861] Freezing remaining freezable tasks
[ 61.880818] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 61.889138] ata2.00: Check power mode failed (err_mask=0x1)
[ 61.893351] PM: suspend devices took 0.011 seconds
[ 61.899373] ACPI: PM: Preparing to enter system sleep state S3
[ 61.900802] ACPI: PM: Saving platform NVS memory
[ 61.901861] Disabling non-boot CPUs ...
[ 61.906841] smpboot: CPU 1 is now offline
Exc:0000000000000003
Err:0000000000000000
rip:0000228e60970000
rax:0000000000000018
rbx:0000000000000000
rcx:0000000000000001
rdx:0000000000000000
rsi:00000000228e6540
rdi:00000000228e4002
r8 :0000000000000000
r9 :0000000022927000
r10:0000000000000000
r11:0000000000000001
r12:0000000000170e70
r13:0000000000170ef0
r14:ffff888006064110
r15:ffff888006f61e20
cr2:00007f408f990098
B[ 61.925154] Enabling non-boot CPUs ...
[ 61.925987] smpboot: Booting Node 0 Processor 1 APIC 0x1
[ 61.929886] CPU1 is up
[ 61.930514] ACPI: PM: Waking up from system sleep state S3
[ 61.950391] virtio_blk virtio1: 2/0/0 default/read/poll queues
[ 61.954844] PM: resume devices took 0.020 seconds
[ 61.955968] OOM killer enabled.
[ 61.956500] Restarting tasks ... done.
[ 61.958890] random: crng reseeded on system resumption
[ 61.962280] PM: suspend exit
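
For comparison, the pm_test=core loop test mentioned in the quoted text above could be scripted roughly as below. This is only a sketch: the function name pm_test_loop and the SYSFS_POWER and ITERATIONS variables are illustrative and do not come from the thread, and it assumes a kernel built with PM debugging so that /sys/power/pm_test exists.

```shell
#!/bin/sh
# Sketch of a suspend-to-RAM loop in pm_test "core" mode.
# SYSFS_POWER and ITERATIONS are hypothetical knobs for illustration only.
SYSFS_POWER="${SYSFS_POWER:-/sys/power}"
ITERATIONS="${ITERATIONS:-100}"

pm_test_loop() {
    # Select the "core" test level so the low-level suspend step is skipped.
    echo core > "$SYSFS_POWER/pm_test" || return 1

    i=0
    while [ "$i" -lt "$ITERATIONS" ]; do
        # Each write starts one suspend/resume cycle and returns after resume.
        # On failure we bail out and leave pm_test set to "core".
        echo mem > "$SYSFS_POWER/state" || return 1
        i=$((i + 1))
    done

    # Switch test mode off again so later suspends are real ones.
    echo none > "$SYSFS_POWER/pm_test"
    echo "completed $i suspend/resume cycles"
}
```

Running pm_test_loop as root on the test instance repeats the cycle ITERATIONS times. Pointing SYSFS_POWER at a scratch directory exercises the script without suspending anything.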