Lockdep warnings on kexec (virtio_blk, hrtimers)

David Woodhouse dwmw2 at infradead.org
Fri Dec 13 06:07:18 PST 2024


On Fri, 2024-12-13 at 14:23 +0100, Thomas Gleixner wrote:
> On Fri, Dec 13 2024 at 19:48, Ming Lei wrote:
> > On Fri, Dec 13, 2024 at 12:31:24PM +0100, Thomas Gleixner wrote:
> > > I'd rather say, that's a kexec problem. On the same instance a loop test
> > > of suspend to ram with pm_test=core just works fine. That's equivalent
> > > to the kexec scenario. It goes down to syscore_suspend() and skips the
> > > actual suspend low level magic. It then resumes with syscore_resume()
> > > and brings the machine back up.
> > > 
> > > That runs for 2 hours now, while the kexec muck dies within 2
> > > minutes....
> > > 
> > > And if you look at the difference of these implementations, you might
> > > notice that kexec just implemented some rudimentary version of the
> > > actual suspend logic. Based on let's hope it works that way.
> > > 
> > > This is just insane and should be rewritten to actually reuse the suspend
> > > mechanism, which is way better tested than this kexec jump muck.
> > 
> > But kexec is supposed to align with reboot/shutdown, instead of suspend,
> > and it is calling ->shutdown() for notifying driver & device.
> 
> That's only true for the case where the new kernel takes over.
> 
> In the case KEXEC_JUMP=n and kexec_image->preserve_context == true, then
> it is supposed to align with suspend/resume and if you look at the code
> then it actually mimics suspend/resume in the most dilettanteish way.

Did you mean KEXEC_JUMP=y there?

I spent a while the other week trying to understand the case where
CONFIG_KEXEC_JUMP=n and kexec_image->preserve_context=true, and came to
the conclusion that it was a mirage. Userspace can't *actually* set the
KEXEC_PRESERVE_CONTEXT bit when setting up the image, if KEXEC_JUMP=n.
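
To illustrate why that combination is unreachable, here is a minimal userspace model of the flag check, not the kernel's actual code. The flag values are the ones I believe the kexec UAPI header defines; accepted_flags() and model_kexec_load_check() are hypothetical names made up for this sketch, and the other flags the real kernel accepts are left out for brevity.

```c
#include <errno.h>
#include <stdbool.h>

/* Flag values as (to my knowledge) defined in include/uapi/linux/kexec.h. */
#define KEXEC_ON_CRASH          0x00000001UL
#define KEXEC_PRESERVE_CONTEXT  0x00000002UL
#define KEXEC_ARCH_MASK         0xffff0000UL

/*
 * Hypothetical helper: the set of non-arch flags the model accepts.
 * KEXEC_PRESERVE_CONTEXT is only part of the set when the (modelled)
 * kernel was built with CONFIG_KEXEC_JUMP=y.
 */
static unsigned long accepted_flags(bool kexec_jump_enabled)
{
    unsigned long flags = KEXEC_ON_CRASH;

    if (kexec_jump_enabled)
        flags |= KEXEC_PRESERVE_CONTEXT;
    return flags;
}

/*
 * Hypothetical model of the load-time check: reject any flag outside
 * the accepted set, ignoring the architecture bits.
 */
static int model_kexec_load_check(unsigned long flags, bool kexec_jump_enabled)
{
    unsigned long requested = flags & ~KEXEC_ARCH_MASK;

    if (requested & ~accepted_flags(kexec_jump_enabled))
        return -EINVAL;
    return 0;
}
```

In this model, model_kexec_load_check(KEXEC_PRESERVE_CONTEXT, false) returns -EINVAL, so an image with preserve_context set can never be created when KEXEC_JUMP=n.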

The whole of the code path for that case is dead code. It's confusing
because, as discussed elsewhere, we don't just #ifdef out the whole of
that dead code path, but only the bits which don't actually *compile*
(like references to restore_processor_state() etc.).

> It's a patently bad idea to clobber the kernel with kexec jump "fixes"
> instead of using the well tested and established suspend/resume
> machinery.
> 
> All it takes is to:
> 
>     1) disable the wakeup logic
> 
>     2) provide a mechanism to invoke machine_kexec() instead of the
>        actual suspend mechanism.
> 
> No?

Agreed. The hacky proof of concept I posted earlier invoking
machine_kexec() instead of suspend_ops->enter() works fine. I'll look
at cleaning it up and making it not invoke all the ACPI hooks for
*actual* suspend to RAM, etc.
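
Roughly, the shape of that idea is sketched below as a standalone model, not the proof-of-concept patch itself. Everything here is hypothetical and only records the order of the steps: the syscore_suspend()/syscore_resume() bracket stays the same, and only the "enter" step is swapped.

```c
#include <stddef.h>

#define MAX_STEPS 8

/* Records the order in which the modelled steps run. */
struct trace {
    const char *steps[MAX_STEPS];
    size_t count;
};

static void record(struct trace *t, const char *step)
{
    if (t->count < MAX_STEPS)
        t->steps[t->count++] = step;
}

/* The pluggable "enter" step: real suspend, or the kexec jump. */
typedef int (*enter_fn)(struct trace *t);

/* Stand-in for suspend_ops->enter(); does nothing but log. */
static int model_suspend_to_ram(struct trace *t)
{
    record(t, "suspend_ops->enter");
    return 0;
}

/* Stand-in for machine_kexec(); does nothing but log. */
static int model_machine_kexec(struct trace *t)
{
    record(t, "machine_kexec");
    return 0;
}

/*
 * Hypothetical model of the shared path: the same suspend/resume
 * bracket is used regardless of which enter step is plugged in.
 */
static int model_enter(struct trace *t, enter_fn enter)
{
    int ret;

    record(t, "syscore_suspend");
    ret = enter(t);
    record(t, "syscore_resume");
    return ret;
}
```

Calling model_enter(&t, model_machine_kexec) on a zeroed trace records syscore_suspend, machine_kexec, syscore_resume, in that order; passing model_suspend_to_ram instead changes only the middle step.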

As I noted though, it doesn't address that linux-scsi report which was
a *real* kexec, not a kjump.