[RFC PATCH v2 16/16] [DO NOT MERGE] x86/kexec: enable DEBUG
Ingo Molnar
mingo at kernel.org
Mon Nov 25 12:34:54 PST 2024
* David Woodhouse <dwmw2 at infradead.org> wrote:
> > Just curious: did you write this code to debug the series, or was
> > there some original hair-tearing regression that motivated you? Is
> > there an upstream fix to marvel at and be horrified about in
> > equal measure?
>
> https://lore.kernel.org/all/2ab14f6f-2690-056b-cf9e-38a12dafd728@amd.com/t/#u
> is the upstream fix.
Which ended up being the following upstream commit:
88a921aa3c6b ("x86/sev: Ensure that RMP table fixups are reserved")
Might make sense to add this commit reference to one of the central
patches of the GDT/IDT code, to document how this feature is able to
pin down very hard to debug regressions. (Even if the upstream fix was
done independently in probably luckier circumstances.)
> [...] It's all the more horrifying because it was already *fixed*
> upstream before I lost weeks of my life to chasing it. And the
> trigger which actually made it *happen*, and made our production
> systems allocate memory within that dangerous 1MiB region adjacent to
> the RMP table, was a tweak to the NMI watchdog period... leading to
> an assumption that we were getting stray perf NMIs during the kexec,
> and a *long* wild goose chase based on that false assumption...
:-/
> Once I'd written the debug code, I just wanted to clean it up a bit
> and push it out for the benefit of others; that *was* the main point
> of this series. All the rest of the cleanups are just yak shaving.
>
> The realisation that we never even explicitly mapped the control code
> page and always just got lucky because it happened to be in the same
> 2MiB or 1GiB superpage as something else that we did map... was just
> a bonus :)
I'm amazed and horrified in equal measure ;-)
> (That one is fixed in v3 which I'll post shortly, and is already in
> https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/kexec-debug
> )
>
> > I'd argue that this debugging code probably needs a default-off Kconfig
> > option, even with the obvious hard-coded environmental limitations &
> > assumptions it has. Could be useful to very early debugging & would
> > preserve your effort without it bitrotting too obviously.
>
> Yeah. In v3 I've made it a config option, and made it use the
> early_printk serial console (as long as that's an I/O based 8250; we
> can add others too later).
That's lovely!
Thanks,
Ingo