[PATCH v3 0/6] crashdump: Kernel handling of CPU and memory hot un/plug
Eric DeVolder
eric.devolder at oracle.com
Wed Sep 27 11:11:30 PDT 2023
When the kdump service is loaded, if a CPU or memory is hot
un/plugged, the crash elfcorehdr, which describes the CPUs and memory
in the system, must also be updated, else the resulting vmcore is
inaccurate (e.g. missing either CPU context or memory regions).
The current solution utilizes udev (e.g. RHEL /usr/lib/udev/rules.d/
98-kexec.rules) to initiate an unload-then-reload of the *entire* kdump
image (e.g. kernel, initrd, boot_params, purgatory and elfcorehdr) by
the userspace kexec utility. This occurs just so the elfcorehdr can
be updated with the latest list of CPUs and memory regions. In a
previous post I outlined the significant performance problems
related to offloading this activity to userspace.
With the Linux kernel 6.6 commit below, the kernel now has the ability
to directly modify the elfcorehdr, eliminating the need to
unload-then-reload the entire kdump image when CPU or memory is hot
un/plugged or on/offlined.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d68b4b6f307d155475cce541f2aee938032ed22e
This kexec-tools patch series is for supporting hotplug with the
kexec_load() syscall; the kernel directly supports hotplug for the
kexec_file_load() syscall, requiring no userspace help.
There are two basic obstacles/requirements for the kexec-tools to
overcome in order to support kernel hotplug rewriting of the
elfcorehdr.
First, the buffer containing the elfcorehdr must be excluded from the
purgatory checksum/digest, which is computed at load time. Otherwise
kernel run-time changes to the elfcorehdr, as a result of hot un/plug,
would result in the checksum failing (specifically in purgatory at
panic kernel boot time), and the kdump capture kernel failing to start.
To let the kernel know it is okay to modify the elfcorehdr, kexec
sets the KEXEC_UPDATE_ELFCOREHDR flag.
NOTE: The kernel specifically does *NOT* attempt to recompute the
checksum/digest as that would ultimately require patching the
in-memory purgatory image with the updated checksum. As that purgatory
image is already fully linked, it is a binary blob containing no ELF
information which would allow it to be re-linked or patched. Thus
excluding the elfcorehdr from the checksum/digest avoids all these
problems.
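
To illustrate the first requirement, here is a minimal sketch of a
load-time digest that skips one segment. It is not the kexec-tools
code: struct seg, digest_segments() and skip_idx are hypothetical
names, and FNV-1a is used only to keep the example short, standing in
for the real purgatory checksum/digest.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a kexec segment. */
struct seg {
	const unsigned char *buf;
	size_t len;
};

/*
 * Fold every segment except skip_idx into a 64-bit FNV-1a hash.
 * FNV-1a is an assumption made for brevity; it stands in for the
 * real purgatory digest. Pass skip_idx = -1 to hash all segments.
 */
static uint64_t digest_segments(const struct seg *segs, int nr, int skip_idx)
{
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV-1a offset basis */

	for (int i = 0; i < nr; i++) {
		if (i == skip_idx)
			continue;	/* elfcorehdr: the kernel may rewrite it */
		for (size_t j = 0; j < segs[i].len; j++) {
			h ^= segs[i].buf[j];
			h *= 0x100000001b3ULL;	/* FNV-1a prime */
		}
	}
	return h;
}
```

With the elfcorehdr segment skipped, a later rewrite of that buffer
leaves the digest unchanged, so a purgatory-style check would still
pass at panic time.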
Second, the size of the elfcorehdr buffer must be large enough
to accommodate growth of the number of CPUs and/or memory regions.
To satisfy the first requirement, this patch series introduces the
--hotplug option to indicate to kexec-tools that kexec should exclude
the elfcorehdr buffer from the purgatory checksum/digest calculation
and set the KEXEC_UPDATE_ELFCOREHDR flag.
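
As a rough sketch of the flag handling, assuming the flag values from
the kernel's include/uapi/linux/kexec.h: crash_load_flags() is a
hypothetical helper written for this example, not a function from the
series, and its hotplug argument mirrors whether --hotplug was given.

```c
#include <stdbool.h>

/* Values as defined in the kernel's include/uapi/linux/kexec.h. */
#define KEXEC_ON_CRASH		0x00000001
#define KEXEC_UPDATE_ELFCOREHDR	0x00000004

/*
 * Hypothetical helper: compute the kexec_load() flags for a crash
 * kernel load. Without --hotplug the flags are unchanged, so the
 * kernel will not touch the elfcorehdr.
 */
static unsigned long crash_load_flags(bool hotplug)
{
	unsigned long flags = KEXEC_ON_CRASH;

	if (hotplug)
		flags |= KEXEC_UPDATE_ELFCOREHDR;
	return flags;
}
```

The point of tying both actions to one option is that the flag should
only be set when the digest exclusion was also done; setting the flag
alone would let the kernel modify a buffer that purgatory still
verifies.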
To satisfy the second requirement, the size is obtained from the
/sys/kernel/crash_elfcorehdr_size node (new with the kernel series
cited above).
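
A minimal sketch of reading that node follows. The helper name
read_elfcorehdr_size() is hypothetical, and the sketch assumes the
node holds a single decimal number; a failure return would be the
natural signal that the running kernel lacks crash hotplug support.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read a decimal size from a sysfs-style node.
 * Returns 0 and stores the value in *size on success; returns -1 if
 * the node cannot be opened or does not start with a number.
 */
static int read_elfcorehdr_size(const char *path, unsigned long *size)
{
	FILE *fp = fopen(path, "r");
	int ok;

	if (!fp)
		return -1;
	ok = (fscanf(fp, "%lu", size) == 1);
	fclose(fp);
	return ok ? 0 : -1;
}
```

A caller would pass "/sys/kernel/crash_elfcorehdr_size" as the path
and use the result as the size of the elfcorehdr segment.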
To use this feature with kexec_load() syscall, invoke kexec with:
kexec -c --hotplug ...
Thanks!
eric
---
v3: 27sep2023
- Cite the merged Linux 6.6 commit that supports crash hotplug.
- Removed the --elfcorehdrsz option, instead using the
/sys/kernel/crash_elfcorehdr_size node from the new kernel
crash hotplug feature.
v2: 3may2023
http://lists.infradead.org/pipermail/kexec/2023-May/027049.html
- Setting KEXEC_UPDATE_ELFCOREHDR flag
- Utilizing /sys/kernel/crash_elfcorehdr_size info.
v1: 20oct2022
http://lists.infradead.org/pipermail/kexec/2022-October/026032.html
- Initial patch series
RFC:
https://lore.kernel.org/lkml/b04ed259-dc5f-7f30-6661-c26f92d9096a@oracle.com/
s/vmcoreinfo/elfcorehdr/g
---
Eric DeVolder (6):
  kexec: define KEXEC_UPDATE_ELFCOREHDR
  crashdump: introduce the hotplug command line options
  crashdump: setup general hotplug support
  crashdump: exclude elfcorehdr segment from digest for hotplug
  crashdump/x86: identify elfcorehdr segment for hotplug
  crashdump/x86: set the elfcorehdr segment size for hotplug
 kexec/arch/i386/crashdump-x86.c | 11 +++++++++++
 kexec/kexec-syscall.h           |  1 +
 kexec/kexec.8                   |  6 ++++++
 kexec/kexec.c                   | 32 ++++++++++++++++++++++++++++++++
 kexec/kexec.h                   |  8 +++++++-
 5 files changed, 57 insertions(+), 1 deletion(-)
--
2.39.3