[PATCH v2 0/6] crashdump: Kernel handling of CPU and memory hot un/plug
Hari Bathini
hbathini at linux.ibm.com
Wed May 3 23:19:54 PDT 2023
On 04/05/23 3:46 am, Eric DeVolder wrote:
> When the kdump service is loaded, if a CPU or memory is hot
> un/plugged, the crash elfcorehdr, which describes the CPUs and memory
> in the system, must also be updated, else the resulting vmcore is
> inaccurate (e.g. missing either CPU context or memory regions).
>
> The current solution (e.g. RHEL /usr/lib/udev/rules.d/98-kexec.rules)
> utilizes udev to initiate an unload-then-reload of the *entire* kdump
> image (e.g. kernel, initrd, boot_params, purgatory and elfcorehdr) by
> the userspace kexec utility. In a previous kernel patch post I have
> outlined the significant performance problems related to offloading
> this activity to userspace.
>
> As such, I've been working to provide the ability for the Linux kernel
> to directly modify the elfcorehdr in response to hotplug changes.
>
> https://lore.kernel.org/lkml/20230404180326.6890-1-eric.devolder@oracle.com/
>
> The series listed above is v21, and v22 contains changes that
> work in concert with the v2 changes cited within. (I'm posting the
> kexec-tools changes first so I can reference them in the kernel v22
> posting.)
>
> I believe this work to be nearing the finish line. As such, I'd like
> to start posting the kexec-tools userspace changes for review in order
> to minimize the time to adoption.
>
> This kexec-tools patch series is for supporting the kexec_load
> syscall only. The kernel patch series cited above is self-contained
> for the kexec_file_load syscall, requiring no userspace help.
>
> There are two basic obstacles/requirements for the kexec-tools to
> overcome in order to support kernel hotplug rewriting of the
> elfcorehdr.
>
> First, the buffer containing the elfcorehdr must be excluded from the
> purgatory checksum/digest, which is computed at load time. Otherwise
> kernel run-time changes to the elfcorehdr, as a result of hot un/plug,
> would result in the checksum failing (specifically in purgatory at
> panic kernel boot time), and the kdump capture kernel failing to start.
> To let the kernel know it is okay to modify the elfcorehdr, kexec
> sets the KEXEC_UPDATE_ELFCOREHDR flag.
>
> NOTE: The kernel specifically does *NOT* attempt to recompute the
> checksum/digest as that would ultimately require patching the in-
> memory purgatory image with the updated checksum. As that purgatory
> image is already fully linked, it is a binary blob containing no ELF
> information which would allow it to be re-linked or patched. Thus
> excluding the elfcorehdr from the checksum/digests avoids all these
> problems.
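
To make the digest exclusion concrete, here is a minimal sketch of the
idea, not the actual kexec-tools code. The struct and function names are
hypothetical, and FNV-1a is used only as a compact stand-in for the real
purgatory digest so the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a loaded kexec segment. */
struct sketch_segment {
	const unsigned char *buf;
	size_t bufsz;
};

/* FNV-1a: used here ONLY as a small stand-in for the real digest. */
static uint64_t sketch_hash_update(uint64_t h, const unsigned char *p, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		h ^= p[i];
		h *= 1099511628211ULL;
	}
	return h;
}

/*
 * Digest every segment except the one at skip_idx.
 * Pass skip_idx = -1 to cover all segments (no hotplug support).
 */
static uint64_t sketch_digest(const struct sketch_segment *segs, int nr,
			      int skip_idx)
{
	uint64_t h = 14695981039346656037ULL;

	assert(segs != NULL && nr >= 0);
	for (int i = 0; i < nr; i++) {
		if (i == skip_idx)
			continue;	/* elfcorehdr: the kernel may rewrite it later */
		h = sketch_hash_update(h, segs[i].buf, segs[i].bufsz);
	}
	return h;
}
```

With the elfcorehdr segment skipped, rewriting its contents after load
leaves the digest unchanged, so the check in purgatory would still pass.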
>
> Second, the size of the elfcorehdr buffer must be large enough
> to accommodate growth of the number of CPUs and/or memory regions.
>
> To satisfy the first requirement, this patch series introduces the
> --hotplug option to indicate to kexec-tools that kexec should exclude
> the elfcorehdr buffer from the purgatory checksum/digest calculation
> and set the KEXEC_UPDATE_ELFCOREHDR flag.
>
> To satisfy the second requirement, the size is obtained from the
> (proposed in the kernel series above)
> /sys/kernel/crash_elfcorehdr_size node, or it can be specified
> manually with the new --elfcorehdrsz= option.
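
A rough sketch of how the size selection could look, assuming the node
holds a single decimal number. The function names are hypothetical, and
letting the manual value take precedence over the sysfs value is my
assumption, not something the series states.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Read an unsigned decimal value from a one-line file.
 * Returns 0 on success, -1 if the file is missing or malformed.
 */
static int sketch_read_size(const char *path, unsigned long *out)
{
	char line[64];
	char *end;
	unsigned long val;
	FILE *fp;

	assert(path != NULL && out != NULL);
	fp = fopen(path, "r");
	if (!fp)
		return -1;	/* e.g. kernel without the hotplug series */
	if (!fgets(line, sizeof(line), fp)) {
		fclose(fp);
		return -1;
	}
	fclose(fp);

	if (line[0] < '0' || line[0] > '9')
		return -1;	/* reject signs, blanks and empty input */
	errno = 0;
	val = strtoul(line, &end, 10);
	if (errno != 0 || (*end != '\n' && *end != '\0'))
		return -1;
	*out = val;
	return 0;
}

/*
 * Pick the elfcorehdr buffer size: a non-zero manual value wins
 * (assumed precedence), otherwise use the value from the node.
 * Returns 0 to mean "keep the default sizing".
 */
static unsigned long sketch_elfcorehdr_size(unsigned long manual,
					    const char *node_path)
{
	unsigned long sz;

	if (manual != 0)
		return manual;
	if (sketch_read_size(node_path, &sz) == 0)
		return sz;
	return 0;
}
```

A caller would pass /sys/kernel/crash_elfcorehdr_size as node_path; on a
kernel without that node the function falls back to the default sizing.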
>
> I am intentionally posting this series before the kernel changes
> have been merged. I'm hoping to facilitate discussion as to how
> kexec-tools wants to handle the soon-to-be new kernel feature.
>
> Discussion items:
>
> - It is worth noting that deploying kexec-tools with this series
> included on kernels that do NOT have the kernel hotplug series
> cited above is safe. Running kexec-tools with the --hotplug option
> on a kernel without hotplug elfcorehdr support simply removes the
> elfcorehdr buffer from the digest. This does not prevent kdump
> from operating; the only risk is a slight chance of corruption of
> the elfcorehdr, as it is no longer covered by the checksum.
> Using the --elfcorehdrsz option on a kernel without hotplug
> elfcorehdr support simply results in a possibly oversized buffer for
> the elfcorehdr; there is no harm in that.
>
> - While I currently have --hotplug as an option, the option could
> be eliminated (or its polarity reversed) if it is safe to *always*
> omit the elfcorehdr from the checksum/digest for purgatory.
> If this were the case, then distros would not have to make any
> changes to kdump scripts to pass the --hotplug option. Then, when
> their kernel does include the kernel patch series cited above,
> kdump and hotplug would "just work".
>
> - I'm unsure if these options should be kept as common/global
> kexec options, or moved to arch options.
>
> - I'm only showing x86 support (and testing) at this time, but
> it would be straightforward to provide similar support for the
> other architectures in a future patch revision.
True. Should be straightforward to add similar support for other
architectures. For example, powerpc would need another flag
KEXEC_UPDATE_FDT on top of the flag to update elfcorehdr.
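
Roughly what I have in mind for the flag handling, as a sketch only. The
SKETCH_ names and bit values below are placeholders for illustration;
KEXEC_UPDATE_FDT in particular is a proposed flag that does not exist
yet, and the real values would come from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values for this sketch only, not authoritative. */
#define SKETCH_KEXEC_ON_CRASH		0x1UL
#define SKETCH_KEXEC_UPDATE_ELFCOREHDR	0x4UL
#define SKETCH_KEXEC_UPDATE_FDT		0x8UL	/* hypothetical powerpc flag */

/*
 * Build the kexec_load flags for a crash kernel load.
 * hotplug: --hotplug was given, so the kernel may rewrite the elfcorehdr.
 * arch_updates_fdt: the architecture also wants its FDT updated.
 */
static unsigned long sketch_crash_flags(bool hotplug, bool arch_updates_fdt)
{
	unsigned long flags = SKETCH_KEXEC_ON_CRASH;

	/* The FDT flag only makes sense on top of the elfcorehdr flag. */
	assert(hotplug || !arch_updates_fdt);

	if (hotplug) {
		flags |= SKETCH_KEXEC_UPDATE_ELFCOREHDR;
		if (arch_updates_fdt)
			flags |= SKETCH_KEXEC_UPDATE_FDT;
	}
	return flags;
}
```

x86 would call this with arch_updates_fdt set to false, while powerpc
would set both.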
Looks good to me. For the series:
Acked-by: Hari Bathini <hbathini at linux.ibm.com>
> Thanks!
> eric
>
> ---
> v2: 3may2023
> - Setting KEXEC_UPDATE_ELFCOREHDR flag
> - Utilizing /sys/kernel/crash_elfcorehdr_size info.
>
> v1: 20oct2022
> http://lists.infradead.org/pipermail/kexec/2022-October/026032.html
> - Initial patch series
>
> RFC:
> https://lore.kernel.org/lkml/b04ed259-dc5f-7f30-6661-c26f92d9096a@oracle.com/
> s/vmcoreinfo/elfcorehdr/g
> ---
>
>
> Eric DeVolder (6):
> kexec: define KEXEC_UPDATE_ELFCOREHDR
> crashdump: introduce the hotplug command line options
> crashdump: setup hotplug support
> crashdump: exclude elfcorehdr segment from digest for hotplug
> crashdump/x86: identify elfcorehdr segment for hotplug
> crashdump/x86: set the elfcorehdr segment size for hotplug
>
> kexec/arch/i386/crashdump-x86.c | 8 ++++++
> kexec/kexec-syscall.h | 1 +
> kexec/kexec.c | 45 +++++++++++++++++++++++++++++++++
> kexec/kexec.h | 10 +++++++-
> 4 files changed, 63 insertions(+), 1 deletion(-)
>