[PATCH v2 0/6] crashdump: Kernel handling of CPU and memory hot un/plug

Hari Bathini hbathini at linux.ibm.com
Wed May 3 23:19:54 PDT 2023



On 04/05/23 3:46 am, Eric DeVolder wrote:
> When the kdump service is loaded, if a CPU or memory is hot
> un/plugged, the crash elfcorehdr, which describes the CPUs and memory
> in the system, must also be updated, else the resulting vmcore is
> inaccurate (e.g. missing either CPU context or memory regions).
> 
> The current solution (e.g. RHEL /usr/lib/udev/rules.d/98-kexec.rules)
> utilizes udev to initiate an unload-then-reload of the *entire* kdump
> image (e.g. kernel, initrd, boot_params, purgatory and elfcorehdr) by
> the userspace kexec utility. In a previous kernel patch post I have
> outlined the significant performance problems related to offloading
> this activity to userspace.
> 
> As such, I've been working to provide the ability for the Linux kernel
> to directly modify the elfcorehdr in response to hotplug changes.
> 
>   https://lore.kernel.org/lkml/20230404180326.6890-1-eric.devolder@oracle.com/
> 
> The series listed above is v21; v22 contains changes that work in
> concert with the v2 changes in this series. (I'm posting the
> kexec-tools changes first so I can reference them in the kernel v22
> posting.)
> 
> I believe this work to be nearing the finish line. As such, I'd like
> to start posting the kexec-tools userspace changes for review in order
> to minimize the time to adoption.
> 
> This kexec-tools patch series is for supporting the kexec_load
> syscall only. The kernel patch series cited above is self-contained
> for the kexec_file_load syscall, requiring no userspace help.
> 
> There are two basic obstacles/requirements for the kexec-tools to
> overcome in order to support kernel hotplug rewriting of the
> elfcorehdr.
> 
> First, the buffer containing the elfcorehdr must be excluded from the
> purgatory checksum/digest, which is computed at load time. Otherwise,
> run-time kernel changes to the elfcorehdr, as a result of hot un/plug,
> would result in the checksum failing (specifically in purgatory at
> panic kernel boot time), and the kdump capture kernel failing to start.
> To let the kernel know it is okay to modify the elfcorehdr, kexec
> sets the KEXEC_UPDATE_ELFCOREHDR flag.
> 
> NOTE: The kernel specifically does *NOT* attempt to recompute the
> checksum/digest as that would ultimately require patching the in-
> memory purgatory image with the updated checksum. As that purgatory
> image is already fully linked, it is a binary blob containing no ELF
> information which would allow it to be re-linked or patched. Thus
> excluding the elfcorehdr from the checksum/digests avoids all these
> problems.
> 
> Second, the size of the elfcorehdr buffer must be large enough
> to accommodate growth of the number of CPUs and/or memory regions.
> 
> To satisfy the first requirement, this patch series introduces the
> --hotplug option to indicate to kexec-tools that kexec should exclude
> the elfcorehdr buffer from the purgatory checksum/digest calculation
> and set the KEXEC_UPDATE_ELFCOREHDR flag.
> 
> To satisfy the second requirement, the size is obtained from the
> (proposed in the kernel series above)
> /sys/kernel/crash_elfcorehdr_size node, or it can be specified
> manually with the new --elfcorehdrsz= option.
> 
> I am intentionally posting this series before the kernel changes
> have been merged. I'm hoping to facilitate discussion as to how
> kexec-tools wants to handle the soon-to-be new kernel feature.
> 
> Discussion items:
> 
> - It is worth noting that deploying kexec-tools, with this series
>    included, on kernels that do NOT have the kernel hotplug series
>    cited above, is safe to do. Running a kernel without hotplug
>    elfcorehdr support with kexec-tools and the --hotplug option
>    simply removes the elfcorehdr buffer from the digest. This does not
>    prevent kdump from operating; the only risk is a slight chance of
>    undetected corruption of the elfcorehdr, as it is no longer covered
>    by the checksum. Using the --elfcorehdrsz option on a kernel without
>    hotplug elfcorehdr support simply results in a possibly oversized
>    buffer for the elfcorehdr; there is no harm in that.
> 
> - While I currently have --hotplug as an option, the option could
>    be eliminated (or its polarity reversed), since it would be safe to
>    *always* omit the elfcorehdr from the checksum/digest for purgatory.
>    If this were the case, then distros would not have to make any
>    changes to kdump scripts to pass the --hotplug option. Then, when
>    their kernel does include the kernel patch series cited above,
>    kdump and hotplug would "just work".
> 
> - I'm unsure if these options should be kept as common/global
>    kexec options, or moved to arch options.
> 
> - I'm only showing x86 support (and testing) at this time, but
>    it would be straightforward to provide similar support for the
>    other architectures in a future patch revision.

True. Should be straightforward to add similar support for other
architectures. For example, powerpc would need another flag
KEXEC_UPDATE_FDT on top of the flag to update elfcorehdr.

Looks good to me. For the series:

Acked-by: Hari Bathini <hbathini at linux.ibm.com>

> Thanks!
> eric
> 
> ---
> v2: 3may2023
>   - Setting KEXEC_UPDATE_ELFCOREHDR flag
>   - Utilizing /sys/kernel/crash_elfcorehdr_size info.
> 
> v1: 20oct2022
>   http://lists.infradead.org/pipermail/kexec/2022-October/026032.html
>   - Initial patch series
> 
> RFC:
>   https://lore.kernel.org/lkml/b04ed259-dc5f-7f30-6661-c26f92d9096a@oracle.com/
>   s/vmcoreinfo/elfcorehdr/g
> ---
> 
> 
> Eric DeVolder (6):
>    kexec: define KEXEC_UPDATE_ELFCOREHDR
>    crashdump: introduce the hotplug command line options
>    crashdump: setup hotplug support
>    crashdump: exclude elfcorehdr segment from digest for hotplug
>    crashdump/x86: identify elfcorehdr segment for hotplug
>    crashdump/x86: set the elfcorehdr segment size for hotplug
> 
>   kexec/arch/i386/crashdump-x86.c |  8 ++++++
>   kexec/kexec-syscall.h           |  1 +
>   kexec/kexec.c                   | 45 +++++++++++++++++++++++++++++++++
>   kexec/kexec.h                   | 10 +++++++-
>   4 files changed, 63 insertions(+), 1 deletion(-)
> 



More information about the kexec mailing list