[PATCH v22 6/8] crash: hotplug support for kexec_load()

Baoquan He bhe at redhat.com
Sun May 7 22:13:33 PDT 2023


On 05/03/23 at 06:41pm, Eric DeVolder wrote:
> The hotplug support for kexec_load() requires coordination with
> userspace, and therefore a little extra help from the kernel to
> facilitate the coordination.
> 
> Without this patch, if a kdump capture kernel is loaded via the
> kexec_load() syscall, the crash hotplug logic would find the segment
> containing the elfcorehdr and, upon a hotplug event, rewrite the
> elfcorehdr. That is generally the desired behavior, but a problem
> arises if the kdump image includes a purgatory that performs a
> digest checksum: the check would fail (because the elfcorehdr was
> changed), the capture kernel would fail to boot, and no kdump would
> occur.
> 
> Therefore, what is needed is for the userspace kexec-tools to
> indicate to the kernel whether or not the supplied kdump image/
> elfcorehdr can be modified (because the kexec-tools excludes the
> elfcorehdr from the digest, and sizes the elfcorehdr memory buffer
> appropriately).
> 
> To solve these problems, this patch introduces:
>  - a new kexec flag KEXEC_UPDATE_ELFCOREHDR to indicate that it is
>    safe for the kernel to modify the elfcorehdr (because kexec-tools
>    has excluded the elfcorehdr from the digest).
>  - the /sys/kernel/crash_elfcorehdr_size node to communicate to
>    kexec-tools what the preferred size of the elfcorehdr memory buffer
>    should be in order to accommodate hotplug changes.
>  - the sysfs crash_hotplug nodes (i.e.
>    /sys/devices/system/[cpu|memory]/crash_hotplug) are now dynamic:
>    they examine kexec_file_load() vs kexec_load(), and for
>    kexec_load(), whether or not KEXEC_UPDATE_ELFCOREHDR is in effect.
>    This is critical so that the udev rule processing of crash_hotplug
>    correctly determines whether the userspace unload-then-load of
>    the kdump image can be skipped.
> 
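To make the crash_elfcorehdr_size part concrete, here is a minimal
userspace sketch of how a tool might read the preferred buffer size
before building the elfcorehdr segment. It is only an illustration:
the helper name read_elfcorehdr_size() is hypothetical, and it assumes
the node exposes a single integer on one line. It is not taken from
kexec-tools.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from kexec-tools): read one integer from a
 * sysfs-style node such as /sys/kernel/crash_elfcorehdr_size.
 * Assumption: the node contains a single non-negative integer.
 * Returns the value, or -1 if the node is missing or unparsable.
 */
long read_elfcorehdr_size(const char *path)
{
    char buf[64];
    char *end;
    unsigned long val;
    FILE *f;

    assert(path != NULL);

    f = fopen(path, "r");
    if (!f)
        return -1;      /* e.g. kernel without this patch */

    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    errno = 0;
    val = strtoul(buf, &end, 0);
    if (errno != 0 || end == buf || val > LONG_MAX)
        return -1;      /* nothing parsed, or out of range */

    return (long)val;
}
```

A caller would treat -1 as "no size hint available" and fall back to
its existing sizing, and otherwise make the elfcorehdr buffer at least
as large as the returned value.
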
> With this patch in place, I believe the following statements to be true
> (with local testing to verify):
> 
>  - For systems which have these kernel changes in place, but not the
>    corresponding changes to the crash hotplug udev rules and
>    kexec-tools (i.e. "older" systems), those systems will continue to
>    unload-then-load the kdump image, as has always been done. The
>    kexec-tools will not set KEXEC_UPDATE_ELFCOREHDR.
>  - For systems which have these kernel changes in place and the proposed
>    udev rule changes in place, but not the kexec-tools changes in place:
>     - the use of kexec_load() will not set KEXEC_UPDATE_ELFCOREHDR and
>       so the unload-then-reload of kdump image will occur (the sysfs
>       crash_hotplug nodes will show 0).
>     - the use of kexec_file_load() will permit sysfs crash_hotplug nodes
>       to show 1, and the kernel will modify the elfcorehdr directly. And
>       with the udev changes in place, the unload-then-load will not occur!
>  - For systems which have these kernel changes as well as the udev and
>    kexec-tools changes in place, the user/admin has full control
>    over whether crash hotplug support is enabled, whether via
>    kexec_file_load() or kexec_load().
> 
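The three cases above can be summarized as a small decision model. This
is a sketch of the behavior as described in the commit message, not the
kernel implementation; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the described behavior; not kernel code. */
enum kdump_loader {
    LOADER_KEXEC_LOAD,
    LOADER_KEXEC_FILE_LOAD,
};

/*
 * Value the crash_hotplug sysfs nodes would show in this model:
 * kexec_file_load() images report 1; kexec_load() images report 1
 * only when KEXEC_UPDATE_ELFCOREHDR was passed by userspace.
 */
int crash_hotplug_value(enum kdump_loader loader, bool update_elfcorehdr)
{
    if (loader == LOADER_KEXEC_FILE_LOAD)
        return 1;
    return update_elfcorehdr ? 1 : 0;
}

/*
 * What an updated udev rule would conclude from that value:
 * 0 means userspace must unload-then-load the kdump image,
 * 1 means the kernel updates the elfcorehdr itself.
 */
bool udev_must_reload(int crash_hotplug)
{
    assert(crash_hotplug == 0 || crash_hotplug == 1);
    return crash_hotplug == 0;
}
```

With older kexec-tools the flag is never set, so kexec_load() always
lands in the "reload" branch, which matches the as-is behavior
described above.
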
> Said differently, as kexec_load() was/is widely in use, these changes
> permit it to continue to be used as-is (retaining the current unload-then-
> reload behavior) until such time as the udev and kexec-tools changes can
> be rolled out as well.
> 
> I've intentionally kept the changes related to userspace coordination
> for kexec_load() separate as this need was identified late; the
> rest of this series has been generally reviewed and accepted. Once
> this support has been vetted, I can refactor if needed.
> 
> Suggested-by: Hari Bathini <hbathini at linux.ibm.com>
> Signed-off-by: Eric DeVolder <eric.devolder at oracle.com>

LGTM,

Acked-by: Baoquan He <bhe at redhat.com>



