[PATCH v22 6/8] crash: hotplug support for kexec_load()

Sourabh Jain sourabhjain at linux.ibm.com
Mon May 8 23:56:23 PDT 2023


On 04/05/23 04:11, Eric DeVolder wrote:
> The hotplug support for kexec_load() requires coordination with
> userspace, and therefore a little extra help from the kernel to
> facilitate the coordination.
>
> In the absence of the solution contained within this particular
> patch, if a kdump capture kernel is loaded via kexec_load() syscall,
> then the crash hotplug logic would find the segment containing the
> elfcorehdr, and upon a hotplug event, rewrite the elfcorehdr. While
> generally speaking that is the desired behavior and outcome, a
> problem arises from the fact that if the kdump image includes a
> purgatory that performs a digest checksum, then that check would
> fail (because the elfcorehdr was changed), and the capture kernel
> would fail to boot and no kdump would occur.
>
> Therefore, what is needed is for the userspace kexec-tools to
> indicate to the kernel whether or not the supplied kdump image/
> elfcorehdr can be modified (because the kexec-tools excludes the
> elfcorehdr from the digest, and sizes the elfcorehdr memory buffer
> appropriately).
>
> To solve these problems, this patch introduces:
>   - a new kexec flag KEXEC_UPDATE_ELFCOREHDR to indicate that it is
>     safe for the kernel to modify the elfcorehdr (because kexec-tools
>     has excluded the elfcorehdr from the digest).
>   - the /sys/kernel/crash_elfcorehdr_size node to communicate to
>     kexec-tools what the preferred size of the elfcorehdr memory buffer
>     should be in order to accommodate hotplug changes.
>   - The sysfs crash_hotplug nodes (ie.
>     /sys/devices/system/[cpu|memory]/crash_hotplug) are now dynamic in
>     that they examine kexec_file_load() vs kexec_load(), and when
>     kexec_load(), whether or not KEXEC_UPDATE_ELFCOREHDR is in effect.
>     This is critical so that the udev rule processing of crash_hotplug
>     indicates correctly (ie. the userspace unload-then-load of the
>     kdump image can be skipped, or not).
>
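
As a rough illustration of the userspace side described above, a loader could parse the size node and pick its kexec_load() flags along these lines. This is only a sketch: the two flag values are the ones from the uapi hunk in this patch, while parse_elfcorehdr_size() and kdump_load_flags() are hypothetical helper names, not existing kexec-tools functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Flag values as defined in include/uapi/linux/kexec.h by this patch. */
#define KEXEC_ON_CRASH          0x00000001UL
#define KEXEC_UPDATE_ELFCOREHDR 0x00000004UL

/*
 * Hypothetical helper: parse the text read from
 * /sys/kernel/crash_elfcorehdr_size (a decimal byte count, normally
 * followed by a newline). Returns 0 on success, -1 on malformed input.
 */
static int parse_elfcorehdr_size(const char *text, unsigned long *size)
{
	char *end;

	assert(text && size);
	errno = 0;
	*size = strtoul(text, &end, 10);
	if (errno || end == text || (*end != '\n' && *end != '\0'))
		return -1;
	return 0;
}

/*
 * Hypothetical helper: build the kexec_load() flags for a kdump image.
 * The update flag is set only when the loader has left the elfcorehdr
 * segment out of the purgatory digest.
 */
static unsigned long kdump_load_flags(int elfcorehdr_excluded_from_digest)
{
	unsigned long flags = KEXEC_ON_CRASH;

	if (elfcorehdr_excluded_from_digest)
		flags |= KEXEC_UPDATE_ELFCOREHDR;
	return flags;
}
```

The point of keeping the flag decision separate is that a loader which still includes the elfcorehdr in the digest simply never sets KEXEC_UPDATE_ELFCOREHDR, and the kernel then leaves the segment alone.
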
> With this patch in place, I believe the following statements to be true
> (with local testing to verify):
>
>   - For systems which have these kernel changes in place, but not the
>     corresponding changes to the crash hot plug udev rules and
>     kexec-tools, (ie "older" systems) those systems will continue to
>     unload-then-load the kdump image, as has always been done. The
>     kexec-tools will not set KEXEC_UPDATE_ELFCOREHDR.
>   - For systems which have these kernel changes in place and the proposed
>     udev rule changes in place, but not the kexec-tools changes in place:
>      - the use of kexec_load() will not set KEXEC_UPDATE_ELFCOREHDR and
>        so the unload-then-reload of kdump image will occur (the sysfs
>        crash_hotplug nodes will show 0).
>      - the use of kexec_file_load() will permit sysfs crash_hotplug nodes
>        to show 1, and the kernel will modify the elfcorehdr directly. And
>        with the udev changes in place, the unload-then-load will not occur!
>   - For systems which have these kernel changes as well as the udev and
>     kexec-tools changes in place, then the user/admin has full authority
>     over the enablement and support of crash hotplug support, whether via
>     kexec_file_load() or kexec_load().
>
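
The three cases above can be written down as a small model, which I find easier to check than the prose. This is my reading of the description, not kernel code: struct kdump_setup and both function names are made up for illustration, and udev_checks_node stands for "the proposed udev rule changes are installed".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of how the kdump image was loaded. */
struct kdump_setup {
	bool file_load;        /* image loaded via kexec_file_load() */
	bool update_flag;      /* KEXEC_UPDATE_ELFCOREHDR passed to kexec_load() */
	bool udev_checks_node; /* udev rules consult the crash_hotplug node */
};

/* Value the crash_hotplug sysfs nodes would report for a loaded image. */
static int crash_hotplug_node(const struct kdump_setup *s)
{
	assert(s);
	return (s->file_load || s->update_flag) ? 1 : 0;
}

/* True when userspace still does the unload-then-load on a hotplug event. */
static bool userspace_reloads(const struct kdump_setup *s)
{
	/* Old udev rules never look at the node, so they always reload. */
	return !s->udev_checks_node || crash_hotplug_node(s) == 0;
}
```

With old udev rules the model always reloads, with new udev rules and plain kexec_load() the node reads 0 and the reload still happens, and only file_load or the new flag together with the new udev rules skips it.
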
> Said differently, as kexec_load() was/is widely in use, these changes
> permit it to continue to be used as-is (retaining the current unload-then-
> reload behavior) until such time as the udev and kexec-tools changes can
> be rolled out as well.
>
> I've intentionally kept the changes related to userspace coordination
> for kexec_load() separate as this need was identified late; the
> rest of this series has been generally reviewed and accepted. Once
> this support has been vetted, I can refactor if needed.
>
> Suggested-by: Hari Bathini <hbathini at linux.ibm.com>
> Signed-off-by: Eric DeVolder <eric.devolder at oracle.com>
> ---
>   arch/x86/include/asm/kexec.h | 11 +++++++----
>   arch/x86/kernel/crash.c      | 27 +++++++++++++++++++++++++++
>   include/linux/kexec.h        | 14 ++++++++++++--
>   include/uapi/linux/kexec.h   |  1 +
>   kernel/crash_core.c          | 31 +++++++++++++++++++++++++++++++
>   kernel/kexec.c               |  3 +++
>   kernel/ksysfs.c              | 15 +++++++++++++++
>   7 files changed, 96 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 9143100ea3ea..3be6a98751f0 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -214,14 +214,17 @@ void arch_crash_handle_hotplug_event(struct kimage *image);
>   #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
>   
>   #ifdef CONFIG_HOTPLUG_CPU
> -static inline int crash_hotplug_cpu_support(void) { return 1; }
> -#define crash_hotplug_cpu_support crash_hotplug_cpu_support
> +int arch_crash_hotplug_cpu_support(void);
> +#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
>   #endif
>   
>   #ifdef CONFIG_MEMORY_HOTPLUG
> -static inline int crash_hotplug_memory_support(void) { return 1; }
> -#define crash_hotplug_memory_support crash_hotplug_memory_support
> +int arch_crash_hotplug_memory_support(void);
> +#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
>   #endif
> +
> +unsigned int arch_crash_get_elfcorehdr_size(void);
> +#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
>   #endif
>   
>   #endif /* __ASSEMBLY__ */
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 0c9d496cf7ce..8064e65de6c0 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -442,6 +442,33 @@ int crash_load_segments(struct kimage *image)
>   #undef pr_fmt
>   #define pr_fmt(fmt) "crash hp: " fmt
>   
> +/* These functions provide the value for the sysfs crash_hotplug nodes */
> +#ifdef CONFIG_HOTPLUG_CPU
> +int arch_crash_hotplug_cpu_support(void)
> +{
> +	return crash_check_update_elfcorehdr();
> +}
> +#endif
> +
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +int arch_crash_hotplug_memory_support(void)
> +{
> +	return crash_check_update_elfcorehdr();
> +}
> +#endif
> +
> +unsigned int arch_crash_get_elfcorehdr_size(void)
> +{
> +	unsigned int sz;
> +
> +	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
> +		sz = 2 + CONFIG_NR_CPUS_DEFAULT + CRASH_MAX_MEMORY_RANGES;
> +	else
> +		sz += 2 + CONFIG_NR_CPUS_DEFAULT;

If sz holds a garbage value, the else part is broken, because += reads
the uninitialized sz. And if you are expecting sz to be 0, then += is
not needed in the else part.

How about doing it this way?

unsigned int sz;

sz = 2 + CONFIG_NR_CPUS_DEFAULT;

if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
     sz += CRASH_MAX_MEMORY_RANGES;
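
To check the arithmetic outside the kernel, here is a minimal standalone sketch of that form. The constants are stand-in values picked only for illustration (the real ones come from Kconfig and the arch headers), the have_memory_hotplug parameter stands in for IS_ENABLED(CONFIG_MEMORY_HOTPLUG), and PHDR_SIZE assumes sizeof(Elf64_Phdr) is 56 bytes.

```c
#include <assert.h>
#include <limits.h>

/* Stand-in values for illustration only; not the kernel's real config. */
#define NR_CPUS_DEFAULT   64u
#define MAX_MEMORY_RANGES 8192u
#define PHDR_SIZE         56u /* assumed sizeof(Elf64_Phdr) */

/*
 * Compute the elfcorehdr buffer size in bytes. sz is assigned before it
 * is ever read, so neither path depends on an uninitialized value.
 */
static unsigned int elfcorehdr_size(int have_memory_hotplug)
{
	unsigned int sz;

	/* Fixed entries plus one program header per possible CPU. */
	sz = 2 + NR_CPUS_DEFAULT;

	/* Memory hotplug adds room for the crash memory ranges. */
	if (have_memory_hotplug)
		sz += MAX_MEMORY_RANGES;

	/* Guard the multiplication against unsigned overflow. */
	assert(sz <= UINT_MAX / PHDR_SIZE);
	return sz * PHDR_SIZE;
}
```

Both configurations now share the same base computation and differ only in the one optional addition.
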


Thanks,
Sourabh Jain

> +	sz *= sizeof(Elf64_Phdr);
> +	return sz;
> +}
> +
>   /**
>    * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
>    * @image: the active struct kimage
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 6a8a724ac638..050e20066cdb 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -335,6 +335,10 @@ struct kimage {
>   	unsigned int preserve_context : 1;
>   	/* If set, we are using file mode kexec syscall */
>   	unsigned int file_mode:1;
> +#ifdef CONFIG_CRASH_HOTPLUG
> +	/* If set, allow changes to elfcorehdr of kexec_load'd image */
> +	unsigned int update_elfcorehdr:1;
> +#endif
>   
>   #ifdef ARCH_HAS_KIMAGE_ARCH
>   	struct kimage_arch arch;
> @@ -411,9 +415,9 @@ bool kexec_load_permitted(int kexec_image_type);
>   
>   /* List of defined/legal kexec flags */
>   #ifndef CONFIG_KEXEC_JUMP
> -#define KEXEC_FLAGS    KEXEC_ON_CRASH
> +#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR)
>   #else
> -#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT)
> +#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_UPDATE_ELFCOREHDR)
>   #endif
>   
>   /* List of defined/legal kexec file flags */
> @@ -501,6 +505,8 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) {
>   static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
>   #endif
>   
> +int crash_check_update_elfcorehdr(void);
> +
>   #ifndef crash_hotplug_cpu_support
>   static inline int crash_hotplug_cpu_support(void) { return 0; }
>   #endif
> @@ -509,6 +515,10 @@ static inline int crash_hotplug_cpu_support(void) { return 0; }
>   static inline int crash_hotplug_memory_support(void) { return 0; }
>   #endif
>   
> +#ifndef crash_get_elfcorehdr_size
> +static inline unsigned int crash_get_elfcorehdr_size(void) { return 0; }
> +#endif
> +
>   #else /* !CONFIG_KEXEC_CORE */
>   struct pt_regs;
>   struct task_struct;
> diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
> index 981016e05cfa..01766dd839b0 100644
> --- a/include/uapi/linux/kexec.h
> +++ b/include/uapi/linux/kexec.h
> @@ -12,6 +12,7 @@
>   /* kexec flags for different usage scenarios */
>   #define KEXEC_ON_CRASH		0x00000001
>   #define KEXEC_PRESERVE_CONTEXT	0x00000002
> +#define KEXEC_UPDATE_ELFCOREHDR	0x00000004
>   #define KEXEC_ARCH_MASK		0xffff0000
>   
>   /*
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index ef6e91daad56..e05bfdb7eaed 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -704,6 +704,33 @@ subsys_initcall(crash_save_vmcoreinfo_init);
>   #ifdef CONFIG_CRASH_HOTPLUG
>   #undef pr_fmt
>   #define pr_fmt(fmt) "crash hp: " fmt
> +
> +/*
> + * This routine is utilized when the crash_hotplug sysfs node is read.
> + * It reflects the kernel's ability/permission to update the crash
> + * elfcorehdr directly.
> + */
> +int crash_check_update_elfcorehdr(void)
> +{
> +	int rc = 0;
> +
> +	/* Obtain lock while reading crash information */
> +	if (!kexec_trylock()) {
> +		pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
> +		return 0;
> +	}
> +	if (kexec_crash_image) {
> +		if (kexec_crash_image->file_mode)
> +			rc = 1;
> +		else
> +			rc = kexec_crash_image->update_elfcorehdr;
> +	}
> +	/* Release lock now that the read is complete */
> +	kexec_unlock();
> +
> +	return rc;
> +}
> +
>   /*
>    * To accurately reflect hot un/plug changes of cpu and memory resources
>    * (including onling and offlining of those resources), the elfcorehdr
> @@ -734,6 +761,10 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
>   
>   	image = kexec_crash_image;
>   
> +	/* Check that updating elfcorehdr is permitted */
> +	if (!(image->file_mode || image->update_elfcorehdr))
> +		goto out;
> +
>   	if (hp_action == KEXEC_CRASH_HP_ADD_CPU ||
>   		hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
>   		pr_debug("hp_action %u, cpu %u\n", hp_action, cpu);
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index 92d301f98776..60de64bd14b9 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -129,6 +129,9 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
>   	if (flags & KEXEC_PRESERVE_CONTEXT)
>   		image->preserve_context = 1;
>   
> +	if (flags & KEXEC_UPDATE_ELFCOREHDR)
> +		image->update_elfcorehdr = 1;
> +
>   	ret = machine_kexec_prepare(image);
>   	if (ret)
>   		goto out;
> diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
> index aad7a3bfd846..1d4bc493b2f4 100644
> --- a/kernel/ksysfs.c
> +++ b/kernel/ksysfs.c
> @@ -165,6 +165,18 @@ static ssize_t vmcoreinfo_show(struct kobject *kobj,
>   }
>   KERNEL_ATTR_RO(vmcoreinfo);
>   
> +#ifdef CONFIG_CRASH_HOTPLUG
> +static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj,
> +			       struct kobj_attribute *attr, char *buf)
> +{
> +	unsigned int sz = crash_get_elfcorehdr_size();
> +
> +	return sysfs_emit(buf, "%u\n", sz);
> +}
> +KERNEL_ATTR_RO(crash_elfcorehdr_size);
> +
> +#endif
> +
>   #endif /* CONFIG_CRASH_CORE */
>   
>   /* whether file capabilities are enabled */
> @@ -255,6 +267,9 @@ static struct attribute * kernel_attrs[] = {
>   #endif
>   #ifdef CONFIG_CRASH_CORE
>   	&vmcoreinfo_attr.attr,
> +#ifdef CONFIG_CRASH_HOTPLUG
> +	&crash_elfcorehdr_size_attr.attr,
> +#endif
>   #endif
>   #ifndef CONFIG_TINY_RCU
>   	&rcu_expedited_attr.attr,


