[PATCH v18 3/7] crash: add generic infrastructure for crash hotplug support

Thu Feb 9 11:10:10 PST 2023

Hello Eric,

On 01/02/23 04:12, Eric DeVolder wrote:
> To support crash hotplug, a mechanism is needed to update the crash
> elfcorehdr upon CPU or memory changes (eg. hot un/plug or off/
> onlining).
>
> To track CPU changes, callbacks are registered with the cpuhp
> mechanism via cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN). The
> crash hotplug elfcorehdr update has no explicit ordering requirement
> (relative to other cpuhp states), so meets the criteria for
> utilizing CPUHP_BP_PREPARE_DYN. CPUHP_BP_PREPARE_DYN is a dynamic
> state and avoids the need to introduce a new state for crash
> hotplug. Also, this is the last state in the PREPARE group, just
> prior to the STARTING group, which is very close to the CPU
> starting up in a plug/online situation, or stopping in an unplug/
> offline situation. This minimizes the window of time during an
> actual plug/online or unplug/offline situation in which the
> elfcorehdr would be inaccurate.
>
> Note that when a CPU is being unplugged/offlined, the CPU is still
> in the foreach_present_cpu() during the regeneration of the
> elfcorehdr. Thus there is a need to explicitly check and exclude
> the soon-to-be offlined CPU. See patch 'kexec: exclude hot remove
> cpu from elfcorehdr notes'.
>
> To track memory changes, a notifier is registered to capture the
> memblock MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().
>
> The cpu callbacks and memory notifiers invoke handle_hotplug_event()
> which performs needed tasks and then dispatches the event to the
> architecture specific arch_crash_handle_hotplug_event() to update the
> elfcorehdr with the current state of CPUs and memory. During the
> process, the kexec_lock is held.
>
> Signed-off-by: Eric DeVolder <eric.devolder at oracle.com>
> Acked-by: Baoquan He <bhe at redhat.com>
> ---
>   include/linux/crash_core.h |   9 +++
>   include/linux/kexec.h      |  12 ++++
>   kernel/crash_core.c        | 139 +++++++++++++++++++++++++++++++++++++
>   3 files changed, 160 insertions(+)
>
> diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
> index de62a722431e..ed868d237c07 100644
> --- a/include/linux/crash_core.h
> +++ b/include/linux/crash_core.h
> @@ -84,4 +84,13 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
>   int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
>   		unsigned long long *crash_size, unsigned long long *crash_base);
>   
> +#define KEXEC_CRASH_HP_NONE			0
> +#define KEXEC_CRASH_HP_REMOVE_CPU		1
> +#define KEXEC_CRASH_HP_ADD_CPU			2
> +#define KEXEC_CRASH_HP_REMOVE_MEMORY		3
> +#define KEXEC_CRASH_HP_ADD_MEMORY		4
> +#define KEXEC_CRASH_HP_INVALID_CPU		-1U
> +
> +struct kimage;
> +
>   #endif /* LINUX_CRASH_CORE_H */
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 27ef420c7a45..a52624ae4452 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -33,6 +33,7 @@ extern note_buf_t __percpu *crash_notes;
>   #include <linux/compat.h>
>   #include <linux/ioport.h>
>   #include <linux/module.h>
> +#include <linux/highmem.h>
>   #include <asm/kexec.h>
>   
>   /* Verify architecture specific macros are defined */
> @@ -371,6 +372,13 @@ struct kimage {
>   	struct purgatory_info purgatory_info;
>   #endif
>   
> +#ifdef CONFIG_CRASH_HOTPLUG
> +	int hp_action;
> +	unsigned int offlinecpu;
> +	bool elfcorehdr_index_valid;
> +	int elfcorehdr_index;

Maybe I am repeating myself, but I think we can manage without
elfcorehdr_index_valid.

Here is how:
Initialize elfcorehdr_index with a negative value in the
do_kimage_alloc_init function (it is called for both kexec_load and
kexec_file_load).

Then, when control reaches the handle_hotplug_event function and
elfcorehdr_index is negative, find the correct index and store it in
elfcorehdr_index.
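
To make the idea concrete, here is a rough userspace sketch, not kernel
code. The struct, the sketch_* helpers, and the lookup by segment name
are all hypothetical stand-ins that I made up for illustration; the real
lookup in handle_hotplug_event would be whatever the patch already does
to locate the elfcorehdr segment. The only point is that a negative
elfcorehdr_index can double as the "not valid yet" flag:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_SEGMENTS	16
#define SKETCH_INDEX_UNSET	(-1)

/* Hypothetical stand-in for the relevant parts of struct kimage. */
struct kimage_sketch {
	int nr_segments;
	const char *segment_name[SKETCH_MAX_SEGMENTS];
	int elfcorehdr_index;	/* negative means "not located yet" */
};

/* Plays the role of the initialization in do_kimage_alloc_init. */
static void sketch_image_init(struct kimage_sketch *image)
{
	assert(image != NULL);
	memset(image, 0, sizeof(*image));
	image->elfcorehdr_index = SKETCH_INDEX_UNSET;
}

/*
 * Hypothetical lookup: returns the index of the elfcorehdr segment,
 * or a negative value if the image has no such segment.
 */
static int sketch_find_elfcorehdr(const struct kimage_sketch *image)
{
	for (int i = 0; i < image->nr_segments; i++) {
		if (image->segment_name[i] &&
		    strcmp(image->segment_name[i], "elfcorehdr") == 0)
			return i;
	}
	return SKETCH_INDEX_UNSET;
}

/*
 * Plays the role of handle_hotplug_event: the index is resolved only
 * while it is still negative, so no separate "valid" bool is needed.
 */
static int sketch_handle_hotplug_event(struct kimage_sketch *image)
{
	assert(image != NULL);
	if (image->elfcorehdr_index < 0)
		image->elfcorehdr_index = sketch_find_elfcorehdr(image);
	return image->elfcorehdr_index;
}
```

With this shape, the first hotplug event pays for the lookup and later
events reuse the stored index. If the lookup fails, the index simply
stays negative and the next event retries.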

Thoughts?

Thanks,
Sourabh Jain