[PATCH v18 3/7] crash: add generic infrastructure for crash hotplug support

Eric DeVolder eric.devolder at oracle.com
Fri Feb 10 08:51:14 PST 2023



On 2/9/23 13:10, Sourabh Jain wrote:
> Hello Eric,
> 
> On 01/02/23 04:12, Eric DeVolder wrote:
>> To support crash hotplug, a mechanism is needed to update the crash
>> elfcorehdr upon CPU or memory changes (e.g. hot un/plug or off/
>> onlining).
>>
>> To track CPU changes, callbacks are registered with the cpuhp
>> mechanism via cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN). The
>> crash hotplug elfcorehdr update has no explicit ordering requirement
>> (relative to other cpuhp states), so meets the criteria for
>> utilizing CPUHP_BP_PREPARE_DYN. CPUHP_BP_PREPARE_DYN is a dynamic
>> state and avoids the need to introduce a new state for crash
>> hotplug. Also, this is the last state in the PREPARE group, just
>> prior to the STARTING group, which is very close to the CPU
>> starting up in a plug/online situation, or stopping in an unplug/
>> offline situation. This minimizes the window of time during an
>> actual plug/online or unplug/offline situation in which the
>> elfcorehdr would be inaccurate.
>>
>> Note that when a CPU is being unplugged/offlined, the CPU is still
>> visited by for_each_present_cpu() during the regeneration of the
>> elfcorehdr. Thus there is a need to explicitly check and exclude
>> the soon-to-be offlined CPU. See patch 'kexec: exclude hot remove
>> cpu from elfcorehdr notes'.
>>
>> To track memory changes, a notifier is registered to capture the
>> memblock MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().
>>
>> The cpu callbacks and memory notifiers invoke handle_hotplug_event()
>> which performs needed tasks and then dispatches the event to the
>> architecture specific arch_crash_handle_hotplug_event() to update the
>> elfcorehdr with the current state of CPUs and memory. During the
>> process, the kexec_lock is held.
>>
>> Signed-off-by: Eric DeVolder <eric.devolder at oracle.com>
>> Acked-by: Baoquan He <bhe at redhat.com>
>> ---
>>   include/linux/crash_core.h |   9 +++
>>   include/linux/kexec.h      |  12 ++++
>>   kernel/crash_core.c        | 139 +++++++++++++++++++++++++++++++++++++
>>   3 files changed, 160 insertions(+)
>>
>> diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
>> index de62a722431e..ed868d237c07 100644
>> --- a/include/linux/crash_core.h
>> +++ b/include/linux/crash_core.h
>> @@ -84,4 +84,13 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
>>   int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
>>           unsigned long long *crash_size, unsigned long long *crash_base);
>> +#define KEXEC_CRASH_HP_NONE            0
>> +#define KEXEC_CRASH_HP_REMOVE_CPU        1
>> +#define KEXEC_CRASH_HP_ADD_CPU            2
>> +#define KEXEC_CRASH_HP_REMOVE_MEMORY        3
>> +#define KEXEC_CRASH_HP_ADD_MEMORY        4
>> +#define KEXEC_CRASH_HP_INVALID_CPU        -1U
>> +
>> +struct kimage;
>> +
>>   #endif /* LINUX_CRASH_CORE_H */
>> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
>> index 27ef420c7a45..a52624ae4452 100644
>> --- a/include/linux/kexec.h
>> +++ b/include/linux/kexec.h
>> @@ -33,6 +33,7 @@ extern note_buf_t __percpu *crash_notes;
>>   #include <linux/compat.h>
>>   #include <linux/ioport.h>
>>   #include <linux/module.h>
>> +#include <linux/highmem.h>
>>   #include <asm/kexec.h>
>>   /* Verify architecture specific macros are defined */
>> @@ -371,6 +372,13 @@ struct kimage {
>>       struct purgatory_info purgatory_info;
>>   #endif
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> +    int hp_action;
>> +    unsigned int offlinecpu;
>> +    bool elfcorehdr_index_valid;
>> +    int elfcorehdr_index;
> 
> Maybe I am repeating myself, but I think we can manage without elfcorehdr_index_valid.
> 
> Here is how:
> Initialize elfcorehdr_index with a negative value in the do_kimage_alloc_init
> function (it is called for both kexec_load and kexec_file_load).
> 
> Then, when control reaches the handle_hotplug_event function, if elfcorehdr_index
> is negative, find the correct index and store it in elfcorehdr_index.
> 
> Thoughts?
> 
> Thanks,
> Sourabh Jain
> 
ok, I'll eliminate elfcorehdr_index_valid.
eric
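
The sentinel approach suggested above could look roughly like the following userspace sketch. It is not the patch code: the sketch_* types and functions are hypothetical stand-ins for struct kimage, do_kimage_alloc_init() and handle_hotplug_event(), and identifying the elfcorehdr segment by its ELF magic bytes is an assumption made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
#define SKETCH_MAX_SEGMENTS 16
#define SKETCH_INDEX_UNSET  (-1)

struct sketch_segment {
	const void *buf;	/* start of the segment's buffer */
	size_t bufsz;		/* size of that buffer in bytes */
};

struct sketch_image {
	struct sketch_segment segment[SKETCH_MAX_SEGMENTS];
	int nr_segments;
	int elfcorehdr_index;	/* negative means "not located yet" */
};

/* Plays the role of do_kimage_alloc_init(): start with the sentinel. */
static void sketch_image_init(struct sketch_image *image)
{
	memset(image, 0, sizeof(*image));
	image->elfcorehdr_index = SKETCH_INDEX_UNSET;
}

/* Assumption: a segment is the elfcorehdr if it begins with the ELF magic. */
static int sketch_is_elf(const struct sketch_segment *seg)
{
	static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

	return seg->buf && seg->bufsz >= sizeof(magic) &&
	       memcmp(seg->buf, magic, sizeof(magic)) == 0;
}

/*
 * Plays the role of the lookup in handle_hotplug_event(): search only
 * while the index is still negative, then reuse the cached value.
 * Returns a negative value if no matching segment exists.
 */
static int sketch_elfcorehdr_index(struct sketch_image *image)
{
	if (image->elfcorehdr_index < 0) {
		for (int n = 0; n < image->nr_segments; n++) {
			if (sketch_is_elf(&image->segment[n])) {
				image->elfcorehdr_index = n;
				break;
			}
		}
	}
	return image->elfcorehdr_index;
}
```

With this shape, the negative value carries the same information as the separate bool, so the "valid" flag and the index cannot get out of sync.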


