[PATCH v19 6/7] crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()

Eric DeVolder eric.devolder at oracle.com
Fri Mar 17 12:12:20 PDT 2023



On 3/7/23 02:48, Sourabh Jain wrote:
> Hello Eric,
> 
> On 06/03/23 21:52, Eric DeVolder wrote:
>> The function crash_prepare_elf64_headers() generates the elfcorehdr
>> which describes the cpus and memory in the system for the crash kernel.
>> In particular, it writes out ELF PT_NOTEs for memory regions and the
>> cpus in the system.
>>
>> With respect to the cpus, the current implementation utilizes
>> for_each_present_cpu() which means that as cpus are added and removed,
>> the elfcorehdr must again be updated to reflect the new set of cpus.
>>
>> The reasoning behind the change to use for_each_possible_cpu(), is:
>>
>> - At kernel boot time, all percpu crash_notes are allocated for all
>>    possible cpus; that is, crash_notes are not allocated dynamically
>>    when cpus are plugged/unplugged. Thus the crash_notes for each
>>    possible cpu are always available.
>>
>> - The crash_prepare_elf64_headers() creates an ELF PT_NOTE per cpu.
>>    Changing to for_each_possible_cpu() is valid as the crash_notes
>>    pointed to by each cpu PT_NOTE are present and always valid.
>>
>> Furthermore, examining a common crash processing path of:
>>
>>   kernel panic -> crash kernel -> makedumpfile -> 'crash' analyzer
>>             elfcorehdr      /proc/vmcore     vmcore
>>
>> reveals how the ELF cpu PT_NOTEs are utilized:
>>
>> - Upon panic, each cpu is sent an IPI and shuts itself down, recording
>>   its state in its crash_notes. When all cpus are shutdown, the
>>   crash kernel is launched with a pointer to the elfcorehdr.
>>
>> - The crash kernel via linux/fs/proc/vmcore.c does not examine or
>>   use the contents of the PT_NOTEs, it exposes them via /proc/vmcore.
>>
>> - The makedumpfile utility uses /proc/vmcore and reads the cpu
>>   PT_NOTEs to craft a nr_cpus variable, which is reported in a
>>   header but otherwise generally unused. Makedumpfile creates the
>>   vmcore.
>>
>> - The 'crash' dump analyzer does not appear to reference the cpu
>>   PT_NOTEs. Instead it looks-up the cpu_[possible|present|onlin]_mask
>>   symbols and directly examines those structure contents from vmcore
>>   memory. From that information it is able to determine which cpus
>>   are present and online, and locate the corresponding crash_notes.
>>   Said differently, it appears that 'crash' analyzer does not rely
>>   on the ELF PT_NOTEs for cpus; rather it obtains the information
>>   directly via kernel symbols and the memory within the vmcore.
>>
>> (There maybe other vmcore generating and analysis tools that do use
>> these PT_NOTEs, but 'makedumpfile' and 'crash' seems to be the most
>> common solution.)
>>
>> This change results in the benefit of having all cpus described in
>> the elfcorehdr, and therefore reducing the need to re-generate the
>> elfcorehdr on cpu changes, at the small expense of an additional
>> 56 bytes per PT_NOTE for not-present-but-possible cpus.
>>
>> On systems where kexec_file_load() syscall is utilized, all the above
>> is valid. On systems where kexec_load() syscall is utilized, there
>> may be the need for the elfcorehdr to be regenerated once. The reason
>> being that some archs only populate the 'present' cpus in the
>> /sys/devices/system/cpus entries, which the userspace 'kexec' utility
>> uses to generate the userspace-supplied elfcorehdr. In this situation,
>> one memory or cpu change will rewrite the elfcorehdr via the
>> crash_prepare_elf64_headers() function and now all possible cpus will
>> be described, just as with kexec_file_load() syscall.
>>
>> Suggested-by: Sourabh Jain <sourabhjain at linux.ibm.com>
>> Signed-off-by: Eric DeVolder <eric.devolder at oracle.com>
>> ---
>>   kernel/crash_core.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>> index dba4b75f7541..537b199a8774 100644
>> --- a/kernel/crash_core.c
>> +++ b/kernel/crash_core.c
>> @@ -365,7 +365,7 @@ int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
>>       ehdr->e_phentsize = sizeof(Elf64_Phdr);
>>       /* Prepare one phdr of type PT_NOTE for each present CPU */
> We need to change this comment as well.
Done, thanks!
eric

>> -    for_each_present_cpu(cpu) {
>> +    for_each_possible_cpu(cpu) {
> - Sourabh Jain



More information about the kexec mailing list