[PATCH v18 0/6] powerpc/crash: Kernel handling of CPU and memory hotplug

Sourabh Jain sourabhjain at linux.ibm.com
Sun Aug 4 21:30:49 PDT 2024


Hello Jinjie,

On 05/08/24 07:58, Jinjie Ruan wrote:
>
> On 2024/3/26 13:54, Sourabh Jain wrote:
>> Commit 247262756121 ("crash: add generic infrastructure for crash
>> hotplug support") added a generic infrastructure that allows
>> architectures to selectively update the kdump image component during CPU
>> or memory add/remove events within the kernel itself.
>>
>> This patch series adds a crash hotplug handler for PowerPC and enables
>> support to update the kdump image on CPU/Memory add/remove events.
>>
>> Among the 6 patches in this series, the first two patches make changes
>> to the generic crash hotplug handler to assist PowerPC in adding support
>> for this feature. The last four patches add support for this feature.
>>
>> The following section outlines the problem addressed by this patch
>> series, along with the current solution, its shortcomings, and the
>> proposed resolution.
>>
>> Problem:
>> ========
>> Due to CPU/Memory hotplug or online/offline events, the elfcorehdr
>> (which describes the CPUs and memory of the crashed kernel) and FDT
>> (Flattened Device Tree) of the kdump image become outdated. Consequently,
>> attempting dump collection with an outdated elfcorehdr or FDT can lead
>> to failed or inaccurate dump collection.
> Hi, Sourabh, are there any specific methods to reproduce the scenarios
> for this feature? I would like to port this feature to ARM64, but I
> don't know how to reproduce the issue.

On PowerPC, this issue is reproducible if the kernel crashes after bulk CPU
or memory hotplug operations. Try the same on ARM64; you might be
able to reproduce the issue.

Note: To reproduce this issue easily, I hotplugged hundreds of LMBs
(Logical Memory Blocks), each 256 MB in size.

To hotplug CPUs and memory, I used a PowerPC-specific tool. You
may need to find an equivalent that is available for ARM64.
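
If no such tool is at hand, the generic sysfs CPU online/offline files may
be enough to generate a burst of events. The following is only a minimal
sketch, not the tool I used: it assumes the standard
/sys/devices/system/cpu/cpuN/online files, must run as root, and the helper
names (hotpluggable_cpus, set_cpu_online, cycle_cpus) are made up for
illustration.

```python
import re
from pathlib import Path

# Standard sysfs location on Linux; passed as a parameter so the helpers
# can also be pointed at a scratch directory.
SYSFS_ROOT = Path("/sys/devices/system")


def hotpluggable_cpus(root=SYSFS_ROOT):
    """Return the sorted CPU numbers that expose an 'online' control file."""
    cpus = []
    for entry in (Path(root) / "cpu").glob("cpu[0-9]*"):
        match = re.fullmatch(r"cpu(\d+)", entry.name)
        if match and (entry / "online").exists():
            cpus.append(int(match.group(1)))
    return sorted(cpus)


def set_cpu_online(cpu, online, root=SYSFS_ROOT):
    """Write 1 (online) or 0 (offline) to the CPU's 'online' file."""
    control = Path(root) / "cpu" / f"cpu{cpu}" / "online"
    control.write_text("1" if online else "0")


def cycle_cpus(rounds=1, root=SYSFS_ROOT):
    """Offline and then re-online each hotpluggable CPU, `rounds` times.

    Returns the number of online/offline events that were requested.
    """
    events = 0
    for _ in range(rounds):
        for cpu in hotpluggable_cpus(root):
            set_cpu_online(cpu, False, root)
            set_cpu_online(cpu, True, root)
            events += 2
    return events
```

Calling cycle_cpus(rounds=10) as root, with kdump loaded, and then
triggering a test crash should show whether the dump still matches the
system. On a single-CPU system the offline request is expected to fail,
since the last online CPU cannot be taken down.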

Along with fixing the problem mentioned above, this feature also
significantly reduces the kdump service downtime. How?

1. Handle the kdump image update on CPU/Memory hotplug events in
   the kernel itself, without userspace intervention.

2. Only recreate/update the relevant kexec segment instead of reloading
   all segments.
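
Whether the kernel takes over as in point 1 is what the crash_hotplug
attribute tested by the udev rules quoted below advertises. As a rough
sketch, userspace could read it directly. The paths
/sys/devices/system/cpu/crash_hotplug and
/sys/devices/system/memory/crash_hotplug are my assumption of where the
attribute appears, and both function names are hypothetical.

```python
from pathlib import Path


def crash_hotplug_enabled(subsystem, root="/sys/devices/system"):
    """Return True if <root>/<subsystem>/crash_hotplug reads "1".

    `subsystem` is expected to be "cpu" or "memory" (assumed layout).
    """
    attr = Path(root) / subsystem / "crash_hotplug"
    try:
        return attr.read_text().strip() == "1"
    except OSError:
        # Attribute missing or unreadable: treat the feature as absent.
        return False


def needs_userspace_reload(subsystem, root="/sys/devices/system"):
    """Mirror the udev rule: reload kdump only if the kernel will not update it."""
    return not crash_hotplug_enabled(subsystem, root)
```

If needs_userspace_reload("cpu") returns False, the full kdump reload from
userspace can be skipped for CPU events, which is where the downtime
saving comes from.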

Thanks,
Sourabh Jain

>> Going forward, CPU/Memory hotplug or online/offline events are referred to as
>> CPU/Memory add/remove events.
>>
>> Existing solution and its shortcoming:
>> ======================================
>> The current solution to address the above issue involves monitoring the
>> CPU/memory add/remove events in userspace using udev rules and whenever
>> there are changes in CPU and memory resources, the entire kdump image
>> is loaded again. The kdump image includes kernel, initrd, elfcorehdr,
>> FDT, purgatory. Given that only elfcorehdr and FDT get outdated due to
>> CPU/Memory add/remove events, reloading the entire kdump image is
>> inefficient. More importantly, kdump remains inactive for a substantial
>> amount of time until the kdump reload completes.
>>
>> Proposed solution:
>> ==================
>> Instead of initiating a full kdump image reload from userspace on
>> CPU/Memory hotplug and online/offline events, the proposed solution aims
>> to update only the necessary kdump image component within the kernel
>> itself.
>>
>> Git tree for testing:
>> =====================
>> https://github.com/sourabhjains/linux/tree/kdump-in-kernel-crash-update-v18
>>
>> Above tree is rebased on top of powerpc/next branch.
>>
>> To realize this feature, the kdump udev rule must be updated. On RHEL,
>> add the following two lines at the top of the
>> "/usr/lib/udev/rules.d/98-kexec.rules" file.
>>
>> SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
>> SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
>>
>> With the above change to the kdump udev rule, kdump reload is avoided
>> during CPU/Memory add/remove events if this feature is enabled in the
>> kernel.
>>
>> Note: only the kexec_file_load syscall will work. For kexec_load, minor
>> changes are required in the kexec tool.
>>
>> Changelog:
>> ----------
>> v18: [No functional changes]
>>    - Update a comment in 2/6.
>>    - Describe the clean-up done on x86 in patch description 2/6.
>>    - Fix a minor typo in the patch description of 3/6.
>>
>> v17: [https://lore.kernel.org/all/20240226084118.16310-1-sourabhjain@linux.ibm.com/]
>>    - Rebase the patch series on top of the linux-next tree and the patch series below:
>>      https://lore.kernel.org/all/20240213113150.1148276-1-hbathini@linux.ibm.com/
>>    - Split 0003 patch from v16 into two patches
>>         1. Move get_crash_memory_ranges() along with other *_memory_ranges()
>>            functions to ranges.c and make them public.
>>         2. Make update_cpus_node function public and take this function
>>            out of file_load_64.c
>>    - Keep arch_crash_hotplug_support in crash.c instead of core_64.c [05/06]
>>    - Use CONFIG_CRASH_MAX_MEMORY_RANGES to find extra elfcorehdr size [06/06]
>>
>> v16: [https://lore.kernel.org/all/20240217081452.164571-1-sourabhjain@linux.ibm.com/]
>>    - Remove the unused #define `crash_hotplug_cpu_support`
>>      and `crash_hotplug_memory_support` in `arch/x86/include/asm/kexec.h`.
>>    - Document why two kexec flag bits are used in
>>      `arch_crash_hotplug_memory_support` (x86).
>>    - Use a switch case to handle different hotplug operations
>>      in `arch_crash_handle_hotplug_event` for PowerPC.
>>    - Fix a typo in 4/5.
>>
>> v15:
>>    - Remove the patch that adds a new kexec flag for FDT update.
>>    - Introduce a generic kexec flag bit to share hotplug support
>>      intent between the kexec tool and the kernel for the kexec_load
>>      syscall. (2/5)
>>    - Introduce an architecture-specific handler to process the kexec
>>      flag for crash hotplug support. (2/5)
>>    - Rename the @update_elfcorehdr member of the struct kimage to
>>      @hotplug_support. (2/5)
>>    - Use a common function to advertise hotplug support for both CPU
>>      and Memory. (2/5)
>>
>> v14:
>>    - Fix build warnings by including necessary header files
>>    - Rebase to v6.7-rc5
>>
>> v13:
>>    - Fix a build warning, take ranges.c out of CONFIG_KEXEC_FILE
>>    - Rebase to v6.7-rc4
>>
>> v12:
>>    - A patch to add new kexec flags to support this feature on kexec_load
>>      system call
>>    - Change in the way this feature is advertised to userspace for both
>>      kexec_load syscall
>>    - Rebase to v6.6-rc7
>>
>> v11:
>>    - Rebase to v6.4-rc6
>>    - The patch that introduced CONFIG_CRASH_HOTPLUG for PowerPC has been
>>      removed. The config is now part of common configuration:
>>      https://lore.kernel.org/all/87ilbpflsk.fsf@mail.lhotse/
>>
>> v10:
>>    - Drop the patch that adds fdt_index attribute to struct kimage_arch
>>      Find the fdt segment index when needed.
>>    - Added more details to commit messages.
>>    - Rebased onto 6.3.0-rc5
>>
>> v9:
>>    - Removed patch to prepare elfcorehdr crash notes for possible CPUs.
>>      The patch is moved to generic patch series that introduces generic
>>      infrastructure for in-kernel crash update.
>>    - Removed patch to pass the hotplug action type to the arch crash
>>      hotplug handler function. The generic patch series has introduced
>>      the hotplug action type in kimage struct.
>>    - Add detailed commit messages for better understanding.
>>
>> v8:
>>    - Restrict fdt_index initialization to machine_kexec_post_load
>>      so that it works for both kexec_load and kexec_file_load. [3/8] Laurent Dufour
>>
>>    - Updated the logic to find the number of offline cores. [6/8]
>>
>>    - Changed the logic to find the elfcore program header to accommodate
>>      future memory ranges due to memory hotplug events. [8/8]
>>
>> v7
>>    - added a new config to configure this feature
>>    - pass hotplug action type to arch specific handler
>>
>> v6
>>    - Added crash memory hotplug support
>>
>> v5:
>>    - Replace CONFIG_CRASH_HOTPLUG with CONFIG_HOTPLUG_CPU.
>>    - Move fdt segment identification for kexec_load case to load path
>>      instead of crash hotplug handler
>>    - Keep new attribute defined under kimage_arch to track FDT segment
>>      under CONFIG_HOTPLUG_CPU config.
>>
>> v4:
>>    - Update the logic to find the additional space needed for hotadd CPUs
>>      post kexec load. Refer "[RFC v4 PATCH 4/5] powerpc/crash hp: add crash
>>      hotplug support for kexec_file_load" patch to know more about the
>>      change.
>>    - Fix a couple of typos.
>>    - Replace pr_err with pr_info_once to warn the user about memory hotplug
>>      support.
>>    - In the crash hotplug handler, exit the for loop if the FDT segment is found.
>>
>> v3
>>    - Move fdt_index and fdt_index_vaild variables to kimage_arch struct.
>>    - Rebase patches on top of
>>      https://lore.kernel.org/lkml/20220303162725.49640-1-eric.devolder@oracle.com/
>>    - Fixed warning reported by the checkpatch script
>>
>> v2:
>>    - Use generic hotplug handler introduced by
>>      https://lore.kernel.org/lkml/20220209195706.51522-1-eric.devolder@oracle.com/
>>      a significant change from v1.
>>
>> Cc: Akhil Raj <lf32.dev at gmail.com>
>> Cc: Andrew Morton <akpm at linux-foundation.org>
>> Cc: Aneesh Kumar K.V <aneesh.kumar at kernel.org>
>> Cc: Baoquan He <bhe at redhat.com>
>> Cc: Borislav Petkov (AMD) <bp at alien8.de>
>> Cc: Boris Ostrovsky <boris.ostrovsky at oracle.com>
>> Cc: Christophe Leroy <christophe.leroy at csgroup.eu>
>> Cc: Dave Hansen <dave.hansen at linux.intel.com>
>> Cc: Dave Young <dyoung at redhat.com>
>> Cc: David Hildenbrand <david at redhat.com>
>> Cc: Greg Kroah-Hartman <gregkh at linuxfoundation.org>
>> Cc: Hari Bathini <hbathini at linux.ibm.com>
>> Cc: Laurent Dufour <laurent.dufour at fr.ibm.com>
>> Cc: Mahesh Salgaonkar <mahesh at linux.ibm.com>
>> Cc: Michael Ellerman <mpe at ellerman.id.au>
>> Cc: Mimi Zohar <zohar at linux.ibm.com>
>> Cc: Naveen N Rao <naveen at kernel.org>
>> Cc: Oscar Salvador <osalvador at suse.de>
>> Cc: Thomas Gleixner <tglx at linutronix.de>
>> Cc: Valentin Schneider <vschneid at redhat.com>
>> Cc: Vivek Goyal <vgoyal at redhat.com>
>> Cc: kexec at lists.infradead.org
>> Cc: x86 at kernel.org
>>
>> Sourabh Jain (6):
>>    crash: forward memory_notify arg to arch crash hotplug handler
>>    crash: add a new kexec flag for hotplug support
>>    powerpc/kexec: move *_memory_ranges functions to ranges.c
>>    PowerPC/kexec: make the update_cpus_node() function public
>>    powerpc/crash: add crash CPU hotplug support
>>    powerpc/crash: add crash memory hotplug support
>>
>>   arch/powerpc/Kconfig                    |   4 +
>>   arch/powerpc/include/asm/kexec.h        |  15 ++
>>   arch/powerpc/include/asm/kexec_ranges.h |  20 +-
>>   arch/powerpc/kexec/Makefile             |   4 +-
>>   arch/powerpc/kexec/core_64.c            |  91 +++++++
>>   arch/powerpc/kexec/crash.c              | 196 +++++++++++++++
>>   arch/powerpc/kexec/elf_64.c             |   3 +-
>>   arch/powerpc/kexec/file_load_64.c       | 314 +++---------------------
>>   arch/powerpc/kexec/ranges.c             | 312 ++++++++++++++++++++++-
>>   arch/x86/include/asm/kexec.h            |  13 +-
>>   arch/x86/kernel/crash.c                 |  32 ++-
>>   drivers/base/cpu.c                      |   2 +-
>>   drivers/base/memory.c                   |   2 +-
>>   include/linux/crash_core.h              |  15 +-
>>   include/linux/kexec.h                   |  11 +-
>>   include/uapi/linux/kexec.h              |   1 +
>>   kernel/crash_core.c                     |  29 +--
>>   kernel/kexec.c                          |   4 +-
>>   kernel/kexec_file.c                     |   5 +
>>   19 files changed, 714 insertions(+), 359 deletions(-)
>>



