[PATCH v18 0/6] powerpc/crash: Kernel handling of CPU and memory hotplug

Jinjie Ruan ruanjinjie at huawei.com
Sun Aug 4 19:28:50 PDT 2024



On 2024/3/26 13:54, Sourabh Jain wrote:
> Commit 247262756121 ("crash: add generic infrastructure for crash
> hotplug support") added a generic infrastructure that allows
> architectures to selectively update the kdump image component during CPU
> or memory add/remove events within the kernel itself.
> 
> This patch series adds a crash hotplug handler for PowerPC and enables
> support for updating the kdump image on CPU/Memory add/remove events.
> 
> Among the 6 patches in this series, the first two patches make changes
> to the generic crash hotplug handler to assist PowerPC in adding support
> for this feature. The last four patches add the PowerPC support itself.
> 
> The following section outlines the problem addressed by this patch
> series, along with the current solution, its shortcomings, and the
> proposed resolution.
> 
> Problem:
> ========
> Due to CPU/Memory hotplug or online/offline events, the elfcorehdr
> (which describes the CPUs and memory of the crashed kernel) and FDT
> (Flattened Device Tree) of the kdump image become outdated. Consequently,
> attempting dump collection with an outdated elfcorehdr or FDT can lead
> to failed or inaccurate dump collection.

Hi Sourabh, is there a specific way to reproduce the scenarios for this
feature? I would like to port this feature to ARM64, but I don't know how
to reproduce the issue.

> 
> Going forward, CPU/Memory hotplug or online/offline events are referred
> to as CPU/Memory add/remove events.
> 
> Existing solution and its shortcoming:
> ======================================
> The current solution to address the above issue involves monitoring
> CPU/memory add/remove events in userspace using udev rules; whenever
> CPU or memory resources change, the entire kdump image is loaded
> again. The kdump image includes the kernel, initrd, elfcorehdr, FDT,
> and purgatory. Given that only the elfcorehdr and FDT become outdated
> due to CPU/Memory add/remove events, reloading the entire kdump image
> is inefficient. More importantly, kdump remains inactive for a
> substantial amount of time until the kdump reload completes.
> 
> Proposed solution:
> ==================
> Instead of initiating a full kdump image reload from userspace on
> CPU/Memory hotplug and online/offline events, the proposed solution aims
> to update only the necessary kdump image components within the kernel
> itself.
> 
> Git tree for testing:
> =====================
> https://github.com/sourabhjains/linux/tree/kdump-in-kernel-crash-update-v18
> 
> The above tree is rebased on top of the powerpc/next branch.
> 
> To use this feature, the kdump udev rule must be updated. On RHEL,
> add the following two lines at the top of the
> "/usr/lib/udev/rules.d/98-kexec.rules" file:
> 
> SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
> SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
> 
> With the above change to the kdump udev rule, kdump reload is avoided
> during CPU/Memory add/remove events if this feature is enabled in the
> kernel.
> 
> Note: only the kexec_file_load syscall is supported. For kexec_load,
> minor changes are required in the kexec tool.
> 
> Changelog:
> ----------
> v18: [No functional changes]
>   - Update a comment in 2/6.
>   - Describe the clean-up done on x86 in patch description 2/6.
>   - Fix a minor typo in the patch description of 3/6.
> 
> v17: [https://lore.kernel.org/all/20240226084118.16310-1-sourabhjain@linux.ibm.com/]
>   - Rebase the patch series on top of the linux-next tree and the patch
>     series below:
>     https://lore.kernel.org/all/20240213113150.1148276-1-hbathini@linux.ibm.com/
>   - Split 0003 patch from v16 into two patches
>        1. Move get_crash_memory_ranges() along with other *_memory_ranges()
>           functions to ranges.c and make them public.
>        2. Make the update_cpus_node() function public and move it
>           out of file_load_64.c.
>   - Keep arch_crash_hotplug_support in crash.c instead of core_64.c [05/06]
>   - Use CONFIG_CRASH_MAX_MEMORY_RANGES to find extra elfcorehdr size [06/06]
> 
> v16: [https://lore.kernel.org/all/20240217081452.164571-1-sourabhjain@linux.ibm.com/]
>   - Remove the unused #define `crash_hotplug_cpu_support`
>     and `crash_hotplug_memory_support` in `arch/x86/include/asm/kexec.h`.
>   - Document why two kexec flag bits are used in
>     `arch_crash_hotplug_memory_support` (x86).
>   - Use a switch case to handle different hotplug operations
>     in `arch_crash_handle_hotplug_event` for PowerPC.
>   - Fix a typo in 4/5.
> 
> v15:
>   - Remove the patch that adds a new kexec flag for FDT update.
>   - Introduce a generic kexec flag bit to share hotplug support
>     intent between the kexec tool and the kernel for the kexec_load
>     syscall. (2/5)
>   - Introduce an architecture-specific handler to process the kexec
>     flag for crash hotplug support. (2/5)
>   - Rename the @update_elfcorehdr member of the struct kimage to
>     @hotplug_support. (2/5)
>   - Use a common function to advertise hotplug support for both CPU
>     and Memory. (2/5)
> 
> v14:
>   - Fix build warnings by including necessary header files
>   - Rebase to v6.7-rc5
> 
> v13:
>   - Fix a build warning by taking ranges.c out of CONFIG_KEXEC_FILE
>   - Rebase to v6.7-rc4
> 
> v12:
>   - Add a patch that introduces new kexec flags to support this feature
>     on the kexec_load system call
>   - Change the way this feature is advertised to userspace for both the
>     kexec_load and kexec_file_load syscalls
>   - Rebase to v6.6-rc7
> 
> v11:
>   - Rebase to v6.4-rc6
>   - The patch that introduced CONFIG_CRASH_HOTPLUG for PowerPC has been
>     removed. The config is now part of common configuration:
>     https://lore.kernel.org/all/87ilbpflsk.fsf@mail.lhotse/
> 
> v10:
>   - Drop the patch that adds the fdt_index attribute to struct kimage_arch;
>     find the FDT segment index when needed.
>   - Added more details to the commit messages.
>   - Rebased onto 6.3.0-rc5
> 
> v9:
>   - Removed patch to prepare elfcorehdr crash notes for possible CPUs.
>     The patch is moved to the generic patch series that introduces the
>     generic infrastructure for in-kernel crash update.
>   - Removed patch to pass the hotplug action type to the arch crash
>     hotplug handler function. The generic patch series has introduced
>     the hotplug action type in kimage struct.
>   - Add a detailed commit message for better understanding.
> 
> v8:
>   - Restrict fdt_index initialization to machine_kexec_post_load so that
>     it works for both kexec_load and kexec_file_load. [3/8] (Laurent Dufour)
> 
>   - Updated the logic to find the number of offline cores. [6/8]
> 
>   - Changed the logic to find the elfcore program header to accommodate
>     future memory ranges due to memory hotplug events. [8/8]
> 
> v7:
>   - Added a new config option to configure this feature
>   - Pass the hotplug action type to the arch-specific handler
> 
> v6:
>   - Added crash memory hotplug support
> 
> v5:
>   - Replace CONFIG_CRASH_HOTPLUG with CONFIG_HOTPLUG_CPU.
>   - Move fdt segment identification for kexec_load case to load path
>     instead of crash hotplug handler
>   - Keep new attribute defined under kimage_arch to track FDT segment
>     under CONFIG_HOTPLUG_CPU config.
> 
> v4:
>   - Update the logic to find the additional space needed for hot-added
>     CPUs after kexec load. Refer to the "[RFC v4 PATCH 4/5] powerpc/crash hp:
>     add crash hotplug support for kexec_file_load" patch to know more about
>     the change.
>   - Fix a couple of typos.
>   - Replace pr_err with pr_info_once to warn the user about memory hotplug
>     support.
>   - In the crash hotplug handler, exit the for loop once the FDT segment
>     is found.
> 
> v3:
>   - Move fdt_index and fdt_index_vaild variables to kimage_arch struct.
>   - Rebase patches on top of
>     https://lore.kernel.org/lkml/20220303162725.49640-1-eric.devolder@oracle.com/
>   - Fixed a warning reported by the checkpatch script
> 
> v2:
>   - Use generic hotplug handler introduced by
>     https://lore.kernel.org/lkml/20220209195706.51522-1-eric.devolder@oracle.com/
>     a significant change from v1.
> 
> Cc: Akhil Raj <lf32.dev at gmail.com>
> Cc: Andrew Morton <akpm at linux-foundation.org>
> Cc: Aneesh Kumar K.V <aneesh.kumar at kernel.org>
> Cc: Baoquan He <bhe at redhat.com>
> Cc: Borislav Petkov (AMD) <bp at alien8.de>
> Cc: Boris Ostrovsky <boris.ostrovsky at oracle.com>
> Cc: Christophe Leroy <christophe.leroy at csgroup.eu>
> Cc: Dave Hansen <dave.hansen at linux.intel.com>
> Cc: Dave Young <dyoung at redhat.com>
> Cc: David Hildenbrand <david at redhat.com>
> Cc: Greg Kroah-Hartman <gregkh at linuxfoundation.org>
> Cc: Hari Bathini <hbathini at linux.ibm.com>
> Cc: Laurent Dufour <laurent.dufour at fr.ibm.com>
> Cc: Mahesh Salgaonkar <mahesh at linux.ibm.com>
> Cc: Michael Ellerman <mpe at ellerman.id.au>
> Cc: Mimi Zohar <zohar at linux.ibm.com>
> Cc: Naveen N Rao <naveen at kernel.org>
> Cc: Oscar Salvador <osalvador at suse.de>
> Cc: Thomas Gleixner <tglx at linutronix.de>
> Cc: Valentin Schneider <vschneid at redhat.com>
> Cc: Vivek Goyal <vgoyal at redhat.com>
> Cc: kexec at lists.infradead.org
> Cc: x86 at kernel.org
> 
> Sourabh Jain (6):
>   crash: forward memory_notify arg to arch crash hotplug handler
>   crash: add a new kexec flag for hotplug support
>   powerpc/kexec: move *_memory_ranges functions to ranges.c
>   PowerPC/kexec: make the update_cpus_node() function public
>   powerpc/crash: add crash CPU hotplug support
>   powerpc/crash: add crash memory hotplug support
> 
>  arch/powerpc/Kconfig                    |   4 +
>  arch/powerpc/include/asm/kexec.h        |  15 ++
>  arch/powerpc/include/asm/kexec_ranges.h |  20 +-
>  arch/powerpc/kexec/Makefile             |   4 +-
>  arch/powerpc/kexec/core_64.c            |  91 +++++++
>  arch/powerpc/kexec/crash.c              | 196 +++++++++++++++
>  arch/powerpc/kexec/elf_64.c             |   3 +-
>  arch/powerpc/kexec/file_load_64.c       | 314 +++---------------------
>  arch/powerpc/kexec/ranges.c             | 312 ++++++++++++++++++++++-
>  arch/x86/include/asm/kexec.h            |  13 +-
>  arch/x86/kernel/crash.c                 |  32 ++-
>  drivers/base/cpu.c                      |   2 +-
>  drivers/base/memory.c                   |   2 +-
>  include/linux/crash_core.h              |  15 +-
>  include/linux/kexec.h                   |  11 +-
>  include/uapi/linux/kexec.h              |   1 +
>  kernel/crash_core.c                     |  29 +--
>  kernel/kexec.c                          |   4 +-
>  kernel/kexec_file.c                     |   5 +
>  19 files changed, 714 insertions(+), 359 deletions(-)
> 


