[PATCH v11 0/4] PowerPC: In-kernel handling of CPU/Memory hotplug/online/offline events for kdump kernel

Sourabh Jain sourabhjain at linux.ibm.com
Sun Jun 18 19:49:30 PDT 2023


The Problem:
============
After CPU/Memory hot plug/unplug or online/offline events, the kdump
kernel often retains outdated system information. Collecting a dump with
such a stale kdump kernel carries two risks: either the dump collection
fails outright, or the collected dump is inaccurate, which compromises
its reliability for analysis and troubleshooting.

Existing solution:
==================
The existing solution to keep the kdump kernel up-to-date involves
monitoring CPU/Memory hotplug/online/offline events via a udev rule.
This approach triggers a full kdump kernel reload for each hotplug event,
ensuring that the kdump kernel is always synchronized with the latest
system resource changes.

Shortcomings of existing solution:
==================================
- Leaves a window where a kernel crash might not lead to a successful dump
  collection.
- Reloading all kexec segments for each hotplug is inefficient.
- udev rules are prone to races if hotplug events are frequent.

Further information regarding the problems associated with the current
solution can be found here:
 - https://lore.kernel.org/lkml/b04ed259-dc5f-7f30-6661-c26f92d9096a@oracle.com/
 - https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-February/240254.html

Proposed Solution:
==================
To address the limitations of the current approach, the proposed solution
uses a more targeted update strategy. Instead of performing a full reload
of all kexec segments for every CPU/Memory hot plug/unplug or
online/offline event, it updates only the relevant kexec segment. After
the kexec segments are loaded into the reserved area, a newly introduced
hotplug handler updates the specific kexec segment based on the type of
hotplug event. This selective update avoids unnecessary overhead and
significantly reduces the chances of a kernel crash leading to a failed or
inaccurate dump collection.
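
The selective update described above can be illustrated with a small
user-space model. This is only a sketch and not the code from this series:
the enum names, the struct and both functions are hypothetical, and the
mapping (CPU events touch the FDT segment, memory events touch the
elfcorehdr segment) is an assumption drawn from the changelog below.

```c
#include <stddef.h>

/* Hypothetical event and segment types, for illustration only. */
enum hp_event { HP_CPU_ADD, HP_CPU_REMOVE, HP_MEM_ADD, HP_MEM_REMOVE };
enum segment_kind { SEG_KERNEL, SEG_INITRD, SEG_FDT, SEG_ELFCOREHDR };

struct segment {
    enum segment_kind kind;
    unsigned int updates;   /* how many times this segment was refreshed */
};

/* Assumed mapping: CPU events -> FDT, memory events -> elfcorehdr. */
static enum segment_kind segment_for_event(enum hp_event ev)
{
    switch (ev) {
    case HP_CPU_ADD:
    case HP_CPU_REMOVE:
        return SEG_FDT;
    case HP_MEM_ADD:
    case HP_MEM_REMOVE:
        return SEG_ELFCOREHDR;
    }
    return SEG_FDT; /* not reached for valid events */
}

/*
 * Refresh only the segment that the event affects and leave the rest
 * untouched. Returns the index of the updated segment, or -1 if the
 * loaded image has no such segment.
 */
static int handle_hotplug(struct segment *segs, size_t n, enum hp_event ev)
{
    enum segment_kind want = segment_for_event(ev);

    for (size_t i = 0; i < n; i++) {
        if (segs[i].kind == want) {
            segs[i].updates++;
            return (int)i;  /* stop scanning once the segment is found */
        }
    }
    return -1;
}
```

In this model a CPU event leaves the kernel, initrd and elfcorehdr entries
untouched, which is the property that makes the approach cheaper than a
full reload.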

Series Dependencies:
====================
The implementation of the crash hotplug handler on PowerPC is included in
this patch series. The introduction of the generic crash hotplug handler
is done through the patch series available at
https://lore.kernel.org/all/20230612210712.683175-1-eric.devolder@oracle.com/

Git tree for testing:
=====================
The following Git tree incorporates this patch series applied on top of
the dependent patch series.
https://github.com/sourabhjains/linux/tree/e23-s11-with-kexec-config

In order to enable this feature, it is necessary to disable the udev rule
responsible for reloading the kdump service. To do this, you can make the
following additions to the file "/usr/lib/udev/rules.d/98-kexec.rules" on RHEL:

Add the following two lines at the top of the file:

   SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
   SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

The changes mentioned above ensure that the kdump reload process is skipped
for CPU/Memory hot plug/unplug events when the path
"/sys/devices/system/[cpu|memory]/crash_hotplug" exists.

Note: only the kexec_file_load syscall will work. For kexec_load, minor
changes are required in the kexec tool.
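
The udev rules above key off the crash_hotplug sysfs attribute. The same
check can be done by hand with a small helper such as the one below. It is
a minimal sketch: the function name is hypothetical, and it assumes the
attribute reads as "1" when in-kernel updates are supported, which is what
the rules match on.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if the file at `path` exists and its first line is "1".
 * A missing or unreadable file is treated as "not supported".
 */
static bool crash_hotplug_enabled(const char *path)
{
    char buf[8] = { 0 };
    FILE *f = fopen(path, "r");
    bool got_line;

    if (!f)
        return false;

    got_line = fgets(buf, sizeof(buf), f) != NULL;
    fclose(f);
    if (!got_line)
        return false;

    buf[strcspn(buf, "\n")] = '\0';     /* strip the trailing newline */
    return strcmp(buf, "1") == 0;
}
```

Calling it with "/sys/devices/system/cpu/crash_hotplug" and
"/sys/devices/system/memory/crash_hotplug" covers the two subsystems named
in the rules.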

---
Changelog:

v11:
  - Rebase to v6.4-rc6
  - The patch that introduced CONFIG_CRASH_HOTPLUG for PowerPC has been removed.
    The config is now part of common configuration:
    https://lore.kernel.org/all/87ilbpflsk.fsf@mail.lhotse/

v10:
  - Drop the patch that adds the fdt_index attribute to struct kimage_arch.
    Find the fdt segment index when needed instead.
  - Added more details to the commit messages.
  - Rebased onto 6.3.0-rc5

v9:
  - Removed the patch to prepare elfcorehdr crash notes for possible CPUs.
    The patch has moved to the generic patch series that introduces the
    generic infrastructure for in-kernel crash updates.
  - Removed the patch to pass the hotplug action type to the arch crash
    hotplug handler function. The generic patch series has introduced
    the hotplug action type in the kimage struct.
  - Added detailed commit messages for better understanding.

v8:
  - Restrict fdt_index initialization to machine_kexec_post_load so that
    it works for both kexec_load and kexec_file_load. [3/8] (Laurent Dufour)

  - Updated the logic to find the number of offline cores. [6/8]

  - Changed the logic to find the elfcore program header to accommodate
    future memory ranges due to memory hotplug events. [8/8]

v7:
  - added a new config to configure this feature
  - pass hotplug action type to arch specific handler

v6:
  - Added crash memory hotplug support

v5:
  - Replace CONFIG_CRASH_HOTPLUG with CONFIG_HOTPLUG_CPU.
  - Move fdt segment identification for kexec_load case to load path
    instead of crash hotplug handler
  - Keep new attribute defined under kimage_arch to track FDT segment
    under CONFIG_HOTPLUG_CPU config.

v4:
  - Update the logic to find the additional space needed for hot-added CPUs
    post kexec load. Refer to the "[RFC v4 PATCH 4/5] powerpc/crash hp: add
    crash hotplug support for kexec_file_load" patch to know more about the
    change.
  - Fix a couple of typos.
  - Replace pr_err with pr_info_once to warn the user about memory hotplug
    support.
  - In the crash hotplug handler, exit the for loop once the FDT segment is
    found.

v3:
  - Move fdt_index and fdt_index_valid variables to kimage_arch struct.
  - Rebase patches on top of
    https://lore.kernel.org/lkml/20220303162725.49640-1-eric.devolder@oracle.com/
  - Fixed warnings reported by the checkpatch script

v2:
  - Use the generic hotplug handler introduced by the series below, a
    significant change from v1:
    https://lore.kernel.org/lkml/20220209195706.51522-1-eric.devolder@oracle.com/

Sourabh Jain (4):
  powerpc/kexec: turn some static helper functions public
  powerpc/crash: add crash CPU hotplug support
  crash: forward memory_notify args to arch crash hotplug handler
  powerpc/crash: add crash memory hotplug support

 arch/powerpc/Kconfig                    |   3 +
 arch/powerpc/include/asm/kexec.h        |  22 ++
 arch/powerpc/include/asm/kexec_ranges.h |   1 +
 arch/powerpc/kexec/core_64.c            | 301 ++++++++++++++++++++++++
 arch/powerpc/kexec/elf_64.c             |  12 +-
 arch/powerpc/kexec/file_load_64.c       | 212 ++++-------------
 arch/powerpc/kexec/ranges.c             |  85 +++++++
 arch/x86/include/asm/kexec.h            |   2 +-
 arch/x86/kernel/crash.c                 |   5 +-
 include/linux/kexec.h                   |   2 +-
 kernel/crash_core.c                     |  14 +-
 11 files changed, 483 insertions(+), 176 deletions(-)

-- 
2.40.1



