[PATCH] Document/kexec: Generalize crash hotplug description
Sourabh Jain
sourabhjain at linux.ibm.com
Fri Aug 9 04:03:10 PDT 2024
Hello Baoquan,
On 09/08/24 07:18, Baoquan He wrote:
> On 08/05/24 at 10:38am, Sourabh Jain wrote:
>> Commit 79365026f869 ("crash: add a new kexec flag for hotplug support")
>> generalizes the crash hotplug support to allow architectures to update
>> multiple kexec segments on CPU/Memory hotplug and not just elfcorehdr.
>> Therefore, update the relevant kernel documentation to reflect the same.
>>
>> No functional change.
>>
>> Cc: Petr Tesarik <petr at tesarici.cz>
>> Cc: Hari Bathini <hbathini at linux.ibm.com>
>> Cc: kexec at lists.infradead.org
>> Cc: linux-kernel at vger.kernel.org
>> Cc: linuxppc-dev at lists.ozlabs.org
>> Cc: x86 at kernel.org
>> Signed-off-by: Sourabh Jain <sourabhjain at linux.ibm.com>
>> ---
>>
>> Discussion about the documentation update:
>> https://lore.kernel.org/all/68d0328d-531a-4a2b-ab26-c97fd8a12e8b@linux.ibm.com/
>>
>> ---
>> .../ABI/testing/sysfs-devices-memory | 6 ++--
>> .../ABI/testing/sysfs-devices-system-cpu | 6 ++--
>> .../admin-guide/mm/memory-hotplug.rst | 5 ++--
>> Documentation/core-api/cpu_hotplug.rst | 10 ++++---
>> kernel/crash_core.c | 29 ++++++++++++-------
>> 5 files changed, 33 insertions(+), 23 deletions(-)
> Overall this looks good to me, except for the concern from Petr. Thanks.
Thanks for the review. I will make the suggested changes in v2.
Additionally, I will generalize the error message
"kexec_trylock() failed, elfcorehdr may be inaccurate" in
crash_handle_hotplug_event() and crash_check_hotplug_support()
to "kexec_trylock() failed, kdump image may be inaccurate".
- Sourabh Jain
>
>> diff --git a/Documentation/ABI/testing/sysfs-devices-memory b/Documentation/ABI/testing/sysfs-devices-memory
>> index a95e0f17c35a..421acc8e2c6b 100644
>> --- a/Documentation/ABI/testing/sysfs-devices-memory
>> +++ b/Documentation/ABI/testing/sysfs-devices-memory
>> @@ -115,6 +115,6 @@ What: /sys/devices/system/memory/crash_hotplug
>> Date: Aug 2023
>> Contact: Linux kernel mailing list <linux-kernel at vger.kernel.org>
>> Description:
>> - (RO) indicates whether or not the kernel directly supports
>> - modifying the crash elfcorehdr for memory hot un/plug and/or
>> - on/offline changes.
>> + (RO) indicates whether or not the kernel updates kexec
>> + segments on memory hot un/plug and/or on/offline events,
>> + avoiding the need to reload the kdump kernel.
>> diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
>> index 325873385b71..f4ada1cd2f96 100644
>> --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
>> +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
>> @@ -703,9 +703,9 @@ What: /sys/devices/system/cpu/crash_hotplug
>> Date: Aug 2023
>> Contact: Linux kernel mailing list <linux-kernel at vger.kernel.org>
>> Description:
>> - (RO) indicates whether or not the kernel directly supports
>> - modifying the crash elfcorehdr for CPU hot un/plug and/or
>> - on/offline changes.
>> + (RO) indicates whether or not the kernel updates kexec
>> + segments on CPU hot un/plug and/or on/offline events,
>> + avoiding the need to reload the kdump kernel.
>>
>> What: /sys/devices/system/cpu/enabled
>> Date: Nov 2022
>> diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
>> index 098f14d83e99..cb2c080f400c 100644
>> --- a/Documentation/admin-guide/mm/memory-hotplug.rst
>> +++ b/Documentation/admin-guide/mm/memory-hotplug.rst
>> @@ -294,8 +294,9 @@ The following files are currently defined:
>> ``crash_hotplug`` read-only: when changes to the system memory map
>> occur due to hot un/plug of memory, this file contains
>> '1' if the kernel updates the kdump capture kernel memory
>> - map itself (via elfcorehdr), or '0' if userspace must update
>> - the kdump capture kernel memory map.
>> + map itself (via elfcorehdr and other relevant kexec
>> + segments), or '0' if userspace must update the kdump
>> + capture kernel memory map.
>>
>> Availability depends on the CONFIG_MEMORY_HOTPLUG kernel
>> configuration option.
>> diff --git a/Documentation/core-api/cpu_hotplug.rst b/Documentation/core-api/cpu_hotplug.rst
>> index dcb0e379e5e8..a21dbf261be7 100644
>> --- a/Documentation/core-api/cpu_hotplug.rst
>> +++ b/Documentation/core-api/cpu_hotplug.rst
>> @@ -737,8 +737,9 @@ can process the event further.
>>
>> When changes to the CPUs in the system occur, the sysfs file
>> /sys/devices/system/cpu/crash_hotplug contains '1' if the kernel
>> -updates the kdump capture kernel list of CPUs itself (via elfcorehdr),
>> -or '0' if userspace must update the kdump capture kernel list of CPUs.
>> +updates the kdump capture kernel list of CPUs itself (via elfcorehdr and
>> +other relevant kexec segments), or '0' if userspace must update the kdump
>> +capture kernel list of CPUs.
>>
>> The availability depends on the CONFIG_HOTPLUG_CPU kernel configuration
>> option.
>> @@ -750,8 +751,9 @@ file can be used in a udev rule as follows:
>> SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
>>
>> For a CPU hot un/plug event, if the architecture supports kernel updates
>> -of the elfcorehdr (which contains the list of CPUs), then the rule skips
>> -the unload-then-reload of the kdump capture kernel.
>> +of the elfcorehdr (which contains the list of CPUs) and other relevant
>> +kexec segments, then the rule skips the unload-then-reload of the kdump
>> +capture kernel.
>>
>> Kernel Inline Documentations Reference
>> ======================================
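
For illustration, the check that the udev rule above performs on
crash_hotplug can also be done from a userspace program. The following
is a minimal sketch only, not code from this patch; the helper names
crash_hotplug_supported() and kdump_reload_needed() are hypothetical,
and the sysfs path is passed in by the caller (for example
/sys/devices/system/cpu/crash_hotplug).

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: return 1 if the attribute at 'path' reads '1'
 * (the kernel updates the kdump image itself), 0 if it reads anything
 * else, and -1 if the file cannot be opened or is empty (for example,
 * when the attribute does not exist on this kernel).
 */
int crash_hotplug_supported(const char *path)
{
	FILE *f;
	int c;

	assert(path != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;

	c = fgetc(f);
	fclose(f);

	if (c == EOF)
		return -1;

	return c == '1';
}

/*
 * Hypothetical helper: return 1 when userspace should still unload and
 * reload the kdump capture kernel after a hotplug event. A missing or
 * unreadable attribute is treated the same as '0'.
 */
int kdump_reload_needed(const char *path)
{
	return crash_hotplug_supported(path) != 1;
}
```

Treating a missing attribute as "reload needed" keeps the behaviour
safe on kernels that do not expose crash_hotplug at all.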
>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>> index 63cf89393c6e..64dad01e260b 100644
>> --- a/kernel/crash_core.c
>> +++ b/kernel/crash_core.c
>> @@ -520,18 +520,25 @@ int crash_check_hotplug_support(void)
>> }
>>
>> /*
>> - * To accurately reflect hot un/plug changes of cpu and memory resources
>> - * (including onling and offlining of those resources), the elfcorehdr
>> - * (which is passed to the crash kernel via the elfcorehdr= parameter)
>> - * must be updated with the new list of CPUs and memories.
>> + * To accurately reflect hot un/plug changes of CPU and Memory resources
>> + * (including onlining and offlining of those resources), the relevant
>> + * kexec segments must be updated with the latest CPU and Memory resources.
>> *
>> - * In order to make changes to elfcorehdr, two conditions are needed:
>> - * First, the segment containing the elfcorehdr must be large enough
>> - * to permit a growing number of resources; the elfcorehdr memory size
>> - * is based on NR_CPUS_DEFAULT and CRASH_MAX_MEMORY_RANGES.
>> - * Second, purgatory must explicitly exclude the elfcorehdr from the
>> - * list of segments it checks (since the elfcorehdr changes and thus
>> - * would require an update to purgatory itself to update the digest).
>> + * Architectures must ensure two things for all segments that need
>> + * updating during hotplug events:
>> + *
>> + * 1. Segments must be large enough to accommodate a growing number of
>> + * resources.
>> + * 2. Segments must be excluded from SHA verification.
>> + *
>> + * For example, on most architectures, the elfcorehdr (which is passed
>> + * to the crash kernel via the elfcorehdr= parameter) must include the
>> + * new list of CPUs and memory. To make changes to the elfcorehdr, it
>> + * should be large enough to permit a growing number of CPU and Memory
>> + * resources. One can estimate the elfcorehdr memory size based on
>> + * NR_CPUS_DEFAULT and CRASH_MAX_MEMORY_RANGES. The elfcorehdr is
>> + * excluded from SHA verification by default if the architecture
>> + * supports crash hotplug.
>> */
>> static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu, void *arg)
>> {
>> --
>> 2.45.2
>>
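
To make the first requirement in the crash_core.c comment concrete,
here is a rough userspace sketch of how an elfcorehdr size could be
estimated from a maximum CPU count and a maximum number of memory
ranges (the roles played by NR_CPUS_DEFAULT and CRASH_MAX_MEMORY_RANGES
in the comment). This is an assumption-laden illustration, not the
kernel's implementation: the function names are hypothetical, and the
layout (one program header per CPU, one per memory range, plus two
extra headers) is assumed for the example.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/*
 * Hypothetical estimate of the bytes needed for an ELF64 core header:
 * one ELF header plus one program header per possible CPU and per
 * memory range, plus two extra program headers (assumed here to cover
 * a VMCOREINFO note and a kernel text mapping).
 */
size_t estimate_elfcorehdr_size(unsigned int max_cpus,
				unsigned int max_mem_ranges)
{
	size_t nr_phdrs = 2 + (size_t)max_cpus + (size_t)max_mem_ranges;

	/* Sanity check: an ELF64 program header is 56 bytes by layout. */
	assert(sizeof(Elf64_Phdr) == 56);

	return sizeof(Elf64_Ehdr) + nr_phdrs * sizeof(Elf64_Phdr);
}

/*
 * Return 1 if a segment of 'segment_size' bytes is large enough to be
 * updated in place for the given maximums, 0 otherwise.
 */
int elfcorehdr_fits(size_t segment_size, unsigned int max_cpus,
		    unsigned int max_mem_ranges)
{
	return segment_size >=
	       estimate_elfcorehdr_size(max_cpus, max_mem_ranges);
}
```

Sizing the segment for the maximums up front is what allows later
hotplug events to rewrite it in place instead of reloading the kdump
image.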