From pasha.tatashin at soleen.com Sat Nov 1 06:49:46 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Sat, 1 Nov 2025 09:49:46 -0400 Subject: [PATCH 1/1] kexec: Use %pe format specifier for error pointer printing In-Reply-To: <20251016200320.4179702-1-yanjun.zhu@linux.dev> References: <20251016200320.4179702-1-yanjun.zhu@linux.dev> Message-ID: On Thu, Oct 16, 2025 at 4:03 PM Zhu Yanjun wrote: > > Make pr_xxx() call to use the %pe format specifier instead of %d. > The %pe specifier prints a symbolic error string (e.g., -ENOMEM, > -EINVAL) when given an error pointer created with ERR_PTR(err). > > This change enhances the clarity and diagnostic value of the error > message by showing a descriptive error name rather than a numeric > error code. > > Signed-off-by: Zhu Yanjun > CC: graf at amazon.com > CC: rppt at kernel.org > CC: changyuanl at google.com > CC: akpm at linux-foundation.org > CC: bhe at redhat.com > > --- > kernel/kexec_handover.c | 18 +++++++++--------- > 1 file changed, 9 insertions(+), 9 deletions(-) > > diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c > index 76f0940fb485..77af377022b0 100644 > --- a/kernel/kexec_handover.c > +++ b/kernel/kexec_handover.c > @@ -1095,7 +1095,7 @@ static int kho_abort(void) > err = notifier_to_errno(err); > > if (err) > - pr_err("Failed to abort KHO finalization: %d\n", err); > + pr_err("Failed to abort KHO finalization: %pe\n", ERR_PTR(err)); > > return err; > } > @@ -1142,7 +1142,7 @@ static int kho_finalize(void) > > abort: > if (err) { > - pr_err("Failed to convert KHO state tree: %d\n", err); > + pr_err("Failed to convert KHO state tree: %pe\n", ERR_PTR(err)); The problem here (and in some other places below) is that err is not an -errno but an fdt error; see scripts/dtc/libfdt/libfdt.h. %pe with ERR_PTR(err) will output garbage and make debugging even harder.
From sourabhjain at linux.ibm.com Sat Nov 1 12:37:41 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Sun, 2 Nov 2025 01:07:41 +0530 Subject: [PATCH] crash: fix crashkernel resource shrink Message-ID: <20251101193741.289252-1-sourabhjain@linux.ibm.com> When crashkernel is configured with a high reservation, shrinking its value below the low crashkernel reservation causes two issues: 1. Invalid crashkernel resource objects 2. Kernel crash if crashkernel shrinking is done twice For example, with crashkernel=200M,high, the kernel reserves 200MB of high memory and some default low memory (say 256MB). The reservation appears as: cat /proc/iomem | grep -i crash af000000-beffffff : Crash kernel 433000000-43f7fffff : Crash kernel If crashkernel is then shrunk to 50MB (echo 52428800 > /sys/kernel/kexec_crash_size), /proc/iomem still shows 256MB reserved: af000000-beffffff : Crash kernel Instead, it should show 50MB: af000000-b21fffff : Crash kernel Further shrinking crashkernel to 40MB causes a kernel crash with the following trace (x86): BUG: kernel NULL pointer dereference, address: 0000000000000038 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI Call Trace: ? __die_body.cold+0x19/0x27 ? page_fault_oops+0x15a/0x2f0 ? search_module_extables+0x19/0x60 ? search_bpf_extables+0x5f/0x80 ? exc_page_fault+0x7e/0x180 ? asm_exc_page_fault+0x26/0x30 ? __release_resource+0xd/0xb0 release_resource+0x26/0x40 __crash_shrink_memory+0xe5/0x110 crash_shrink_memory+0x12a/0x190 kexec_crash_size_store+0x41/0x80 kernfs_fop_write_iter+0x141/0x1f0 vfs_write+0x294/0x460 ksys_write+0x6d/0xf0 This happens because __crash_shrink_memory() in kernel/crash_core.c incorrectly updates the crashk_res resource object even when crashk_low_res should be updated. Fix this by ensuring the correct crashkernel resource object is updated when shrinking crashkernel memory.
Fixes: 16c6006af4d4 ("kexec: enable kexec_crash_size to support two crash kernel regions") Cc: Andrew Morton Cc: Baoquan He Cc: Zhen Lei Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- kernel/crash_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/crash_core.c b/kernel/crash_core.c index 3b1c43382eec..99dac1aa972a 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -373,7 +373,7 @@ static int __crash_shrink_memory(struct resource *old_res, old_res->start = 0; old_res->end = 0; } else { - crashk_res.end = ram_res->start - 1; + old_res->end = ram_res->start - 1; } crash_free_reserved_phys_range(ram_res->start, ram_res->end); -- 2.51.0 From rientjes at google.com Sat Nov 1 16:35:07 2025 From: rientjes at google.com (David Rientjes) Date: Sat, 1 Nov 2025 16:35:07 -0700 (PDT) Subject: [Hypervisor Live Update] Notes from October 20, 2025 Message-ID: <734e26d2-ac5f-47be-331c-40e9b535ce55@google.com> Hi everybody, Here are the notes from the last Hypervisor Live Update call that happened on Monday, October 20. Thanks to everybody who was involved! These notes are intended to bring people up to speed who could not attend the call as well as keep the conversation going in between meetings. ----->o----- I thought this instance of the meeting would be short and I turned out to be very wrong :) We touched on the discussion from the previous instance regarding the fd dependency checking and this happening at the time of preserve rather than prepare; Pasha noted that the discussion continued upstream afterwards on the mailing list. The biggest change would be that the order is going to be enforced by the user. The preserve function itself does the heavy lifting now; the freeze and prepare are more for sanity checking. David Matlack asked how the global states would work since that's outside the fd.
Pasha said the subsystem will be there but there will be another mechanism that follows the lifecycle of fds of a specific type; example is if a session has an fd of a specific type then it will follow the lifecycle of the aggregate. This will be supported in v5. ----->o----- Pasha updated that he had sent the KHO patches that provide the groundwork for LUO. Last week he also sent a KHO memory corruption fix. Once those patches are merged, he will send LUO v5. He was targeting sending the next series of changes before the next biweekly sync. ----->o----- Vipin Sharma sent out RFC patches for VFIO and was looking for feedback from the group in the next instance of the meeting. Jason was providing feedback on the upstream mailing list already. ----->o----- We shifted to discussing the main topic of the day which was iommu persistence from Samiullah. His slides are available on the shared drive. There was general alignment with what should be included in the next series upstream. His demonstrator so far included iommufd, iommu core, and iommu driver patches but was just preserving root tables. He also proposed hot swap. There was lots of discussion upstream around selection of HWPT to be preserved, preserved HWPT and iommu domain lifecycle, fd dependencies, and LUO finish. Pasha noted that LUO finish can now fail which Jason asked about. Pasha said if the fd hasn't replaced the hardware page table then finish would have to fail. Sami noted that the HWPTs are also restored and associated with the preserved iommu domains and this would be done when the fd is retrieved. We can't restore the domain during the probe but there is no mechanism to have the HWPTs to be created during the boot time. Jason said during probe time you put the domains back with placeholders so the iommu core has some understanding what the translation is. 
----->o----- During the discussion for hotswap, Sami noted that once all the preserved devices have their iommu domains hot swapped, we can destroy the restored iommu domains that are not being used. Jason said that once the iommu domains are rehydrated back into an fd that they should have the normal lifecycle of a hardware page table in an fd. So they will be destroyed when the hardware page table is destroyed when the fd closes it or the VMM asks it to be destroyed. Jason noted that the VMM needs the id so that it can be destroyed. Jason suggested restoring the hardware page table pointers inside the devices that represent the currently attached hardware page table and this is done when you bring back the iommufd. We should likely retain a list for each hardware page table the list of which VFIO device objects are linked to it and this all needs to be brought back. Or an alternative may be to serialize the devices. IOMMU needs the VFIO devices and this needs careful orchestration. Pasha suggested that since we have the session and sessions have specific orders, the things without any dependencies that were preserved first and things with dependencies were preserved last. The kernel could call restore on everything from lowest to highest. Jason said there needs to be a two step process: the struct file needs to be brought back before you fill it. VFIO needs the iommufd to be filled before it can auto bind before it can complete its restoration. Sami suggested if we don't restore the HWPT until we have all the information, even if it closes it goes back to the state that it was in and we would consider the iommufd not fully restored until it is. Jason suggested that would require adding an iommufd ioctl to restore individual sub objects: restoring a HWPT that was with this tag and give back the id; the restore would only be possible if the VFIO devices are already present inside the iommufd. 
----->o----- When discussing LUO finish, Pasha suggested we need a way to discard a session if it hasn't been reclaimed or there are exceptions. If the VM never is restored then we will have lingering session that need to be somehow discarded. Jason suggested all objects are brought back to userspace before you can encounter an error. If there are problems up to that point, then the cleanest way to address this is with another kexec. Jason stressed the need for another kexec as a big hammer to be able to do recovery and cleanup. For example, if there are 10 VMs and one did not restore, do another live update to clean up the lingering VM. ----->o----- Next meeting will be on Monday, November 3 at 8am PST (UTC-8), everybody is welcome: https://meet.google.com/rjn-dmzu-hgq NOTE!!! Daylight Savings Time has ended in the United States, so please check your local time carefully: Time zones PST (UTC-8) 8:00am MST (UTC-7) 9:00am CST (UTC-6) 10:00am EST (UTC-5) 11:00am Rio de Janeiro (UTC-3) 1:00pm London (UTC) 4:00pm Berlin (UTC+1) 5:00pm Moscow (UTC+3) 7:00pm Dubai (UTC+4) 8:00pm Mumbai (UTC+5:30) 9:30pm Singapore (UTC+8) 12:00am Tuesday Beijing (UTC+8) 12:00am Tuesday Tokyo (UTC+9) 1:00am Tuesday Sydney (UTC+11) 3:00am Tuesday Auckland (UTC+13) 5:00am Tuesday Topics for the next meeting: - update on the status of stateless KHO RFC patches that should simplify LUO support - update on LUO v5 and patch series sent upstream after KHO changes and fixes are staged - VFIO RFC patch feedback based on the series sent to the mailing list a couple weeks ago - follow up on the status of iommu persistence and any addtional discussion from last time - update on memfd preservation, vmalloc support, and 1GB limitation - discuss deferred struct page initialization and deferring when KHO is enabled - discuss guest_memfd preservation use cases for Confidential Computing and any current work happening on it, including overlap with memfd preservation being worked on by Pratyush + discuss any 
use cases for Confidential Computing where folios may need to be split after being marked as preserved during brown out - later: testing methodology to allow downstream consumers to qualify that live update works from one version to another - later: reducing blackout window during live update Please let me know if you'd like to propose additional topics for discussion, thank you! From georges.aureau at hpe.com Sun Nov 2 02:48:42 2025 From: georges.aureau at hpe.com (Aureau, Georges (Kernel Tools ERT)) Date: Sun, 2 Nov 2025 10:48:42 +0000 Subject: [PATCH][makedumpfile -R] Promptly return error on truncated regular file. In-Reply-To: References: Message-ID: Hello, Forget about this PATCH, it is missing some option as to control the behavior. What I'm really after is to promptly detect truncated flatten files without the 10-minute timeout. This would be assuming stdin is not a file in the process of being created (where eof is a moving target). Maybe something like "makedumpfile -R --some-option", where some-option would cause returning an immediate error in premature EOF: - if (TIMEOUT_STDIN < (tm - last_time)) { + if (some_option || TIMEOUT_STDIN < (tm - last_time)) { Thanks, Georges I |-----Original Message----- |From: Aureau, Georges (Kernel Tools ERT) |Sent: Friday, October 31, 2025 9:41 PM |To: kexec at lists.infradead.org |Cc: yamazaki-msmt at nec.com; HAGIO KAZUHITO(?????) |Subject: [PATCH][makedumpfile -R] Promptly return error on truncated regular |file. | |[PATCH][makedumpfile -R] Promptly return error on truncated regular file. | |When reaching the end-of-file on a truncated input regular file, |makedumpfile -R is looping for 10 minutes before producing an error. |This is confusing for users. When stdin is a regular file, an improved |behavior is to promptly return an EOF error. 
| |Signed-off-by: Georges Aureau |-- |diff --git a/makedumpfile.c b/makedumpfile.c |index 12fb0d8..295b3cc 100644 |--- a/makedumpfile.c |+++ b/makedumpfile.c |@@ -5135,6 +5135,21 @@ write_cache_zero(struct cache_data *cd, size_t size) | return write_cache_bufsz(cd); | } | |+int |+is_stdin_regular_file(void) |+{ |+ struct stat st; |+ static int regular_file = -1; |+ if (regular_file == -1) { |+ if (fstat(STDIN_FILENO, &st) == -1) { |+ regular_file = FALSE; |+ } else { |+ regular_file = S_ISREG(st.st_mode) ? TRUE : FALSE; |+ } |+ } |+ return regular_file; |+} |+ | int | read_buf_from_stdin(void *buf, int buf_size) | { |@@ -5154,11 +5169,12 @@ read_buf_from_stdin(void *buf, int buf_size) | | } else if (0 == tmp_read_size) { | /* |- * If it cannot get any data from a standard input |+ * If we reach end-of-file on regular file, or |+ * if we cannot get any data from a standard input | * for a long time, break this loop. | */ | tm = time(NULL); |- if (TIMEOUT_STDIN < (tm - last_time)) { |+ if (is_stdin_regular_file() || TIMEOUT_STDIN < (tm - last_time)) { | ERRMSG("Can't get any data from STDIN.\n"); | return FALSE; | } | | From bhe at redhat.com Sun Nov 2 18:54:50 2025 From: bhe at redhat.com (Baoquan He) Date: Mon, 3 Nov 2025 10:54:50 +0800 Subject: [PATCH] crash: fix crashkernel resource shrink In-Reply-To: <20251101193741.289252-1-sourabhjain@linux.ibm.com> References: <20251101193741.289252-1-sourabhjain@linux.ibm.com> Message-ID: On 11/02/25 at 01:07am, Sourabh Jain wrote: > When crashkernel is configured with a high reservation, shrinking its > value below the low crashkernel reservation causes two issues: > > 1. Invalid crashkernel resource objects > 2. Kernel crash if crashkernel shrinking is done twice > > For example, with crashkernel=200M,high, the kernel reserves 200MB of > high memory and some default low memory (say 256MB). 
The reservation > appears as: > > cat /proc/iomem | grep -i crash > af000000-beffffff : Crash kernel > 433000000-43f7fffff : Crash kernel > > If crashkernel is then shrunk to 50MB (echo 52428800 > > /sys/kernel/kexec_crash_size), /proc/iomem still shows 256MB reserved: > af000000-beffffff : Crash kernel > > Instead, it should show 50MB: > af000000-b21fffff : Crash kernel > > Further shrinking crashkernel to 40MB causes a kernel crash with the > following trace (x86): > > BUG: kernel NULL pointer dereference, address: 0000000000000038 > PGD 0 P4D 0 > Oops: 0000 [#1] PREEMPT SMP NOPTI > > Call Trace: > ? __die_body.cold+0x19/0x27 > ? page_fault_oops+0x15a/0x2f0 > ? search_module_extables+0x19/0x60 > ? search_bpf_extables+0x5f/0x80 > ? exc_page_fault+0x7e/0x180 > ? asm_exc_page_fault+0x26/0x30 > ? __release_resource+0xd/0xb0 > release_resource+0x26/0x40 > __crash_shrink_memory+0xe5/0x110 > crash_shrink_memory+0x12a/0x190 > kexec_crash_size_store+0x41/0x80 > kernfs_fop_write_iter+0x141/0x1f0 > vfs_write+0x294/0x460 > ksys_write+0x6d/0xf0 > > > This happens because __crash_shrink_memory()/kernel/crash_core.c > incorrectly updates the crashk_res resource object even when > crashk_low_res should be updated. > > Fix this by ensuring the correct crashkernel resource object is updated > when shrinking crashkernel memory. 
> > Fixes: 16c6006af4d4 ("kexec: enable kexec_crash_size to support two crash kernel regions") > Cc: Andrew Morton > Cc: Baoquan He > Cc: Zhen Lei > Cc: kexec at lists.infradead.org > Signed-off-by: Sourabh Jain > --- > kernel/crash_core.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/crash_core.c b/kernel/crash_core.c > index 3b1c43382eec..99dac1aa972a 100644 > --- a/kernel/crash_core.c > +++ b/kernel/crash_core.c > @@ -373,7 +373,7 @@ static int __crash_shrink_memory(struct resource *old_res, > old_res->start = 0; > old_res->end = 0; > } else { > - crashk_res.end = ram_res->start - 1; > + old_res->end = ram_res->start - 1; It's a code bug, nice catch, thanks. Acked-by: Baoquan He From sourabhjain at linux.ibm.com Sun Nov 2 19:58:57 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Mon, 3 Nov 2025 09:28:57 +0530 Subject: [PATCH 0/2] Export kdump crashkernel CMA ranges Message-ID: <20251103035859.1267318-1-sourabhjain@linux.ibm.com> Add /sys/kernel/kexec_crash_cma_ranges to export all CMA regions reserved for the crashkernel to user-space. This enables user-space tools configuring kdump to determine the amount of memory reserved for the crashkernel. When CMA is used for crashkernel allocation, tools can use this information to warn users that attempting to capture user pages while CMA reservation is active may lead to unreliable or incomplete dump capture. While adding documentation for the new sysfs interface, I realized that there was no ABI document for the existing kexec and kdump sysfs interfaces, so I added one. The first patch adds the ABI documentation for the existing kexec and kdump sysfs interfaces, and the second patch adds the /sys/kernel/kexec_crash_cma_ranges sysfs interface along with its corresponding ABI documentation. *Seeking opinions* There are already four kexec/kdump sysfs entries under /sys/kernel/, and this patch series adds one more.
Should we consider moving them to a separate directory, such as /sys/kernel/kexec, to avoid polluting /sys/kernel/? For backward compatibility, we can create symlinks at the old locations for some time and remove them in the future. Cc: Andrew Morton Cc: Baoquan He Cc: Jiri Bohac Cc: Shivang Upadhyay Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Sourabh Jain (2): Documentation/ABI: add kexec and kdump sysfs interface crash: export crashkernel CMA reservation to userspace .../ABI/testing/sysfs-kernel-kexec-kdump | 53 +++++++++++++++++++ kernel/ksysfs.c | 17 ++++++ 2 files changed, 70 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-kernel-kexec-kdump -- 2.51.0 From sourabhjain at linux.ibm.com Sun Nov 2 19:58:58 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Mon, 3 Nov 2025 09:28:58 +0530 Subject: [PATCH 1/2] Documentation/ABI: add kexec and kdump sysfs interface In-Reply-To: <20251103035859.1267318-1-sourabhjain@linux.ibm.com> References: <20251103035859.1267318-1-sourabhjain@linux.ibm.com> Message-ID: <20251103035859.1267318-2-sourabhjain@linux.ibm.com> Add an ABI document for the following kexec and kdump sysfs interfaces: - /sys/kernel/kexec_loaded - /sys/kernel/kexec_crash_loaded - /sys/kernel/kexec_crash_size - /sys/kernel/crash_elfcorehdr_size Cc: Andrew Morton Cc: Baoquan He Cc: Jiri Bohac Cc: Shivang Upadhyay Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- .../ABI/testing/sysfs-kernel-kexec-kdump | 43 +++++++++++++++++++ 1 file changed, 43 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-kernel-kexec-kdump diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump new file mode 100644 index 000000000000..96b24565b68e --- /dev/null +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump @@ -0,0 +1,43 @@ +What: /sys/kernel/kexec_loaded +Date: Jun 2006 +Contact: kexec at
lists.infradead.org +Description: read only + Indicates whether a new kernel image has been loaded + into memory using the kexec system call. It shows 1 if + a kexec image is present and ready to boot, or 0 if none + is loaded. +User: kexec tools, kdump service + +What: /sys/kernel/kexec_crash_loaded +Date: Jun 2006 +Contact: kexec at lists.infradead.org +Description: read only + Indicates whether a crash (kdump) kernel is currently + loaded into memory. It shows 1 if a crash kernel has been + successfully loaded for panic handling, or 0 if no crash + kernel is present. +User: Kexec tools, Kdump service + +What: /sys/kernel/kexec_crash_size +Date: Dec 2009 +Contact: kexec at lists.infradead.org +Description: read/write + Shows the amount of memory reserved for loading the crash + (kdump) kernel. It reports the size, in bytes, of the + crash kernel area defined by the crashkernel= parameter. + This interface also allows reducing the crashkernel + reservation by writing a smaller value, and the reclaimed + space is added back to the system RAM. +User: Kdump service + +What: /sys/kernel/crash_elfcorehdr_size +Date: Aug 2023 +Contact: kexec at lists.infradead.org +Description: read only + Indicates the preferred size of the memory buffer for the + ELF core header used by the crash (kdump) kernel. It defines + how much space is needed to hold metadata about the crashed + system, including CPU and memory information. This information + is used by the user space utility kexec to support updating the + in-kernel kdump image during hotplug operations. 
+User: Kexec tools -- 2.51.0 From sourabhjain at linux.ibm.com Sun Nov 2 19:58:59 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Mon, 3 Nov 2025 09:28:59 +0530 Subject: [PATCH 2/2] crash: export crashkernel CMA reservation to userspace In-Reply-To: <20251103035859.1267318-1-sourabhjain@linux.ibm.com> References: <20251103035859.1267318-1-sourabhjain@linux.ibm.com> Message-ID: <20251103035859.1267318-3-sourabhjain@linux.ibm.com> Add a sysfs entry /sys/kernel/kexec_crash_cma_ranges to expose all CMA crashkernel ranges. This allows userspace tools configuring kdump to determine how much memory is reserved for crashkernel. If CMA is used, tools can warn users when attempting to capture user pages with CMA reservation. The new sysfs file holds the CMA ranges in the format below: cat /sys/kernel/kexec_crash_cma_ranges 100000000-10c7fffff Cc: Andrew Morton Cc: Baoquan He Cc: Jiri Bohac Cc: Shivang Upadhyay Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- .../ABI/testing/sysfs-kernel-kexec-kdump | 10 ++++++++++ kernel/ksysfs.c | 17 +++++++++++++++++ 2 files changed, 27 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump index 96b24565b68e..f6089e38de5f 100644 --- a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump @@ -41,3 +41,13 @@ Description: read only is used by the user space utility kexec to support updating the in-kernel kdump image during hotplug operations. User: Kexec tools + +What: /sys/kernel/kexec_crash_cma_ranges +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read only + Provides information about the memory ranges reserved from + the Contiguous Memory Allocator (CMA) area that are allocated + to the crash (kdump) kernel. It lists the start and end physical + addresses of CMA regions assigned for crashkernel use.
+User: kdump service diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c index eefb67d9883c..3855937aa923 100644 --- a/kernel/ksysfs.c +++ b/kernel/ksysfs.c @@ -135,6 +135,22 @@ static ssize_t kexec_crash_loaded_show(struct kobject *kobj, } KERNEL_ATTR_RO(kexec_crash_loaded); +static ssize_t kexec_crash_cma_ranges_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + + ssize_t len = 0; + int i; + + for (i = 0; i < crashk_cma_cnt; ++i) { + len += sysfs_emit_at(buf, len, "%08llx-%08llx\n", + crashk_cma_ranges[i].start, + crashk_cma_ranges[i].end); + } + return len; +} +KERNEL_ATTR_RO(kexec_crash_cma_ranges); + static ssize_t kexec_crash_size_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) { @@ -260,6 +276,7 @@ static struct attribute * kernel_attrs[] = { #ifdef CONFIG_CRASH_DUMP &kexec_crash_loaded_attr.attr, &kexec_crash_size_attr.attr, + &kexec_crash_cma_ranges_attr.attr, #endif #endif #ifdef CONFIG_VMCORE_INFO -- 2.51.0 From sourabhjain at linux.ibm.com Sun Nov 2 20:08:27 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Mon, 3 Nov 2025 09:38:27 +0530 Subject: [PATCH] crash: fix crashkernel resource shrink In-Reply-To: References: <20251101193741.289252-1-sourabhjain@linux.ibm.com> Message-ID: <01045c4c-a37a-4a31-8787-6483c7b78dad@linux.ibm.com> On 03/11/25 08:24, Baoquan He wrote: > On 11/02/25 at 01:07am, Sourabh Jain wrote: >> When crashkernel is configured with a high reservation, shrinking its >> value below the low crashkernel reservation causes two issues: >> >> 1. Invalid crashkernel resource objects >> 2. Kernel crash if crashkernel shrinking is done twice >> >> For example, with crashkernel=200M,high, the kernel reserves 200MB of >> high memory and some default low memory (say 256MB). 
The reservation >> appears as: >> >> cat /proc/iomem | grep -i crash >> af000000-beffffff : Crash kernel >> 433000000-43f7fffff : Crash kernel >> >> If crashkernel is then shrunk to 50MB (echo 52428800 > >> /sys/kernel/kexec_crash_size), /proc/iomem still shows 256MB reserved: >> af000000-beffffff : Crash kernel >> >> Instead, it should show 50MB: >> af000000-b21fffff : Crash kernel >> >> Further shrinking crashkernel to 40MB causes a kernel crash with the >> following trace (x86): >> >> BUG: kernel NULL pointer dereference, address: 0000000000000038 >> PGD 0 P4D 0 >> Oops: 0000 [#1] PREEMPT SMP NOPTI >> >> Call Trace: >> ? __die_body.cold+0x19/0x27 >> ? page_fault_oops+0x15a/0x2f0 >> ? search_module_extables+0x19/0x60 >> ? search_bpf_extables+0x5f/0x80 >> ? exc_page_fault+0x7e/0x180 >> ? asm_exc_page_fault+0x26/0x30 >> ? __release_resource+0xd/0xb0 >> release_resource+0x26/0x40 >> __crash_shrink_memory+0xe5/0x110 >> crash_shrink_memory+0x12a/0x190 >> kexec_crash_size_store+0x41/0x80 >> kernfs_fop_write_iter+0x141/0x1f0 >> vfs_write+0x294/0x460 >> ksys_write+0x6d/0xf0 >> >> >> This happens because __crash_shrink_memory()/kernel/crash_core.c >> incorrectly updates the crashk_res resource object even when >> crashk_low_res should be updated. >> >> Fix this by ensuring the correct crashkernel resource object is updated >> when shrinking crashkernel memory. 
>> >> Fixes: 16c6006af4d4 ("kexec: enable kexec_crash_size to support two crash kernel regions") >> Cc: Andrew Morton >> Cc: Baoquan He >> Cc: Zhen Lei >> Cc: kexec at lists.infradead.org >> Signed-off-by: Sourabh Jain >> --- >> kernel/crash_core.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/kernel/crash_core.c b/kernel/crash_core.c >> index 3b1c43382eec..99dac1aa972a 100644 >> --- a/kernel/crash_core.c >> +++ b/kernel/crash_core.c >> @@ -373,7 +373,7 @@ static int __crash_shrink_memory(struct resource *old_res, >> old_res->start = 0; >> old_res->end = 0; >> } else { >> - crashk_res.end = ram_res->start - 1; >> + old_res->end = ram_res->start - 1; > It's a code bug, nice catch, thanks. > > Acked-by: Baoquan He Thanks for the ack, Baoquan. - Sourabh Jain From sourabhjain at linux.ibm.com Sun Nov 2 20:37:47 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Mon, 3 Nov 2025 10:07:47 +0530 Subject: [PATCH v5] powerpc/kdump: Add support for crashkernel CMA reservation Message-ID: <20251103043747.1298065-1-sourabhjain@linux.ibm.com> Commit 35c18f2933c5 ("Add a new optional ",cma" suffix to the crashkernel= command line option") and commit ab475510e042 ("kdump: implement reserve_crashkernel_cma") added CMA support for kdump crashkernel reservation. Extend crashkernel CMA reservation support to powerpc. The following changes are made to enable CMA reservation on powerpc: - Parse and obtain the CMA reservation size along with other crashkernel parameters - Call reserve_crashkernel_cma() to allocate the CMA region for kdump - Include the CMA-reserved ranges in the usable memory ranges for the kdump kernel to use. - Exclude the CMA-reserved ranges from the crash kernel memory to prevent them from being exported through /proc/vmcore. With the introduction of the CMA crashkernel regions, crash_exclude_mem_range() needs to be called multiple times to exclude both crashk_res and crashk_cma_ranges from the crash memory ranges. 
To avoid repetitive logic for validating mem_ranges size and handling reallocation when required, this functionality is moved to a new wrapper function crash_exclude_mem_range_guarded(). To ensure proper CMA reservation, reserve_crashkernel_cma() is called after pageblock_order is initialized. Update kernel-parameters.txt to document CMA support for crashkernel on the powerpc architecture. Cc: Baoquan He Cc: Jiri Bohac Cc: Hari Bathini Cc: Madhavan Srinivasan Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- Changelog: v3 -> v4 - Removed repeated initialization to tmem in crash_exclude_mem_range_guarded() - Call crash_exclude_mem_range() with the right crashk ranges v4 -> v5: - Document CMA-based crashkernel support for ppc64 in kernel-parameters.txt --- .../admin-guide/kernel-parameters.txt | 2 +- arch/powerpc/include/asm/kexec.h | 2 + arch/powerpc/kernel/setup-common.c | 4 +- arch/powerpc/kexec/core.c | 10 ++++- arch/powerpc/kexec/ranges.c | 43 ++++++++++++++----- 5 files changed, 47 insertions(+), 14 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 6c42061ca20e..0f386b546cec 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1013,7 +1013,7 @@ It will be ignored when crashkernel=X,high is not used or memory reserved is below 4G. crashkernel=size[KMG],cma - [KNL, X86] Reserve additional crash kernel memory from + [KNL, X86, ppc64] Reserve additional crash kernel memory from CMA. This reservation is usable by the first system's userspace memory and kernel movable allocations (memory balloon, zswap).
Pages allocated from this memory range diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index 4bbf9f699aaa..bd4a6c42a5f3 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -115,9 +115,11 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt, struct crash_mem #ifdef CONFIG_CRASH_RESERVE int __init overlaps_crashkernel(unsigned long start, unsigned long size); extern void arch_reserve_crashkernel(void); +extern void kdump_cma_reserve(void); #else static inline void arch_reserve_crashkernel(void) {} static inline int overlaps_crashkernel(unsigned long start, unsigned long size) { return 0; } +static inline void kdump_cma_reserve(void) { } #endif #if defined(CONFIG_CRASH_DUMP) diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c index 68d47c53876c..c8c42b419742 100644 --- a/arch/powerpc/kernel/setup-common.c +++ b/arch/powerpc/kernel/setup-common.c @@ -35,6 +35,7 @@ #include #include #include +#include #include #include #include @@ -995,11 +996,12 @@ void __init setup_arch(char **cmdline_p) initmem_init(); /* - * Reserve large chunks of memory for use by CMA for fadump, KVM and + * Reserve large chunks of memory for use by CMA for kdump, fadump, KVM and * hugetlb. These must be called after initmem_init(), so that * pageblock_order is initialised. */ fadump_cma_init(); + kdump_cma_reserve(); kvm_cma_reserve(); gigantic_hugetlb_cma_reserve(); diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c index d1a2d755381c..25744737eff5 100644 --- a/arch/powerpc/kexec/core.c +++ b/arch/powerpc/kexec/core.c @@ -33,6 +33,8 @@ void machine_kexec_cleanup(struct kimage *image) { } +unsigned long long cma_size; + /* * Do not allocate memory (or fail in any way) in machine_kexec(). * We are past the point of no return, committed to rebooting now. 
@@ -110,7 +112,7 @@ void __init arch_reserve_crashkernel(void) /* use common parsing */ ret = parse_crashkernel(boot_command_line, total_mem_sz, &crash_size, - &crash_base, NULL, NULL, NULL); + &crash_base, NULL, &cma_size, NULL); if (ret) return; @@ -130,6 +132,12 @@ void __init arch_reserve_crashkernel(void) reserve_crashkernel_generic(crash_size, crash_base, 0, false); } +void __init kdump_cma_reserve(void) +{ + if (cma_size) + reserve_crashkernel_cma(cma_size); +} + int __init overlaps_crashkernel(unsigned long start, unsigned long size) { return (start + size) > crashk_res.start && start <= crashk_res.end; diff --git a/arch/powerpc/kexec/ranges.c b/arch/powerpc/kexec/ranges.c index 3702b0bdab14..3bd27c38726b 100644 --- a/arch/powerpc/kexec/ranges.c +++ b/arch/powerpc/kexec/ranges.c @@ -515,7 +515,7 @@ int get_exclude_memory_ranges(struct crash_mem **mem_ranges) */ int get_usable_memory_ranges(struct crash_mem **mem_ranges) { - int ret; + int ret, i; /* * Early boot failure observed on guests when low memory (first memory @@ -528,6 +528,13 @@ int get_usable_memory_ranges(struct crash_mem **mem_ranges) if (ret) goto out; + for (i = 0; i < crashk_cma_cnt; i++) { + ret = add_mem_range(mem_ranges, crashk_cma_ranges[i].start, + crashk_cma_ranges[i].end - crashk_cma_ranges[i].start + 1); + if (ret) + goto out; + } + ret = add_rtas_mem_range(mem_ranges); if (ret) goto out; @@ -546,6 +553,22 @@ int get_usable_memory_ranges(struct crash_mem **mem_ranges) #endif /* CONFIG_KEXEC_FILE */ #ifdef CONFIG_CRASH_DUMP +static int crash_exclude_mem_range_guarded(struct crash_mem **mem_ranges, + unsigned long long mstart, + unsigned long long mend) +{ + struct crash_mem *tmem = *mem_ranges; + + /* Reallocate memory ranges if there is no space to split ranges */ + if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) { + tmem = realloc_mem_ranges(mem_ranges); + if (!tmem) + return -ENOMEM; + } + + return crash_exclude_mem_range(tmem, mstart, mend); +} + /** * 
get_crash_memory_ranges - Get crash memory ranges. This list includes * first/crashing kernel's memory regions that @@ -557,7 +580,6 @@ int get_usable_memory_ranges(struct crash_mem **mem_ranges) int get_crash_memory_ranges(struct crash_mem **mem_ranges) { phys_addr_t base, end; - struct crash_mem *tmem; u64 i; int ret; @@ -582,19 +604,18 @@ int get_crash_memory_ranges(struct crash_mem **mem_ranges) sort_memory_ranges(*mem_ranges, true); } - /* Reallocate memory ranges if there is no space to split ranges */ - tmem = *mem_ranges; - if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) { - tmem = realloc_mem_ranges(mem_ranges); - if (!tmem) - goto out; - } - /* Exclude crashkernel region */ - ret = crash_exclude_mem_range(tmem, crashk_res.start, crashk_res.end); + ret = crash_exclude_mem_range_guarded(mem_ranges, crashk_res.start, crashk_res.end); if (ret) goto out; + for (i = 0; i < crashk_cma_cnt; ++i) { + ret = crash_exclude_mem_range_guarded(mem_ranges, crashk_cma_ranges[i].start, + crashk_cma_ranges[i].end); + if (ret) + goto out; + } + /* * FIXME: For now, stay in parity with kexec-tools but if RTAS/OPAL * regions are exported to save their context at the time of -- 2.51.0 From maqianga at uniontech.com Sun Nov 2 22:34:36 2025 From: maqianga at uniontech.com (Qiang Ma) Date: Mon, 3 Nov 2025 14:34:36 +0800 Subject: [PATCH v2 0/4] kexec: print out debugging message if required for kexec_load Message-ID: <20251103063440.1681657-1-maqianga@uniontech.com> Overview: ========= The commit a85ee18c7900 ("kexec_file: print out debugging message if required") has added general code printing in kexec_file_load(), but not in kexec_load(). Since kexec_load and kexec_file_load are not triggered simultaneously, we can unify the debug flag of kexec and kexec_file as kexec_core_dbg_print. Next, we need to do some things in this patchset: 1. rename kexec_file_dbg_print to kexec_core_dbg_print 2. Add KEXEC_DEBUG 3. Initialize kexec_core_dbg_print for kexec 4. 
Fix uninitialized struct kimage *image pointer 5. Set the reset of kexec_file_dbg_print to kimage_free Testing: ========= I did testing on x86_64, arm64 and loongarch. On x86_64, the printed messages look like below: unset CONFIG_KEXEC_FILE: [ 81.476959] kexec: nr_segments = 7 [ 81.477565] kexec: segment[0]: buf=0x00000000c22469d2 bufsz=0x70 mem=0x100000 memsz=0x1000 [ 81.478797] kexec: segment[1]: buf=0x00000000dedbb3b1 bufsz=0x140 mem=0x101000 memsz=0x1000 [ 81.480075] kexec: segment[2]: buf=0x00000000d7657a33 bufsz=0x30 mem=0x102000 memsz=0x1000 [ 81.481288] kexec: segment[3]: buf=0x00000000c7eb60a6 bufsz=0x16f40a8 mem=0x23bd0b000 memsz=0x16f5000 [ 81.489018] kexec: segment[4]: buf=0x00000000d1ca53c8 bufsz=0xd73400 mem=0x23d400000 memsz=0x2ab7000 [ 81.499697] kexec: segment[5]: buf=0x00000000697bac5a bufsz=0x50dc mem=0x23fff1000 memsz=0x6000 [ 81.501084] kexec: segment[6]: buf=0x000000001f743a68 bufsz=0x70e0 mem=0x23fff7000 memsz=0x9000 [ 81.502374] kexec: kexec_load: type:0, start:0x23fff7700 head:0x10a4b9002 flags:0x3e0010 set CONFIG_KEXEC_FILE [ 36.774228] kexec_file: kernel: 0000000066c386c8 kernel_size: 0xd78400 [ 36.821814] kexec-bzImage64: Loaded purgatory at 0x23fffb000 [ 36.821826] kexec-bzImage64: Loaded boot_param, command line and misc at 0x23fff9000 bufsz=0x12d0 memsz=0x2000 [ 36.821829] kexec-bzImage64: Loaded 64bit kernel at 0x23d400000 bufsz=0xd73400 memsz=0x2ab7000 [ 36.821918] kexec-bzImage64: Loaded initrd at 0x23bd0b000 bufsz=0x16f40a8 memsz=0x16f40a8 [ 36.821920] kexec-bzImage64: Final command line is: root=/dev/mapper/test-root crashkernel=auto rd.lvm.lv=test/root [ 36.821925] kexec-bzImage64: E820 memmap: [ 36.821926] kexec-bzImage64: 0000000000000000-000000000009ffff (1) [ 36.821928] kexec-bzImage64: 0000000000100000-0000000000811fff (1) [ 36.821930] kexec-bzImage64: 0000000000812000-0000000000812fff (2) [ 36.821931] kexec-bzImage64: 0000000000813000-00000000bee38fff (1) [ 36.821933] kexec-bzImage64: 00000000bee39000-00000000beec2fff (2) 
[ 36.821934] kexec-bzImage64: 00000000beec3000-00000000bf8ecfff (1) [ 36.821935] kexec-bzImage64: 00000000bf8ed000-00000000bfb6cfff (2) [ 36.821936] kexec-bzImage64: 00000000bfb6d000-00000000bfb7efff (3) [ 36.821937] kexec-bzImage64: 00000000bfb7f000-00000000bfbfefff (4) [ 36.821938] kexec-bzImage64: 00000000bfbff000-00000000bff7bfff (1) [ 36.821939] kexec-bzImage64: 00000000bff7c000-00000000bfffffff (2) [ 36.821940] kexec-bzImage64: 00000000feffc000-00000000feffffff (2) [ 36.821941] kexec-bzImage64: 00000000ffc00000-00000000ffffffff (2) [ 36.821942] kexec-bzImage64: 0000000100000000-000000023fffffff (1) [ 36.872348] kexec_file: nr_segments = 4 [ 36.872356] kexec_file: segment[0]: buf=0x000000005314ece7 bufsz=0x4000 mem=0x23fffb000 memsz=0x5000 [ 36.872370] kexec_file: segment[1]: buf=0x000000006e59b143 bufsz=0x12d0 mem=0x23fff9000 memsz=0x2000 [ 36.872374] kexec_file: segment[2]: buf=0x00000000eb7b1fc3 bufsz=0xd73400 mem=0x23d400000 memsz=0x2ab7000 [ 36.882172] kexec_file: segment[3]: buf=0x000000006af76441 bufsz=0x16f40a8 mem=0x23bd0b000 memsz=0x16f5000 [ 36.889113] kexec_file: kexec_file_load: type:0, start:0x23fffb150 head:0x101a2e002 flags:0x8 Changes in v2: ========== - Unify the debug flag of kexec and kexec_file - Fix uninitialized struct kimage *image pointer - Fix the issue of mismatch between loop variable types Qiang Ma (4): kexec: Fix uninitialized struct kimage *image pointer kexec: add kexec_core flag to control debug printing kexec: print out debugging message if required for kexec_load kexec_file: Fix the issue of mismatch between loop variable types include/linux/kexec.h | 9 +++++---- include/uapi/linux/kexec.h | 1 + kernel/kexec.c | 16 +++++++++++++++- kernel/kexec_core.c | 4 +++- kernel/kexec_file.c | 9 ++++----- 5 files changed, 28 insertions(+), 11 deletions(-) -- 2.20.1 From maqianga at uniontech.com Sun Nov 2 22:34:37 2025 From: maqianga at uniontech.com (Qiang Ma) Date: Mon, 3 Nov 2025 14:34:37 +0800 Subject: [PATCH v2 1/4] kexec: Fix 
uninitialized struct kimage *image pointer In-Reply-To: <20251103063440.1681657-1-maqianga@uniontech.com> References: <20251103063440.1681657-1-maqianga@uniontech.com> Message-ID: <20251103063440.1681657-2-maqianga@uniontech.com> Initialize image to NULL. Then, if kimage_alloc_init() fails, we can go directly to 'out', because kimage_free() checks whether image is a NULL pointer. This also prepares for the subsequent patch, which resets kexec_core_dbg_print in kimage_free(). Signed-off-by: Qiang Ma --- kernel/kexec.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/kexec.c b/kernel/kexec.c index 28008e3d462e..9bb1f2b6b268 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -95,6 +95,8 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, unsigned long i; int ret; + image = NULL; + /* * Because we write directly to the reserved memory region when loading * crash kernels we need a serialization here to prevent multiple crash @@ -129,7 +131,7 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, ret = kimage_alloc_init(&image, entry, nr_segments, segments, flags); if (ret) - goto out_unlock; + goto out; if (flags & KEXEC_PRESERVE_CONTEXT) image->preserve_context = 1; -- 2.20.1 From maqianga at uniontech.com Sun Nov 2 22:34:38 2025 From: maqianga at uniontech.com (Qiang Ma) Date: Mon, 3 Nov 2025 14:34:38 +0800 Subject: [PATCH v2 2/4] kexec: add kexec_core flag to control debug printing In-Reply-To: <20251103063440.1681657-1-maqianga@uniontech.com> References: <20251103063440.1681657-1-maqianga@uniontech.com> Message-ID: <20251103063440.1681657-3-maqianga@uniontech.com> Commit a85ee18c7900 ("kexec_file: print out debugging message if required") added generic debug printing in kexec_file_load(), but not in kexec_load().
Since kexec_load and kexec_file_load are not triggered simultaneously, we can unify the debug flag of kexec and kexec_file as kexec_core_dbg_print. Next, we need to do four things: 1. rename kexec_file_dbg_print to kexec_core_dbg_print 2. Add KEXEC_DEBUG 3. Initialize kexec_core_dbg_print for kexec 4. Set the reset of kexec_file_dbg_print to kimage_free Signed-off-by: Qiang Ma --- include/linux/kexec.h | 9 +++++---- include/uapi/linux/kexec.h | 1 + kernel/kexec.c | 1 + kernel/kexec_core.c | 4 +++- kernel/kexec_file.c | 4 +--- 5 files changed, 11 insertions(+), 8 deletions(-) diff --git a/include/linux/kexec.h b/include/linux/kexec.h index ff7e231b0485..cad8b5c362af 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -455,10 +455,11 @@ bool kexec_load_permitted(int kexec_image_type); /* List of defined/legal kexec flags */ #ifndef CONFIG_KEXEC_JUMP -#define KEXEC_FLAGS (KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR | KEXEC_CRASH_HOTPLUG_SUPPORT) +#define KEXEC_FLAGS (KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR | KEXEC_CRASH_HOTPLUG_SUPPORT | \ + KEXEC_DEBUG) #else #define KEXEC_FLAGS (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_UPDATE_ELFCOREHDR | \ - KEXEC_CRASH_HOTPLUG_SUPPORT) + KEXEC_CRASH_HOTPLUG_SUPPORT | KEXEC_DEBUG) #endif /* List of defined/legal kexec file flags */ @@ -525,10 +526,10 @@ static inline int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, g static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) { } #endif -extern bool kexec_file_dbg_print; +extern bool kexec_core_dbg_print; #define kexec_dprintk(fmt, arg...) 
\ - do { if (kexec_file_dbg_print) pr_info(fmt, ##arg); } while (0) + do { if (kexec_core_dbg_print) pr_info(fmt, ##arg); } while (0) extern void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size); extern void kimage_unmap_segment(void *buffer); diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h index 55749cb0b81d..819c600af125 100644 --- a/include/uapi/linux/kexec.h +++ b/include/uapi/linux/kexec.h @@ -14,6 +14,7 @@ #define KEXEC_PRESERVE_CONTEXT 0x00000002 #define KEXEC_UPDATE_ELFCOREHDR 0x00000004 #define KEXEC_CRASH_HOTPLUG_SUPPORT 0x00000008 +#define KEXEC_DEBUG 0x00000010 #define KEXEC_ARCH_MASK 0xffff0000 /* diff --git a/kernel/kexec.c b/kernel/kexec.c index 9bb1f2b6b268..c7a869d32f87 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -42,6 +42,7 @@ static int kimage_alloc_init(struct kimage **rimage, unsigned long entry, if (!image) return -ENOMEM; + kexec_core_dbg_print = !!(flags & KEXEC_DEBUG); image->start = entry; image->nr_segments = nr_segments; memcpy(image->segment, segments, nr_segments * sizeof(*segments)); diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index fa00b239c5d9..865f2b14f23b 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -53,7 +53,7 @@ atomic_t __kexec_lock = ATOMIC_INIT(0); /* Flag to indicate we are going to kexec a new kernel */ bool kexec_in_progress = false; -bool kexec_file_dbg_print; +bool kexec_core_dbg_print; /* * When kexec transitions to the new kernel there is a one-to-one @@ -576,6 +576,8 @@ void kimage_free(struct kimage *image) kimage_entry_t *ptr, entry; kimage_entry_t ind = 0; + kexec_core_dbg_print = false; + if (!image) return; diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index eb62a9794242..4a24aadbad02 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -138,8 +138,6 @@ void kimage_file_post_load_cleanup(struct kimage *image) */ kfree(image->image_loader_data); image->image_loader_data = NULL; - - kexec_file_dbg_print = 
false; } #ifdef CONFIG_KEXEC_SIG @@ -314,7 +312,7 @@ kimage_file_alloc_init(struct kimage **rimage, int kernel_fd, if (!image) return -ENOMEM; - kexec_file_dbg_print = !!(flags & KEXEC_FILE_DEBUG); + kexec_core_dbg_print = !!(flags & KEXEC_FILE_DEBUG); image->file_mode = 1; #ifdef CONFIG_CRASH_DUMP -- 2.20.1 From maqianga at uniontech.com Sun Nov 2 22:34:40 2025 From: maqianga at uniontech.com (Qiang Ma) Date: Mon, 3 Nov 2025 14:34:40 +0800 Subject: [PATCH v2 4/4] kexec_file: Fix the issue of mismatch between loop variable types In-Reply-To: <20251103063440.1681657-1-maqianga@uniontech.com> References: <20251103063440.1681657-1-maqianga@uniontech.com> Message-ID: <20251103063440.1681657-5-maqianga@uniontech.com> The type of the struct kimage member variable nr_segments is unsigned long. Correct the loop variable i and the print format specifier type. Signed-off-by: Qiang Ma --- kernel/kexec_file.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index 4a24aadbad02..7afdaa0efc50 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -366,7 +366,8 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd, int image_type = (flags & KEXEC_FILE_ON_CRASH) ? KEXEC_TYPE_CRASH : KEXEC_TYPE_DEFAULT; struct kimage **dest_image, *image; - int ret = 0, i; + int ret = 0; + unsigned long i; /* We only trust the superuser with rebooting the system. 
*/ if (!kexec_load_permitted(image_type)) @@ -432,7 +433,7 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd, struct kexec_segment *ksegment; ksegment = &image->segment[i]; - kexec_dprintk("segment[%d]: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n", + kexec_dprintk("segment[%lu]: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n", i, ksegment->buf, ksegment->bufsz, ksegment->mem, ksegment->memsz); -- 2.20.1 From maqianga at uniontech.com Sun Nov 2 22:34:39 2025 From: maqianga at uniontech.com (Qiang Ma) Date: Mon, 3 Nov 2025 14:34:39 +0800 Subject: [PATCH v2 3/4] kexec: print out debugging message if required for kexec_load In-Reply-To: <20251103063440.1681657-1-maqianga@uniontech.com> References: <20251103063440.1681657-1-maqianga@uniontech.com> Message-ID: <20251103063440.1681657-4-maqianga@uniontech.com> Commit a85ee18c7900 ("kexec_file: print out debugging message if required") added generic debug printing in kexec_file_load(), but not in kexec_load(). On RISC-V in particular, kexec_image_info() has been removed (commit eb7622d908a0 ("kexec_file, riscv: print out debugging message if required")). As a result, when using '-d' with the kexec_load interface, nothing is printed in kernel space. This information can be helpful for verifying the accuracy of the data passed to the kernel. Therefore, following commit a85ee18c7900 ("kexec_file: print out debugging message if required"), debug printing has been added.
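The flag-gated printing this series introduces reduces to a small pattern: a userspace-supplied flag bit latches a global boolean, a printing macro checks that boolean, and the boolean is cleared again when the image is freed. A standalone userspace sketch of that pattern (toy_kexec_load() and dbg_lines are illustrative stand-ins, not kernel code; only the flag value and the macro shape mirror the patches):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define KEXEC_DEBUG 0x00000010  /* mirrors the proposed uapi flag bit */

static bool kexec_core_dbg_print;
static int dbg_lines;  /* demo-only counter of emitted debug lines */

/* Same shape as the kernel's kexec_dprintk(): print only when the flag is set. */
#define kexec_dprintk(fmt, ...)                         \
	do {                                            \
		if (kexec_core_dbg_print) {             \
			printf(fmt, ##__VA_ARGS__);     \
			dbg_lines++;                    \
		}                                       \
	} while (0)

/* Toy stand-in for do_kexec_load(): latch the flag, print, reset on "free". */
static void toy_kexec_load(unsigned long flags, unsigned long nr_segments)
{
	kexec_core_dbg_print = !!(flags & KEXEC_DEBUG);
	kexec_dprintk("kexec: nr_segments = %lu\n", nr_segments);
	kexec_core_dbg_print = false;  /* what kimage_free() would do */
}
```

Because the boolean is reset at the end of every load, a later load without the flag prints nothing, which is the behaviour patch 1 prepares for by routing failures through kimage_free().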
Signed-off-by: Qiang Ma Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202510310332.6XrLe70K-lkp at intel.com/ --- kernel/kexec.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/kernel/kexec.c b/kernel/kexec.c index c7a869d32f87..9b433b972cc1 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -154,7 +154,15 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, if (ret) goto out; + kexec_dprintk("nr_segments = %lu\n", nr_segments); for (i = 0; i < nr_segments; i++) { + struct kexec_segment *ksegment; + + ksegment = &image->segment[i]; + kexec_dprintk("segment[%lu]: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n", + i, ksegment->buf, ksegment->bufsz, ksegment->mem, + ksegment->memsz); + ret = kimage_load_segment(image, i); if (ret) goto out; @@ -166,6 +174,9 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, if (ret) goto out; + kexec_dprintk("kexec_load: type:%u, start:0x%lx head:0x%lx flags:0x%lx\n", + image->type, image->start, image->head, flags); + /* Install the new kernel and uninstall the old */ image = xchg(dest_image, image); -- 2.20.1 From pratyush at kernel.org Mon Nov 3 03:01:57 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Mon, 3 Nov 2025 12:01:57 +0100 Subject: [PATCH] kho: fix out-of-bounds access of vmalloc chunk Message-ID: <20251103110159.8399-1-pratyush@kernel.org> The list of pages in a vmalloc chunk is NULL-terminated. So when looping through the pages in a vmalloc chunk, both kho_restore_vmalloc() and kho_vmalloc_unpreserve_chunk() rightly make sure to stop when encountering a NULL page. But when the chunk is full, the loops do not stop and go past the bounds of chunk->phys, resulting in out-of-bounds memory access, and possibly the restoration or unpreservation of an invalid page. Fix this by making sure the processing of chunk stops at the end of the array. 
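The bug class fixed above is easy to demonstrate outside the kernel: a fixed-size array used as NULL-terminated has no terminator when it is completely full, so a loop that only checks for a zero entry runs off the end. A minimal sketch with a hypothetical 4-entry array (the real chunk layout differs):

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct chunk {
	unsigned long phys[4];  /* 0-terminated *unless* the chunk is full */
};

/* Counts entries the way the fixed loops do: stop at a 0 entry
 * or at the end of the array, whichever comes first. */
static size_t count_entries(const struct chunk *c)
{
	size_t n = 0;

	for (size_t i = 0; i < ARRAY_SIZE(c->phys) && c->phys[i]; i++)
		n++;
	return n;
}
```

Without the `i < ARRAY_SIZE(c->phys)` bound, a full chunk would keep the loop reading whatever memory follows the array until it happens to hit a zero word.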
Fixes: a667300bd53f2 ("kho: add support for preserving vmalloc allocations") Signed-off-by: Pratyush Yadav --- Notes: Commit 89a3ecca49ee8 ("kho: make sure page being restored is actually from KHO") was quite helpful in catching this since kho_restore_page() errored out due to missing magic number, instead of "restoring" a random page and causing errors at other random places. kernel/kexec_handover.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c index 76f0940fb4856..cc5aaa738bc50 100644 --- a/kernel/kexec_handover.c +++ b/kernel/kexec_handover.c @@ -869,7 +869,7 @@ static void kho_vmalloc_unpreserve_chunk(struct kho_vmalloc_chunk *chunk) __kho_unpreserve(track, pfn, pfn + 1); - for (int i = 0; chunk->phys[i]; i++) { + for (int i = 0; i < ARRAY_SIZE(chunk->phys) && chunk->phys[i]; i++) { pfn = PHYS_PFN(chunk->phys[i]); __kho_unpreserve(track, pfn, pfn + 1); } @@ -992,7 +992,7 @@ void *kho_restore_vmalloc(const struct kho_vmalloc *preservation) while (chunk) { struct page *page; - for (int i = 0; chunk->phys[i]; i++) { + for (int i = 0; i < ARRAY_SIZE(chunk->phys) && chunk->phys[i]; i++) { phys_addr_t phys = chunk->phys[i]; if (idx + contig_pages > total_pages) base-commit: dcb6fa37fd7bc9c3d2b066329b0d27dedf8becaa -- 2.47.3 From ritesh.list at gmail.com Mon Nov 3 02:10:10 2025 From: ritesh.list at gmail.com (Ritesh Harjani (IBM)) Date: Mon, 03 Nov 2025 15:40:10 +0530 Subject: [PATCH v5] powerpc/kdump: Add support for crashkernel CMA reservation In-Reply-To: <20251103043747.1298065-1-sourabhjain@linux.ibm.com> References: <20251103043747.1298065-1-sourabhjain@linux.ibm.com> Message-ID: <87y0on4ebh.ritesh.list@gmail.com> Sourabh Jain writes: > Commit 35c18f2933c5 ("Add a new optional ",cma" suffix to the > crashkernel= command line option") and commit ab475510e042 ("kdump: > implement reserve_crashkernel_cma") added CMA support for kdump > crashkernel reservation. 
> > Extend crashkernel CMA reservation support to powerpc. > > The following changes are made to enable CMA reservation on powerpc: > > - Parse and obtain the CMA reservation size along with other crashkernel > parameters > - Call reserve_crashkernel_cma() to allocate the CMA region for kdump > - Include the CMA-reserved ranges in the usable memory ranges for the > kdump kernel to use. > - Exclude the CMA-reserved ranges from the crash kernel memory to > prevent them from being exported through /proc/vmcore. > > With the introduction of the CMA crashkernel regions, > crash_exclude_mem_range() needs to be called multiple times to exclude > both crashk_res and crashk_cma_ranges from the crash memory ranges. To > avoid repetitive logic for validating mem_ranges size and handling > reallocation when required, this functionality is moved to a new wrapper > function crash_exclude_mem_range_guarded(). > > To ensure proper CMA reservation, reserve_crashkernel_cma() is called > after pageblock_order is initialized. > > Update kernel-parameters.txt to document CMA support for crashkernel on > powerpc architecture. 
> > Cc: Baoquan he > Cc: Jiri Bohac > Cc: Hari Bathini > Cc: Madhavan Srinivasan > Cc: Mahesh Salgaonkar > Cc: Michael Ellerman > Cc: Ritesh Harjani (IBM) > Cc: Shivang Upadhyay > Cc: kexec at lists.infradead.org > Signed-off-by: Sourabh Jain > --- > Changlog: > > v3 -> v4 > - Removed repeated initialization to tmem in > crash_exclude_mem_range_guarded() > - Call crash_exclude_mem_range() with right crashk ranges > > v4 -> v5: > - Document CMA-based crashkernel support for ppc64 in kernel-parameters.txt > --- > .../admin-guide/kernel-parameters.txt | 2 +- > arch/powerpc/include/asm/kexec.h | 2 + > arch/powerpc/kernel/setup-common.c | 4 +- > arch/powerpc/kexec/core.c | 10 ++++- > arch/powerpc/kexec/ranges.c | 43 ++++++++++++++----- > 5 files changed, 47 insertions(+), 14 deletions(-) > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > index 6c42061ca20e..0f386b546cec 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -1013,7 +1013,7 @@ > It will be ignored when crashkernel=X,high is not used > or memory reserved is below 4G. > crashkernel=size[KMG],cma > - [KNL, X86] Reserve additional crash kernel memory from > + [KNL, X86, ppc64] Reserve additional crash kernel memory from Shouldn't this be PPC and not ppc64? If I see the crash_dump support... config ARCH_SUPPORTS_CRASH_DUMP def_bool PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP) The changes below aren't specific to ppc64 correct? > CMA. This reservation is usable by the first system's > userspace memory and kernel movable allocations (memory > balloon, zswap). 
Pages allocated from this memory range > diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h > index 4bbf9f699aaa..bd4a6c42a5f3 100644 > --- a/arch/powerpc/include/asm/kexec.h > +++ b/arch/powerpc/include/asm/kexec.h > @@ -115,9 +115,11 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt, struct crash_mem > #ifdef CONFIG_CRASH_RESERVE > int __init overlaps_crashkernel(unsigned long start, unsigned long size); > extern void arch_reserve_crashkernel(void); > +extern void kdump_cma_reserve(void); > #else > static inline void arch_reserve_crashkernel(void) {} > static inline int overlaps_crashkernel(unsigned long start, unsigned long size) { return 0; } > +static inline void kdump_cma_reserve(void) { } > #endif > > #if defined(CONFIG_CRASH_DUMP) > diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c > index 68d47c53876c..c8c42b419742 100644 > --- a/arch/powerpc/kernel/setup-common.c > +++ b/arch/powerpc/kernel/setup-common.c > @@ -35,6 +35,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -995,11 +996,12 @@ void __init setup_arch(char **cmdline_p) > initmem_init(); > > /* > - * Reserve large chunks of memory for use by CMA for fadump, KVM and > + * Reserve large chunks of memory for use by CMA for kdump, fadump, KVM and > * hugetlb. These must be called after initmem_init(), so that > * pageblock_order is initialised. > */ > fadump_cma_init(); > + kdump_cma_reserve(); > kvm_cma_reserve(); > gigantic_hugetlb_cma_reserve(); > > diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c > index d1a2d755381c..25744737eff5 100644 > --- a/arch/powerpc/kexec/core.c > +++ b/arch/powerpc/kexec/core.c > @@ -33,6 +33,8 @@ void machine_kexec_cleanup(struct kimage *image) > { > } > > +unsigned long long cma_size; > + nit: Since this is a global powerpc variable you are defining, can we name it crashk_cma_size?
> /* > * Do not allocate memory (or fail in any way) in machine_kexec(). > * We are past the point of no return, committed to rebooting now. > @@ -110,7 +112,7 @@ void __init arch_reserve_crashkernel(void) > > /* use common parsing */ > ret = parse_crashkernel(boot_command_line, total_mem_sz, &crash_size, > - &crash_base, NULL, NULL, NULL); > + &crash_base, NULL, &cma_size, NULL); > > if (ret) > return; > @@ -130,6 +132,12 @@ void __init arch_reserve_crashkernel(void) > reserve_crashkernel_generic(crash_size, crash_base, 0, false); > } > > +void __init kdump_cma_reserve(void) > +{ > + if (cma_size) > + reserve_crashkernel_cma(cma_size); > +} > + nit: cma_size is already checked for null within reserve_crashkernel_cma(), so we don't really need kdump_cma_reserve() function call as such. Also kdump_cma_reserve() only make sense with #ifdef CRASHKERNEL_CMA.. so instead do you think we can directly call reserve_crashkernel_cma(cma_size)? -ritesh > int __init overlaps_crashkernel(unsigned long start, unsigned long size) > { > return (start + size) > crashk_res.start && start <= crashk_res.end; > diff --git a/arch/powerpc/kexec/ranges.c b/arch/powerpc/kexec/ranges.c > index 3702b0bdab14..3bd27c38726b 100644 > --- a/arch/powerpc/kexec/ranges.c > +++ b/arch/powerpc/kexec/ranges.c > @@ -515,7 +515,7 @@ int get_exclude_memory_ranges(struct crash_mem **mem_ranges) > */ > int get_usable_memory_ranges(struct crash_mem **mem_ranges) > { > - int ret; > + int ret, i; > > /* > * Early boot failure observed on guests when low memory (first memory > @@ -528,6 +528,13 @@ int get_usable_memory_ranges(struct crash_mem **mem_ranges) > if (ret) > goto out; > > + for (i = 0; i < crashk_cma_cnt; i++) { > + ret = add_mem_range(mem_ranges, crashk_cma_ranges[i].start, > + crashk_cma_ranges[i].end - crashk_cma_ranges[i].start + 1); > + if (ret) > + goto out; > + } > + > ret = add_rtas_mem_range(mem_ranges); > if (ret) > goto out; > @@ -546,6 +553,22 @@ int get_usable_memory_ranges(struct 
crash_mem **mem_ranges) > #endif /* CONFIG_KEXEC_FILE */ > > #ifdef CONFIG_CRASH_DUMP > +static int crash_exclude_mem_range_guarded(struct crash_mem **mem_ranges, > + unsigned long long mstart, > + unsigned long long mend) > +{ > + struct crash_mem *tmem = *mem_ranges; > + > + /* Reallocate memory ranges if there is no space to split ranges */ > + if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) { > + tmem = realloc_mem_ranges(mem_ranges); > + if (!tmem) > + return -ENOMEM; > + } > + > + return crash_exclude_mem_range(tmem, mstart, mend); > +} > + > /** > * get_crash_memory_ranges - Get crash memory ranges. This list includes > * first/crashing kernel's memory regions that > @@ -557,7 +580,6 @@ int get_usable_memory_ranges(struct crash_mem **mem_ranges) > int get_crash_memory_ranges(struct crash_mem **mem_ranges) > { > phys_addr_t base, end; > - struct crash_mem *tmem; > u64 i; > int ret; > > @@ -582,19 +604,18 @@ int get_crash_memory_ranges(struct crash_mem **mem_ranges) > sort_memory_ranges(*mem_ranges, true); > } > > - /* Reallocate memory ranges if there is no space to split ranges */ > - tmem = *mem_ranges; > - if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) { > - tmem = realloc_mem_ranges(mem_ranges); > - if (!tmem) > - goto out; > - } > - > /* Exclude crashkernel region */ > - ret = crash_exclude_mem_range(tmem, crashk_res.start, crashk_res.end); > + ret = crash_exclude_mem_range_guarded(mem_ranges, crashk_res.start, crashk_res.end); > if (ret) > goto out; > > + for (i = 0; i < crashk_cma_cnt; ++i) { > + ret = crash_exclude_mem_range_guarded(mem_ranges, crashk_cma_ranges[i].start, > + crashk_cma_ranges[i].end); > + if (ret) > + goto out; > + } > + > /* > * FIXME: For now, stay in parity with kexec-tools but if RTAS/OPAL > * regions are exported to save their context at the time of > -- > 2.51.0 From rppt at kernel.org Mon Nov 3 08:57:24 2025 From: rppt at kernel.org (Mike Rapoport) Date: Mon, 3 Nov 2025 18:57:24 +0200 Subject: [PATCH] kho: fix 
out-of-bounds access of vmalloc chunk In-Reply-To: <20251103110159.8399-1-pratyush@kernel.org> References: <20251103110159.8399-1-pratyush@kernel.org> Message-ID: On Mon, Nov 03, 2025 at 12:01:57PM +0100, Pratyush Yadav wrote: > The list of pages in a vmalloc chunk is NULL-terminated. So when looping > through the pages in a vmalloc chunk, both kho_restore_vmalloc() and > kho_vmalloc_unpreserve_chunk() rightly make sure to stop when > encountering a NULL page. But when the chunk is full, the loops do not > stop and go past the bounds of chunk->phys, resulting in out-of-bounds > memory access, and possibly the restoration or unpreservation of an > invalid page. > > Fix this by making sure the processing of chunk stops at the end of the > array. > > Fixes: a667300bd53f2 ("kho: add support for preserving vmalloc allocations") > Signed-off-by: Pratyush Yadav Reviewed-by: Mike Rapoport (Microsoft) > --- > > Notes: > Commit 89a3ecca49ee8 ("kho: make sure page being restored is actually > from KHO") was quite helpful in catching this since kho_restore_page() > errored out due to missing magic number, instead of "restoring" a random > page and causing errors at other random places. 
> > kernel/kexec_handover.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c > index 76f0940fb4856..cc5aaa738bc50 100644 > --- a/kernel/kexec_handover.c > +++ b/kernel/kexec_handover.c > @@ -869,7 +869,7 @@ static void kho_vmalloc_unpreserve_chunk(struct kho_vmalloc_chunk *chunk) > > __kho_unpreserve(track, pfn, pfn + 1); > > - for (int i = 0; chunk->phys[i]; i++) { > + for (int i = 0; i < ARRAY_SIZE(chunk->phys) && chunk->phys[i]; i++) { > pfn = PHYS_PFN(chunk->phys[i]); > __kho_unpreserve(track, pfn, pfn + 1); > } > @@ -992,7 +992,7 @@ void *kho_restore_vmalloc(const struct kho_vmalloc *preservation) > while (chunk) { > struct page *page; > > - for (int i = 0; chunk->phys[i]; i++) { > + for (int i = 0; i < ARRAY_SIZE(chunk->phys) && chunk->phys[i]; i++) { > phys_addr_t phys = chunk->phys[i]; > > if (idx + contig_pages > total_pages) > > base-commit: dcb6fa37fd7bc9c3d2b066329b0d27dedf8becaa > -- > 2.47.3 > -- Sincerely yours, Mike. From pratyush at kernel.org Mon Nov 3 10:02:30 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Mon, 3 Nov 2025 19:02:30 +0100 Subject: [PATCH 0/2] kho: misc fixes Message-ID: <20251103180235.71409-1-pratyush@kernel.org> This series has a couple of misc fixes for KHO I discovered during code review and testing. The series is based on top of [0] which has another fix for the function touched by patch 1. I spotted these two after sending the patch. If that one needs a reroll, I can combine the three into a series. 
[0] https://lore.kernel.org/linux-mm/20251103110159.8399-1-pratyush at kernel.org/ Pratyush Yadav (2): kho: fix unpreservation of higher-order vmalloc preservations kho: warn and exit when unpreserved page wasn't preserved kernel/kexec_handover.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) base-commit: dcb6fa37fd7bc9c3d2b066329b0d27dedf8becaa prerequisite-patch-id: fce7dcea45c85bac06a559d06f038e9c0cb38b17 -- 2.47.3 From pratyush at kernel.org Mon Nov 3 10:02:31 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Mon, 3 Nov 2025 19:02:31 +0100 Subject: [PATCH 1/2] kho: fix unpreservation of higher-order vmalloc preservations In-Reply-To: <20251103180235.71409-1-pratyush@kernel.org> References: <20251103180235.71409-1-pratyush@kernel.org> Message-ID: <20251103180235.71409-2-pratyush@kernel.org> kho_vmalloc_unpreserve_chunk() calls __kho_unpreserve() with end_pfn as pfn + 1. This happens to work for 0-order pages, but leaks higher order pages. For example, say order 2 pages back the allocation. During preservation, they get preserved in the order 2 bitmaps, but kho_vmalloc_unpreserve_chunk() would try to unpreserve them from the order 0 bitmaps, which should not have these bits set anyway, leaving the order 2 bitmaps untouched. This results in the pages being carried over to the next kernel. Nothing will free those pages in the next boot, leaking them. Fix this by taking the order into account when calculating the end PFN for __kho_unpreserve(). Fixes: a667300bd53f2 ("kho: add support for preserving vmalloc allocations") Signed-off-by: Pratyush Yadav --- Notes: When Pasha's patch [0] to add kho_unpreserve_pages() is merged, maybe it would be a better idea to use kho_unpreserve_pages() here? But that is something for later I suppose. 
[0] https://lore.kernel.org/linux-mm/20251101142325.1326536-4-pasha.tatashin at soleen.com/ kernel/kexec_handover.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c index cc5aaa738bc50..c2bcbb10918ce 100644 --- a/kernel/kexec_handover.c +++ b/kernel/kexec_handover.c @@ -862,7 +862,8 @@ static struct kho_vmalloc_chunk *new_vmalloc_chunk(struct kho_vmalloc_chunk *cur return NULL; } -static void kho_vmalloc_unpreserve_chunk(struct kho_vmalloc_chunk *chunk) +static void kho_vmalloc_unpreserve_chunk(struct kho_vmalloc_chunk *chunk, + unsigned short order) { struct kho_mem_track *track = &kho_out.ser.track; unsigned long pfn = PHYS_PFN(virt_to_phys(chunk)); @@ -871,7 +872,7 @@ static void kho_vmalloc_unpreserve_chunk(struct kho_vmalloc_chunk *chunk) for (int i = 0; i < ARRAY_SIZE(chunk->phys) && chunk->phys[i]; i++) { pfn = PHYS_PFN(chunk->phys[i]); - __kho_unpreserve(track, pfn, pfn + 1); + __kho_unpreserve(track, pfn, pfn + (1 << order)); } } @@ -882,7 +883,7 @@ static void kho_vmalloc_free_chunks(struct kho_vmalloc *kho_vmalloc) while (chunk) { struct kho_vmalloc_chunk *tmp = chunk; - kho_vmalloc_unpreserve_chunk(chunk); + kho_vmalloc_unpreserve_chunk(chunk, kho_vmalloc->order); chunk = KHOSER_LOAD_PTR(chunk->hdr.next); free_page((unsigned long)tmp); -- 2.47.3 From pratyush at kernel.org Mon Nov 3 10:02:32 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Mon, 3 Nov 2025 19:02:32 +0100 Subject: [PATCH 2/2] kho: warn and exit when unpreserved page wasn't preserved In-Reply-To: <20251103180235.71409-1-pratyush@kernel.org> References: <20251103180235.71409-1-pratyush@kernel.org> Message-ID: <20251103180235.71409-3-pratyush@kernel.org> Calling __kho_unpreserve() on a pair of (pfn, end_pfn) that wasn't preserved is a bug. Currently, if that is done, the physxa or bits can be NULL. 
This results in a soft lockup since a NULL physxa or bits results in redoing the loop without ever making any progress. Return when physxa or bits are not found, but WARN first to loudly indicate invalid behaviour. Fixes: fc33e4b44b271 ("kexec: enable KHO support for memory preservation") Signed-off-by: Pratyush Yadav --- kernel/kexec_handover.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c index c2bcbb10918ce..e5fd833726226 100644 --- a/kernel/kexec_handover.c +++ b/kernel/kexec_handover.c @@ -167,12 +167,12 @@ static void __kho_unpreserve(struct kho_mem_track *track, unsigned long pfn, const unsigned long pfn_high = pfn >> order; physxa = xa_load(&track->orders, order); - if (!physxa) - continue; + if (WARN_ON_ONCE(!physxa)) + return; bits = xa_load(&physxa->phys_bits, pfn_high / PRESERVE_BITS); - if (!bits) - continue; + if (WARN_ON_ONCE(!bits)) + return; clear_bit(pfn_high % PRESERVE_BITS, bits->preserve); -- 2.47.3 From akpm at linux-foundation.org Mon Nov 3 16:20:20 2025 From: akpm at linux-foundation.org (Andrew Morton) Date: Mon, 3 Nov 2025 16:20:20 -0800 Subject: [PATCH 0/2] kho: misc fixes In-Reply-To: <20251103180235.71409-1-pratyush@kernel.org> References: <20251103180235.71409-1-pratyush@kernel.org> Message-ID: <20251103162020.ac696dbc695f9341e7a267f7@linux-foundation.org> On Mon, 3 Nov 2025 19:02:30 +0100 Pratyush Yadav wrote: > This series has a couple of misc fixes for KHO I discovered during code > review and testing. > > The series is based on top of [0] which has another fix for the function > touched by patch 1. I spotted these two after sending the patch. If that > one needs a reroll, I can combine the three into a series. > Things appear to be misordered here. 
[1/2] "kho: fix unpreservation of higher-order vmalloc preservations" fixes a667300bd53f2, so it's wanted in 6.18-rcX [2/2] "kho: warn and exit when unpreserved page wasn't preserved" fixes fc33e4b44b271, so it's wanted in 6.16+ So can we please have [2/2] as a standalone fix against latest -linus, with a cc:stable? And then [1/2] as a standalone fix against latest -linus without a cc:stable. Once I have those merged up we can then take a look at what to do about the 6.19 material which is presently queued in mm-unstable. Thanks. From akpm at linux-foundation.org Mon Nov 3 17:23:21 2025 From: akpm at linux-foundation.org (Andrew Morton) Date: Mon, 3 Nov 2025 17:23:21 -0800 Subject: [PATCH 0/2] kho: misc fixes In-Reply-To: <20251103162020.ac696dbc695f9341e7a267f7@linux-foundation.org> References: <20251103180235.71409-1-pratyush@kernel.org> <20251103162020.ac696dbc695f9341e7a267f7@linux-foundation.org> Message-ID: <20251103172321.689294e48c2fae795e114ce6@linux-foundation.org> On Mon, 3 Nov 2025 16:20:20 -0800 Andrew Morton wrote: > On Mon, 3 Nov 2025 19:02:30 +0100 Pratyush Yadav wrote: > > > This series has a couple of misc fixes for KHO I discovered during code > > review and testing. > > > > The series is based on top of [0] which has another fix for the function > > touched by patch 1. I spotted these two after sending the patch. If that > > one needs a reroll, I can combine the three into a series. > > > > Things appear to be misordered here. > > [1/2] "kho: fix unpreservation of higher-order vmalloc preservations" > fixes a667300bd53f2, so it's wanted in 6.18-rcX > > [2/2] "kho: warn and exit when unpreserved page wasn't preserved" > fixes fc33e4b44b271, so it's wanted in 6.16+ > > So can we please have [2/2] as a standalone fix against latest -linus, > with a cc:stable? > > And then [1/2] as a standalone fix against latest -linus without a > cc:stable. > OK, I think I figured it out. 
In mm-hotfixes-unstable I have kho-fix-out-of-bounds-access-of-vmalloc-chunk.patch kho-fix-unpreservation-of-higher-order-vmalloc-preservations.patch kho-warn-and-exit-when-unpreserved-page-wasnt-preserved.patch The first two are applicable to 6.18-rcX and the third is applicable to 6.18-rcX, with a cc:stable for backporting. From maqianga at uniontech.com Mon Nov 3 18:59:59 2025 From: maqianga at uniontech.com (Qiang Ma) Date: Tue, 4 Nov 2025 10:59:59 +0800 Subject: [PATCH] kexec: add kexec flag to support debug printing Message-ID: <20251104025959.1948450-1-maqianga@uniontech.com> This adds KEXEC_DEBUG to kexec_flags so that it can be passed to the kernel when '-d' is used with the kexec_load interface. With that flag enabled, the kernel can enable debug message printing. This patch requires the corresponding kexec_load debug message support in the Linux kernel[1]. [1]: https://lore.kernel.org/kexec/20251103063440.1681657-1-maqianga at uniontech.com/ Signed-off-by: Qiang Ma --- kexec/kexec-syscall.h | 1 + kexec/kexec.c | 1 + 2 files changed, 2 insertions(+) diff --git a/kexec/kexec-syscall.h b/kexec/kexec-syscall.h index e9bb7de..b60804f 100644 --- a/kexec/kexec-syscall.h +++ b/kexec/kexec-syscall.h @@ -120,6 +120,7 @@ static inline long kexec_file_load(int kernel_fd, int initrd_fd, #define KEXEC_PRESERVE_CONTEXT 0x00000002 #define KEXEC_UPDATE_ELFCOREHDR 0x00000004 #define KEXEC_CRASH_HOTPLUG_SUPPORT 0x00000008 +#define KEXEC_DEBUG 0x00000010 #define KEXEC_ARCH_MASK 0xffff0000 /* Flags for kexec file based system call */ diff --git a/kexec/kexec.c b/kexec/kexec.c index c9e4bcb..f425422 100644 --- a/kexec/kexec.c +++ b/kexec/kexec.c @@ -1518,6 +1518,7 @@ int main(int argc, char *argv[]) return 0; case OPT_DEBUG: kexec_debug = 1; + kexec_flags |= KEXEC_DEBUG; kexec_file_flags |= KEXEC_FILE_DEBUG; break; case OPT_NOIFDOWN: -- 2.20.1 From sourabhjain at linux.ibm.com Mon Nov 3 21:18:51 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Tue, 4 Nov 2025 10:48:51
+0530 Subject: [PATCH v5] powerpc/kdump: Add support for crashkernel CMA reservation In-Reply-To: <87y0on4ebh.ritesh.list@gmail.com> References: <20251103043747.1298065-1-sourabhjain@linux.ibm.com> <87y0on4ebh.ritesh.list@gmail.com> Message-ID: <7957bd55-5bda-406f-aab3-64e0620bd452@linux.ibm.com> On 03/11/25 15:40, Ritesh Harjani (IBM) wrote: > Sourabh Jain writes: > >> Commit 35c18f2933c5 ("Add a new optional ",cma" suffix to the >> crashkernel= command line option") and commit ab475510e042 ("kdump: >> implement reserve_crashkernel_cma") added CMA support for kdump >> crashkernel reservation. >> >> Extend crashkernel CMA reservation support to powerpc. >> >> The following changes are made to enable CMA reservation on powerpc: >> >> - Parse and obtain the CMA reservation size along with other crashkernel >> parameters >> - Call reserve_crashkernel_cma() to allocate the CMA region for kdump >> - Include the CMA-reserved ranges in the usable memory ranges for the >> kdump kernel to use. >> - Exclude the CMA-reserved ranges from the crash kernel memory to >> prevent them from being exported through /proc/vmcore. >> >> With the introduction of the CMA crashkernel regions, >> crash_exclude_mem_range() needs to be called multiple times to exclude >> both crashk_res and crashk_cma_ranges from the crash memory ranges. To >> avoid repetitive logic for validating mem_ranges size and handling >> reallocation when required, this functionality is moved to a new wrapper >> function crash_exclude_mem_range_guarded(). >> >> To ensure proper CMA reservation, reserve_crashkernel_cma() is called >> after pageblock_order is initialized. >> >> Update kernel-parameters.txt to document CMA support for crashkernel on >> powerpc architecture. 
>> >> Cc: Baoquan he >> Cc: Jiri Bohac >> Cc: Hari Bathini >> Cc: Madhavan Srinivasan >> Cc: Mahesh Salgaonkar >> Cc: Michael Ellerman >> Cc: Ritesh Harjani (IBM) >> Cc: Shivang Upadhyay >> Cc: kexec at lists.infradead.org >> Signed-off-by: Sourabh Jain >> --- >> Changelog: >> >> v3 -> v4 >> - Removed repeated initialization to tmem in >> crash_exclude_mem_range_guarded() >> - Call crash_exclude_mem_range() with right crashk ranges >> >> v4 -> v5: >> - Document CMA-based crashkernel support for ppc64 in kernel-parameters.txt >> --- >> .../admin-guide/kernel-parameters.txt | 2 +- >> arch/powerpc/include/asm/kexec.h | 2 + >> arch/powerpc/kernel/setup-common.c | 4 +- >> arch/powerpc/kexec/core.c | 10 ++++- >> arch/powerpc/kexec/ranges.c | 43 ++++++++++++++----- >> 5 files changed, 47 insertions(+), 14 deletions(-) >> >> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt >> index 6c42061ca20e..0f386b546cec 100644 >> --- a/Documentation/admin-guide/kernel-parameters.txt >> +++ b/Documentation/admin-guide/kernel-parameters.txt >> @@ -1013,7 +1013,7 @@ >> It will be ignored when crashkernel=X,high is not used >> or memory reserved is below 4G. >> crashkernel=size[KMG],cma >> - [KNL, X86] Reserve additional crash kernel memory from >> + [KNL, X86, ppc64] Reserve additional crash kernel memory from > Shouldn't this be PPC and not ppc64? > > If I see the crash_dump support... > > config ARCH_SUPPORTS_CRASH_DUMP > def_bool PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP) > > The changes below aren't specific to ppc64 correct? The thing is, this feature is only supported with KEXEC_FILE, which is only supported on PPC64: config ARCH_SUPPORTS_KEXEC_FILE def_bool PPC64 Hence I kept it as ppc64. I think I should update that in the commit message. Also, do you think it is good to restrict this feature to KEXEC_FILE?
This reservation is usable by the first system's >> userspace memory and kernel movable allocations (memory >> balloon, zswap). Pages allocated from this memory range >> diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h >> index 4bbf9f699aaa..bd4a6c42a5f3 100644 >> --- a/arch/powerpc/include/asm/kexec.h >> +++ b/arch/powerpc/include/asm/kexec.h >> @@ -115,9 +115,11 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt, struct crash_mem >> #ifdef CONFIG_CRASH_RESERVE >> int __init overlaps_crashkernel(unsigned long start, unsigned long size); >> extern void arch_reserve_crashkernel(void); >> +extern void kdump_cma_reserve(void); >> #else >> static inline void arch_reserve_crashkernel(void) {} >> static inline int overlaps_crashkernel(unsigned long start, unsigned long size) { return 0; } >> +static inline void kdump_cma_reserve(void) { } >> #endif >> >> #if defined(CONFIG_CRASH_DUMP) >> diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c >> index 68d47c53876c..c8c42b419742 100644 >> --- a/arch/powerpc/kernel/setup-common.c >> +++ b/arch/powerpc/kernel/setup-common.c >> @@ -35,6 +35,7 @@ >> #include >> #include >> #include >> +#include >> #include >> #include >> #include >> @@ -995,11 +996,12 @@ void __init setup_arch(char **cmdline_p) >> initmem_init(); >> >> /* >> - * Reserve large chunks of memory for use by CMA for fadump, KVM and >> + * Reserve large chunks of memory for use by CMA for kdump, fadump, KVM and >> * hugetlb. These must be called after initmem_init(), so that >> * pageblock_order is initialised. 
>> */ >> fadump_cma_init(); >> + kdump_cma_reserve(); >> kvm_cma_reserve(); >> gigantic_hugetlb_cma_reserve(); >> >> diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c >> index d1a2d755381c..25744737eff5 100644 >> --- a/arch/powerpc/kexec/core.c >> +++ b/arch/powerpc/kexec/core.c >> @@ -33,6 +33,8 @@ void machine_kexec_cleanup(struct kimage *image) >> { >> } >> >> +unsigned long long cma_size; >> + > nit: > Since this is a global powerpc variable you are defining, then can we > keep its name to crashk_cma_size? Yeah, makes sense. I will update the variable name. > >> /* >> * Do not allocate memory (or fail in any way) in machine_kexec(). >> * We are past the point of no return, committed to rebooting now. >> @@ -110,7 +112,7 @@ void __init arch_reserve_crashkernel(void) >> >> /* use common parsing */ >> ret = parse_crashkernel(boot_command_line, total_mem_sz, &crash_size, >> - &crash_base, NULL, NULL, NULL); >> + &crash_base, NULL, &cma_size, NULL); >> >> if (ret) >> return; >> @@ -130,6 +132,12 @@ void __init arch_reserve_crashkernel(void) >> reserve_crashkernel_generic(crash_size, crash_base, 0, false); >> } >> >> +void __init kdump_cma_reserve(void) >> +{ >> + if (cma_size) >> + reserve_crashkernel_cma(cma_size); >> +} >> + > nit: > cma_size is already checked for null within reserve_crashkernel_cma(), > so we don't really need kdump_cma_reserve() function call as such. > > Also kdump_cma_reserve() only makes sense with #ifdef CRASHKERNEL_CMA.. > so instead do you think we can directly call reserve_crashkernel_cma(cma_size)? I think the above kdump_cma_reserve() definition should come under CONFIG_CRASH_RESERVE because of the way it is declared in arch/powerpc/include/asm/kexec.h. I would like to keep kdump_cma_reserve() as it is because of two reasons: - It keeps setup_arch() free from kdump #ifdefs - In case we want to add some condition on this reservation, it would be straightforward.
So let's keep kdump_cma_reserve() as is, unless you have a strong opinion against it. >> int __init overlaps_crashkernel(unsigned long start, unsigned long size) >> { >> return (start + size) > crashk_res.start && start <= crashk_res.end; >> diff --git a/arch/powerpc/kexec/ranges.c b/arch/powerpc/kexec/ranges.c >> index 3702b0bdab14..3bd27c38726b 100644 >> --- a/arch/powerpc/kexec/ranges.c >> +++ b/arch/powerpc/kexec/ranges.c >> @@ -515,7 +515,7 @@ int get_exclude_memory_ranges(struct crash_mem **mem_ranges) >> */ >> int get_usable_memory_ranges(struct crash_mem **mem_ranges) >> { >> - int ret; >> + int ret, i; >> >> /* >> * Early boot failure observed on guests when low memory (first memory >> @@ -528,6 +528,13 @@ int get_usable_memory_ranges(struct crash_mem **mem_ranges) >> if (ret) >> goto out; >> >> + for (i = 0; i < crashk_cma_cnt; i++) { >> + ret = add_mem_range(mem_ranges, crashk_cma_ranges[i].start, >> + crashk_cma_ranges[i].end - crashk_cma_ranges[i].start + 1); >> + if (ret) >> + goto out; >> + } >> + >> ret = add_rtas_mem_range(mem_ranges); >> if (ret) >> goto out; >> @@ -546,6 +553,22 @@ int get_usable_memory_ranges(struct crash_mem **mem_ranges) >> #endif /* CONFIG_KEXEC_FILE */ >> >> #ifdef CONFIG_CRASH_DUMP >> +static int crash_exclude_mem_range_guarded(struct crash_mem **mem_ranges, >> + unsigned long long mstart, >> + unsigned long long mend) >> +{ >> + struct crash_mem *tmem = *mem_ranges; >> + >> + /* Reallocate memory ranges if there is no space to split ranges */ >> + if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) { >> + tmem = realloc_mem_ranges(mem_ranges); >> + if (!tmem) >> + return -ENOMEM; >> + } >> + >> + return crash_exclude_mem_range(tmem, mstart, mend); >> +} >> + >> /** >> * get_crash_memory_ranges - Get crash memory ranges.
This list includes >> * first/crashing kernel's memory regions that >> @@ -557,7 +580,6 @@ int get_usable_memory_ranges(struct crash_mem **mem_ranges) >> int get_crash_memory_ranges(struct crash_mem **mem_ranges) >> { >> phys_addr_t base, end; >> - struct crash_mem *tmem; >> u64 i; >> int ret; >> >> @@ -582,19 +604,18 @@ int get_crash_memory_ranges(struct crash_mem **mem_ranges) >> sort_memory_ranges(*mem_ranges, true); >> } >> >> - /* Reallocate memory ranges if there is no space to split ranges */ >> - tmem = *mem_ranges; >> - if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) { >> - tmem = realloc_mem_ranges(mem_ranges); >> - if (!tmem) >> - goto out; >> - } >> - >> /* Exclude crashkernel region */ >> - ret = crash_exclude_mem_range(tmem, crashk_res.start, crashk_res.end); >> + ret = crash_exclude_mem_range_guarded(mem_ranges, crashk_res.start, crashk_res.end); >> if (ret) >> goto out; >> >> + for (i = 0; i < crashk_cma_cnt; ++i) { >> + ret = crash_exclude_mem_range_guarded(mem_ranges, crashk_cma_ranges[i].start, >> + crashk_cma_ranges[i].end); >> + if (ret) >> + goto out; >> + } >> + >> /* >> * FIXME: For now, stay in parity with kexec-tools but if RTAS/OPAL >> * regions are exported to save their context at the time of >> -- >> 2.51.0 From sourabhjain at linux.ibm.com Mon Nov 3 22:26:42 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Tue, 4 Nov 2025 11:56:42 +0530 Subject: [PATCH 0/2] Export kdump crashkernel CMA ranges In-Reply-To: <20251103035859.1267318-1-sourabhjain@linux.ibm.com> References: <20251103035859.1267318-1-sourabhjain@linux.ibm.com> Message-ID: Cc others who can provide input. On 03/11/25 09:28, Sourabh Jain wrote: > /sys/kernel/kexec_crash_cma_ranges to export all CMA regions reserved > for the crashkernel to user-space. This enables user-space tools > configuring kdump to determine the amount of memory reserved for the > crashkernel. 
When CMA is used for crashkernel allocation, tools can use > this information to warn users that attempting to capture user pages > while CMA reservation is active may lead to unreliable or incomplete > dump capture. > > While adding documentation for the new sysfs interface, I realized that > there was no ABI document for the existing kexec and kdump sysfs > interfaces, so I added one. > > The first patch adds the ABI documentation for the existing kexec and > kdump sysfs interfaces, and the second patch adds the > /sys/kernel/kexec_crash_cma_ranges sysfs interface along with its > corresponding ABI documentation. > > *Seeking opinions* > There are already four kexec/kdump sysfs entries under /sys/kernel/, > and this patch series adds one more. Should we consider moving them to > a separate directory, such as /sys/kernel/kexec, to avoid polluting > /sys/kernel/? For backward compatibility, we can create symlinks at > the old locations for sometime and remove them in the future. > > Cc: Andrew Morton > Cc: Baoquan he > Cc: Jiri Bohac > Cc: Shivang Upadhyay > Cc: linuxppc-dev at lists.ozlabs.org > Cc: kexec at lists.infradead.org > > Sourabh Jain (2): > Documentation/ABI: add kexec and kdump sysfs interface > crash: export crashkernel CMA reservation to userspace > > .../ABI/testing/sysfs-kernel-kexec-kdump | 53 +++++++++++++++++++ > kernel/ksysfs.c | 17 ++++++ > 2 files changed, 70 insertions(+) > create mode 100644 Documentation/ABI/testing/sysfs-kernel-kexec-kdump > From sourabhjain at linux.ibm.com Tue Nov 4 01:34:37 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Tue, 4 Nov 2025 15:04:37 +0530 Subject: [PATCH v5] powerpc/kdump: Add support for crashkernel CMA reservation In-Reply-To: <7957bd55-5bda-406f-aab3-64e0620bd452@linux.ibm.com> References: <20251103043747.1298065-1-sourabhjain@linux.ibm.com> <87y0on4ebh.ritesh.list@gmail.com> <7957bd55-5bda-406f-aab3-64e0620bd452@linux.ibm.com> Message-ID: On 04/11/25 10:48, Sourabh Jain wrote: > > > On 
03/11/25 15:40, Ritesh Harjani (IBM) wrote: >> Sourabh Jain writes: >> >>> Commit 35c18f2933c5 ("Add a new optional ",cma" suffix to the >>> crashkernel= command line option") and commit ab475510e042 ("kdump: >>> implement reserve_crashkernel_cma") added CMA support for kdump >>> crashkernel reservation. >>> >>> Extend crashkernel CMA reservation support to powerpc. >>> >>> The following changes are made to enable CMA reservation on powerpc: >>> >>> - Parse and obtain the CMA reservation size along with other >>> crashkernel >>> parameters >>> - Call reserve_crashkernel_cma() to allocate the CMA region for kdump >>> - Include the CMA-reserved ranges in the usable memory ranges for the >>> kdump kernel to use. >>> - Exclude the CMA-reserved ranges from the crash kernel memory to >>> prevent them from being exported through /proc/vmcore. >>> >>> With the introduction of the CMA crashkernel regions, >>> crash_exclude_mem_range() needs to be called multiple times to exclude >>> both crashk_res and crashk_cma_ranges from the crash memory ranges. To >>> avoid repetitive logic for validating mem_ranges size and handling >>> reallocation when required, this functionality is moved to a new >>> wrapper >>> function crash_exclude_mem_range_guarded(). >>> >>> To ensure proper CMA reservation, reserve_crashkernel_cma() is called >>> after pageblock_order is initialized. >>> >>> Update kernel-parameters.txt to document CMA support for crashkernel on >>> powerpc architecture. >>> >>> Cc: Baoquan he >>> Cc: Jiri Bohac >>> Cc: Hari Bathini >>> Cc: Madhavan Srinivasan >>> Cc: Mahesh Salgaonkar >>> Cc: Michael Ellerman >>> Cc: Ritesh Harjani (IBM) >>> Cc: Shivang Upadhyay >>> Cc: kexec at lists.infradead.org >>> Signed-off-by: Sourabh Jain >>> --- >>> Changelog: >>> >>> v3 -> v4 >>> - Removed repeated initialization to tmem in >>> crash_exclude_mem_range_guarded() >>> - Call crash_exclude_mem_range() with right crashk ranges >>> >>> v4 -> v5: >>>
- Document CMA-based crashkernel support for ppc64 in >>> kernel-parameters.txt >>> --- >>> ? .../admin-guide/kernel-parameters.txt???????? |? 2 +- >>> ? arch/powerpc/include/asm/kexec.h????????????? |? 2 + >>> ? arch/powerpc/kernel/setup-common.c??????????? |? 4 +- >>> ? arch/powerpc/kexec/core.c???????????????????? | 10 ++++- >>> ? arch/powerpc/kexec/ranges.c?????????????????? | 43 >>> ++++++++++++++----- >>> ? 5 files changed, 47 insertions(+), 14 deletions(-) >>> >>> diff --git a/Documentation/admin-guide/kernel-parameters.txt >>> b/Documentation/admin-guide/kernel-parameters.txt >>> index 6c42061ca20e..0f386b546cec 100644 >>> --- a/Documentation/admin-guide/kernel-parameters.txt >>> +++ b/Documentation/admin-guide/kernel-parameters.txt >>> @@ -1013,7 +1013,7 @@ >>> ????????????? It will be ignored when crashkernel=X,high is not used >>> ????????????? or memory reserved is below 4G. >>> ????? crashkernel=size[KMG],cma >>> -??????????? [KNL, X86] Reserve additional crash kernel memory from >>> +??????????? [KNL, X86, ppc64] Reserve additional crash kernel >>> memory from >> Shouldn't this be PPC and not ppc64? >> >> If I see the crash_dump support... >> >> config ARCH_SUPPORTS_CRASH_DUMP >> ????def_bool PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP) >> >> The changes below aren't specific to ppc64 correct? > > The thing is this feature is only supported with KEXEC_FILE and which > only supported on PPC64: > > config ARCH_SUPPORTS_KEXEC_FILE > ??? def_bool PPC64 > > Hence I kept it as ppc64. > > I think I should update that in the commit message. > > Also do you think is it good to restrict this feature to KEXEC_FILE? Putting this under KEXEC_FILE may not help much because KEXEC_FILE is enabled by default in most configurations. Once it is enabled, the CMA reservation will happen regardless of which system call is used to load the kdump kernel (kexec_load or kexec_file_load). 
However, not restricting this feature to KEXEC_FILE will allow the kexec tool to independently add support for this feature in the future for the kexec_load system call. With that logic, I think if we do not restrict this feature to KEXEC_FILE, the support will be available for ppc and not limited to ppc64. > >> >>> ????????????? CMA. This reservation is usable by the first system's >>> ????????????? userspace memory and kernel movable allocations (memory >>> ????????????? balloon, zswap). Pages allocated from this memory range >>> diff --git a/arch/powerpc/include/asm/kexec.h >>> b/arch/powerpc/include/asm/kexec.h >>> index 4bbf9f699aaa..bd4a6c42a5f3 100644 >>> --- a/arch/powerpc/include/asm/kexec.h >>> +++ b/arch/powerpc/include/asm/kexec.h >>> @@ -115,9 +115,11 @@ int setup_new_fdt_ppc64(const struct kimage >>> *image, void *fdt, struct crash_mem >>> ? #ifdef CONFIG_CRASH_RESERVE >>> ? int __init overlaps_crashkernel(unsigned long start, unsigned long >>> size); >>> ? extern void arch_reserve_crashkernel(void); >>> +extern void kdump_cma_reserve(void); >>> ? #else >>> ? static inline void arch_reserve_crashkernel(void) {} >>> ? static inline int overlaps_crashkernel(unsigned long start, >>> unsigned long size) { return 0; } >>> +static inline void kdump_cma_reserve(void) { } >>> ? #endif >>> ? ? #if defined(CONFIG_CRASH_DUMP) >>> diff --git a/arch/powerpc/kernel/setup-common.c >>> b/arch/powerpc/kernel/setup-common.c >>> index 68d47c53876c..c8c42b419742 100644 >>> --- a/arch/powerpc/kernel/setup-common.c >>> +++ b/arch/powerpc/kernel/setup-common.c >>> @@ -35,6 +35,7 @@ >>> ? #include >>> ? #include >>> ? #include >>> +#include >>> ? #include >>> ? #include >>> ? #include >>> @@ -995,11 +996,12 @@ void __init setup_arch(char **cmdline_p) >>> ????? initmem_init(); >>> ? ????? /* >>> -???? * Reserve large chunks of memory for use by CMA for fadump, >>> KVM and >>> +???? * Reserve large chunks of memory for use by CMA for kdump, >>> fadump, KVM and >>> ?????? 
* hugetlb. These must be called after initmem_init(), so that >>> ?????? * pageblock_order is initialised. >>> ?????? */ >>> ????? fadump_cma_init(); >>> +??? kdump_cma_reserve(); >>> ????? kvm_cma_reserve(); >>> ????? gigantic_hugetlb_cma_reserve(); >>> ? diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c >>> index d1a2d755381c..25744737eff5 100644 >>> --- a/arch/powerpc/kexec/core.c >>> +++ b/arch/powerpc/kexec/core.c >>> @@ -33,6 +33,8 @@ void machine_kexec_cleanup(struct kimage *image) >>> ? { >>> ? } >>> ? +unsigned long long cma_size; >>> + >> nit: >> Since this is a gloabal powerpc variable you are defining, then can we >> keep it's name to crashk_cma_size? > > Yeah make sense. I will update the variable name. > > >> >>> ? /* >>> ?? * Do not allocate memory (or fail in any way) in machine_kexec(). >>> ?? * We are past the point of no return, committed to rebooting now. >>> @@ -110,7 +112,7 @@ void __init arch_reserve_crashkernel(void) >>> ? ????? /* use common parsing */ >>> ????? ret = parse_crashkernel(boot_command_line, total_mem_sz, >>> &crash_size, >>> -??????????????? &crash_base, NULL, NULL, NULL); >>> +??????????????? &crash_base, NULL, &cma_size, NULL); >>> ? ????? if (ret) >>> ????????? return; >>> @@ -130,6 +132,12 @@ void __init arch_reserve_crashkernel(void) >>> ????? reserve_crashkernel_generic(crash_size, crash_base, 0, false); >>> ? } >>> ? +void __init kdump_cma_reserve(void) >>> +{ >>> +??? if (cma_size) >>> +??????? reserve_crashkernel_cma(cma_size); >>> +} >>> + >> nit: >> cma_size is already checked for null within reserve_crashkernel_cma(), >> so we don't really need kdump_cma_reserve() function call as such. >> >> Also kdump_cma_reserve() only make sense with #ifdef CRASHKERNEL_CMA.. >> so instead do you think we can directly call >> reserve_crashkernel_cma(cma_size)? 
> > I think the above kdump_cma_reserve() definition should come under > CONFIG_CRASH_RESERVE > because the way it is declared in arch/powerpc/include/asm/kexec.h. > > I would like to keep kdump_cma_reserve() as is it because of two reasons: > > - It keeps setup_arch() free from kdump #ifdefs > - In case if we want to add some condition on this reservation it > would straight forward. > > So lets keep kdump_cma_reserve as is, unless you have strong opinion > on not to. > >>> ? int __init overlaps_crashkernel(unsigned long start, unsigned long >>> size) >>> ? { >>> ????? return (start + size) > crashk_res.start && start <= >>> crashk_res.end; >>> diff --git a/arch/powerpc/kexec/ranges.c b/arch/powerpc/kexec/ranges.c >>> index 3702b0bdab14..3bd27c38726b 100644 >>> --- a/arch/powerpc/kexec/ranges.c >>> +++ b/arch/powerpc/kexec/ranges.c >>> @@ -515,7 +515,7 @@ int get_exclude_memory_ranges(struct crash_mem >>> **mem_ranges) >>> ?? */ >>> ? int get_usable_memory_ranges(struct crash_mem **mem_ranges) >>> ? { >>> -??? int ret; >>> +??? int ret, i; >>> ? ????? /* >>> ?????? * Early boot failure observed on guests when low memory >>> (first memory >>> @@ -528,6 +528,13 @@ int get_usable_memory_ranges(struct crash_mem >>> **mem_ranges) >>> ????? if (ret) >>> ????????? goto out; >>> ? +??? for (i = 0; i < crashk_cma_cnt; i++) { >>> +??????? ret = add_mem_range(mem_ranges, crashk_cma_ranges[i].start, >>> +??????????????????? crashk_cma_ranges[i].end - >>> crashk_cma_ranges[i].start + 1); >>> +??????? if (ret) >>> +??????????? goto out; >>> +??? } >>> + >>> ????? ret = add_rtas_mem_range(mem_ranges); >>> ????? if (ret) >>> ????????? goto out; >>> @@ -546,6 +553,22 @@ int get_usable_memory_ranges(struct crash_mem >>> **mem_ranges) >>> ? #endif /* CONFIG_KEXEC_FILE */ >>> ? ? #ifdef CONFIG_CRASH_DUMP >>> +static int crash_exclude_mem_range_guarded(struct crash_mem >>> **mem_ranges, >>> +?????????????????????? unsigned long long mstart, >>> +?????????????????????? 
unsigned long long mend) >>> +{ >>> +??? struct crash_mem *tmem = *mem_ranges; >>> + >>> +??? /* Reallocate memory ranges if there is no space to split >>> ranges */ >>> +??? if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) { >>> +??????? tmem = realloc_mem_ranges(mem_ranges); >>> +??????? if (!tmem) >>> +??????????? return -ENOMEM; >>> +??? } >>> + >>> +??? return crash_exclude_mem_range(tmem, mstart, mend); >>> +} >>> + >>> ? /** >>> ?? * get_crash_memory_ranges - Get crash memory ranges. This list >>> includes >>> ?? *?????????????????????????? first/crashing kernel's memory >>> regions that >>> @@ -557,7 +580,6 @@ int get_usable_memory_ranges(struct crash_mem >>> **mem_ranges) >>> ? int get_crash_memory_ranges(struct crash_mem **mem_ranges) >>> ? { >>> ????? phys_addr_t base, end; >>> -??? struct crash_mem *tmem; >>> ????? u64 i; >>> ????? int ret; >>> ? @@ -582,19 +604,18 @@ int get_crash_memory_ranges(struct crash_mem >>> **mem_ranges) >>> ????????????? sort_memory_ranges(*mem_ranges, true); >>> ????? } >>> ? -??? /* Reallocate memory ranges if there is no space to split >>> ranges */ >>> -??? tmem = *mem_ranges; >>> -??? if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) { >>> -??????? tmem = realloc_mem_ranges(mem_ranges); >>> -??????? if (!tmem) >>> -??????????? goto out; >>> -??? } >>> - >>> ????? /* Exclude crashkernel region */ >>> -??? ret = crash_exclude_mem_range(tmem, crashk_res.start, >>> crashk_res.end); >>> +??? ret = crash_exclude_mem_range_guarded(mem_ranges, >>> crashk_res.start, crashk_res.end); >>> ????? if (ret) >>> ????????? goto out; >>> ? +??? for (i = 0; i < crashk_cma_cnt; ++i) { >>> +??????? ret = crash_exclude_mem_range_guarded(mem_ranges, >>> crashk_cma_ranges[i].start, >>> +????????????????????????? crashk_cma_ranges[i].end); >>> +??????? if (ret) >>> +??????????? goto out; >>> +??? } >>> + >>> ????? /* >>> ?????? * FIXME: For now, stay in parity with kexec-tools but if >>> RTAS/OPAL >>> ?????? *??????? 
regions are exported to save their context at the >>> time of >>> -- >>> 2.51.0 > From ritesh.list at gmail.com Tue Nov 4 02:18:48 2025 From: ritesh.list at gmail.com (Ritesh Harjani (IBM)) Date: Tue, 04 Nov 2025 15:48:48 +0530 Subject: [PATCH v5] powerpc/kdump: Add support for crashkernel CMA reservation In-Reply-To: <7957bd55-5bda-406f-aab3-64e0620bd452@linux.ibm.com> References: <20251103043747.1298065-1-sourabhjain@linux.ibm.com> <87y0on4ebh.ritesh.list@gmail.com> <7957bd55-5bda-406f-aab3-64e0620bd452@linux.ibm.com> Message-ID: <87wm463xtj.ritesh.list@gmail.com> Sourabh Jain writes: > I would like to keep kdump_cma_reserve() as it is for two reasons: > > - It keeps setup_arch() free from kdump #ifdefs Not really. Instead of kdump_cma_reserve(crashk_cma_size), one could call reserve_crashkernel_cma(crashk_cma_size) directly in setup_arch(). > - In case we want to add some condition on this reservation, it would > be straightforward. > Makes sense. > So let's keep kdump_cma_reserve() as is, unless you have a strong opinion > against it. > No strong opinion; as I said, it was a minor nit. Feel free to keep the function kdump_cma_reserve() as is then. -ritesh From sourabhjain at linux.ibm.com Tue Nov 4 02:35:42 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Tue, 4 Nov 2025 16:05:42 +0530 Subject: [PATCH v5] powerpc/kdump: Add support for crashkernel CMA reservation In-Reply-To: <87wm463xtj.ritesh.list@gmail.com> References: <20251103043747.1298065-1-sourabhjain@linux.ibm.com> <87y0on4ebh.ritesh.list@gmail.com> <7957bd55-5bda-406f-aab3-64e0620bd452@linux.ibm.com> <87wm463xtj.ritesh.list@gmail.com> Message-ID: <722d72b5-cebf-48f2-8ad5-558ccd3c30f4@linux.ibm.com> On 04/11/25 15:48, Ritesh Harjani (IBM) wrote: > Sourabh Jain writes: > > >> I would like to keep kdump_cma_reserve() as it is for two reasons: >> >> - It keeps setup_arch() free from kdump #ifdefs > Not really. 
> > Instead of kdump_cma_reserve(crashk_cma_size), one could call > > reserve_crashkernel_cma(crashk_cma_size) directly in setup_arch(). reserve_crashkernel_cma() is not available unless the kernel is built with CONFIG_CRASH_RESERVE. So, wouldn't calling reserve_crashkernel_cma() directly from setup_arch() lead to a build failure? Or am I missing something? > >> - In case we want to add some condition on this reservation, it would >> be straightforward. >> > Makes sense. > >> So let's keep kdump_cma_reserve() as is, unless you have a strong opinion >> against it. >> > No strong opinion; as I said, it was a minor nit. Feel free to keep the > function kdump_cma_reserve() as is then. > > -ritesh > From ritesh.list at gmail.com Tue Nov 4 02:24:44 2025 From: ritesh.list at gmail.com (Ritesh Harjani (IBM)) Date: Tue, 04 Nov 2025 15:54:44 +0530 Subject: [PATCH v5] powerpc/kdump: Add support for crashkernel CMA reservation In-Reply-To: References: <20251103043747.1298065-1-sourabhjain@linux.ibm.com> <87y0on4ebh.ritesh.list@gmail.com> <7957bd55-5bda-406f-aab3-64e0620bd452@linux.ibm.com> Message-ID: <87v7jq3xjn.ritesh.list@gmail.com> Sourabh Jain writes: > On 04/11/25 10:48, Sourabh Jain wrote: >> >> >> On 03/11/25 15:40, Ritesh Harjani (IBM) wrote: >>> Sourabh Jain writes: >>> >>>> Commit 35c18f2933c5 ("Add a new optional ",cma" suffix to the >>>> crashkernel= command line option") and commit ab475510e042 ("kdump: >>>> implement reserve_crashkernel_cma") added CMA support for kdump >>>> crashkernel reservation. >>>> >>>> Extend crashkernel CMA reservation support to powerpc. >>>> >>>> The following changes are made to enable CMA reservation on powerpc: >>>> >>>> - Parse and obtain the CMA reservation size along with other >>>> crashkernel >>>>    parameters >>>> - Call reserve_crashkernel_cma() to allocate the CMA region for kdump >>>> - Include the CMA-reserved ranges in the usable memory ranges for the >>>>    kdump kernel to use. 
>>>> - Exclude the CMA-reserved ranges from the crash kernel memory to >>>> ?? prevent them from being exported through /proc/vmcore. >>>> >>>> With the introduction of the CMA crashkernel regions, >>>> crash_exclude_mem_range() needs to be called multiple times to exclude >>>> both crashk_res and crashk_cma_ranges from the crash memory ranges. To >>>> avoid repetitive logic for validating mem_ranges size and handling >>>> reallocation when required, this functionality is moved to a new >>>> wrapper >>>> function crash_exclude_mem_range_guarded(). >>>> >>>> To ensure proper CMA reservation, reserve_crashkernel_cma() is called >>>> after pageblock_order is initialized. >>>> >>>> Update kernel-parameters.txt to document CMA support for crashkernel on >>>> powerpc architecture. >>>> >>>> Cc: Baoquan he >>>> Cc: Jiri Bohac >>>> Cc: Hari Bathini >>>> Cc: Madhavan Srinivasan >>>> Cc: Mahesh Salgaonkar >>>> Cc: Michael Ellerman >>>> Cc: Ritesh Harjani (IBM) >>>> Cc: Shivang Upadhyay >>>> Cc: kexec at lists.infradead.org >>>> Signed-off-by: Sourabh Jain >>>> --- >>>> Changlog: >>>> >>>> v3 -> v4 >>>> ? - Removed repeated initialization to tmem in >>>> ??? crash_exclude_mem_range_guarded() >>>> ? - Call crash_exclude_mem_range() with right crashk ranges >>>> >>>> v4 -> v5: >>>> ? - Document CMA-based crashkernel support for ppc64 in >>>> kernel-parameters.txt >>>> --- >>>> ? .../admin-guide/kernel-parameters.txt???????? |? 2 +- >>>> ? arch/powerpc/include/asm/kexec.h????????????? |? 2 + >>>> ? arch/powerpc/kernel/setup-common.c??????????? |? 4 +- >>>> ? arch/powerpc/kexec/core.c???????????????????? | 10 ++++- >>>> ? arch/powerpc/kexec/ranges.c?????????????????? | 43 >>>> ++++++++++++++----- >>>> ? 
5 files changed, 47 insertions(+), 14 deletions(-) >>>> >>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt >>>> b/Documentation/admin-guide/kernel-parameters.txt >>>> index 6c42061ca20e..0f386b546cec 100644 >>>> --- a/Documentation/admin-guide/kernel-parameters.txt >>>> +++ b/Documentation/admin-guide/kernel-parameters.txt >>>> @@ -1013,7 +1013,7 @@ >>>> ????????????? It will be ignored when crashkernel=X,high is not used >>>> ????????????? or memory reserved is below 4G. >>>> ????? crashkernel=size[KMG],cma >>>> -??????????? [KNL, X86] Reserve additional crash kernel memory from >>>> +??????????? [KNL, X86, ppc64] Reserve additional crash kernel >>>> memory from >>> Shouldn't this be PPC and not ppc64? >>> >>> If I see the crash_dump support... >>> >>> config ARCH_SUPPORTS_CRASH_DUMP >>> ????def_bool PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP) >>> >>> The changes below aren't specific to ppc64 correct? >> >> The thing is this feature is only supported with KEXEC_FILE and which >> only supported on PPC64: >> >> config ARCH_SUPPORTS_KEXEC_FILE >> ??? def_bool PPC64 >> >> Hence I kept it as ppc64. >> I am not much familiar with the kexec_load v/s kexec_file_load internals. Maybe because of that I am unable to clearly understand your above points. But let me try and explain what I think you meant :) We first call "get_usable_memory_ranges(&umem)" which updates the usable memory ranges in "umem". We then call "update_usable_mem_fdt(fdt, umem)" which updates the FDT for the kdump kernel's fdt to inform about these usable memory ranges to the kdump kernel. Now since your patch only does that in get_usable_memory_range(), this extra CMA reservation is mainly only useful when the kdump load happens via kexec_file_load(), (because get_usable_memory_range() only gets called from kexec_file_load() path) Is this what you meant here? >> I think I should update that in the commit message. 
>> >> Also, do you think it is a good idea to restrict this feature to KEXEC_FILE? > > Putting this under KEXEC_FILE may not help much because KEXEC_FILE is > enabled > by default in most configurations. Once it is enabled, the CMA > reservation will > happen regardless of which system call is used to load the kdump kernel > (kexec_load or kexec_file_load). > What I understood from the feature was that, on the normal production kernel, this feature crashkernel=xM,cma allows reserving an extra xMB of memory as a CMA region for the kdump kernel's memory allocations. But this CMA reservation would happen in the normal kernel itself during setup_arch() -> kdump_cma_reserve(). And this CMA reservation happens irrespective of which system call the kdump kernel gets loaded with. > However, not restricting this feature to KEXEC_FILE will allow the kexec > tool to > independently add support for this feature in the future for the kexec_load > system call. Sure. > > With that logic, I think if we do not restrict this feature to > KEXEC_FILE, the support > will be available for ppc and not limited to ppc64. > Yes, that makes sense. > > If one doesn't want to make the CMA reservation, then we need not pass > the extra cmdline argument and no reservation will be made. So, no need > to restrict this to PPC64 by making it available only for KEXEC_FILE. 
-ritesh From ritesh.list at gmail.com Tue Nov 4 02:51:41 2025 From: ritesh.list at gmail.com (Ritesh Harjani (IBM)) Date: Tue, 04 Nov 2025 16:21:41 +0530 Subject: [PATCH v5] powerpc/kdump: Add support for crashkernel CMA reservation In-Reply-To: <722d72b5-cebf-48f2-8ad5-558ccd3c30f4@linux.ibm.com> References: <20251103043747.1298065-1-sourabhjain@linux.ibm.com> <87y0on4ebh.ritesh.list@gmail.com> <7957bd55-5bda-406f-aab3-64e0620bd452@linux.ibm.com> <87wm463xtj.ritesh.list@gmail.com> Message-ID: <87tsza3waq.ritesh.list@gmail.com> Sourabh Jain writes: > On 04/11/25 15:48, Ritesh Harjani (IBM) wrote: >> Sourabh Jain writes: >> >> >>> I would like to keep kdump_cma_reserve() as it is for two reasons: >>> >>> - It keeps setup_arch() free from kdump #ifdefs >> Not really. >> >> Instead of kdump_cma_reserve(crashk_cma_size), one could call >> reserve_crashkernel_cma(crashk_cma_size) directly in setup_arch(). > > > reserve_crashkernel_cma() is not available unless the kernel is built > with CONFIG_CRASH_RESERVE. > So, wouldn't calling reserve_crashkernel_cma() directly from > setup_arch() lead to a build failure? Or > am I missing something? > Oops... I was assuming the #else CRASHKERNEL_CMA definition would get called, but all of that logic is itself protected by CONFIG_CRASH_RESERVE :( Right, to avoid #ifdef or IS_ENABLED() in setup_arch(), it's better to have kdump_cma_reserve(). Thanks for pointing that out. obj-$(CONFIG_CRASH_RESERVE) += crash_reserve.o kernel/crash_reserve.c #ifdef CRASHKERNEL_CMA int crashk_cma_cnt; void __init reserve_crashkernel_cma(unsigned long long cma_size) { ... } #else /* CRASHKERNEL_CMA */ void __init reserve_crashkernel_cma(unsigned long long cma_size) { if (cma_size) pr_warn("crashkernel CMA reservation not supported\n"); } #endif -ritesh >> >>> - In case we want to add some condition on this reservation, it would >>> be straightforward. >>> >> Makes sense. 
>> >>> So lets keep kdump_cma_reserve as is, unless you have strong opinion on >>> not to. >>> >> No strong opinion, as I said it was a minor nit. Feel free to keep the >> function kdump_cma_reserve() as is then. >> >> -ritesh >> From sourabhjain at linux.ibm.com Tue Nov 4 04:38:19 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Tue, 4 Nov 2025 18:08:19 +0530 Subject: [PATCH v5] powerpc/kdump: Add support for crashkernel CMA reservation In-Reply-To: <87v7jq3xjn.ritesh.list@gmail.com> References: <20251103043747.1298065-1-sourabhjain@linux.ibm.com> <87y0on4ebh.ritesh.list@gmail.com> <7957bd55-5bda-406f-aab3-64e0620bd452@linux.ibm.com> <87v7jq3xjn.ritesh.list@gmail.com> Message-ID: On 04/11/25 15:54, Ritesh Harjani (IBM) wrote: > Sourabh Jain writes: > >> On 04/11/25 10:48, Sourabh Jain wrote: >>> >>> On 03/11/25 15:40, Ritesh Harjani (IBM) wrote: >>>> Sourabh Jain writes: >>>> >>>>> Commit 35c18f2933c5 ("Add a new optional ",cma" suffix to the >>>>> crashkernel= command line option") and commit ab475510e042 ("kdump: >>>>> implement reserve_crashkernel_cma") added CMA support for kdump >>>>> crashkernel reservation. >>>>> >>>>> Extend crashkernel CMA reservation support to powerpc. >>>>> >>>>> The following changes are made to enable CMA reservation on powerpc: >>>>> >>>>> - Parse and obtain the CMA reservation size along with other >>>>> crashkernel >>>>> ?? parameters >>>>> - Call reserve_crashkernel_cma() to allocate the CMA region for kdump >>>>> - Include the CMA-reserved ranges in the usable memory ranges for the >>>>> ?? kdump kernel to use. >>>>> - Exclude the CMA-reserved ranges from the crash kernel memory to >>>>> ?? prevent them from being exported through /proc/vmcore. >>>>> >>>>> With the introduction of the CMA crashkernel regions, >>>>> crash_exclude_mem_range() needs to be called multiple times to exclude >>>>> both crashk_res and crashk_cma_ranges from the crash memory ranges. 
To >>>>> avoid repetitive logic for validating mem_ranges size and handling >>>>> reallocation when required, this functionality is moved to a new >>>>> wrapper >>>>> function crash_exclude_mem_range_guarded(). >>>>> >>>>> To ensure proper CMA reservation, reserve_crashkernel_cma() is called >>>>> after pageblock_order is initialized. >>>>> >>>>> Update kernel-parameters.txt to document CMA support for crashkernel on >>>>> powerpc architecture. >>>>> >>>>> Cc: Baoquan he >>>>> Cc: Jiri Bohac >>>>> Cc: Hari Bathini >>>>> Cc: Madhavan Srinivasan >>>>> Cc: Mahesh Salgaonkar >>>>> Cc: Michael Ellerman >>>>> Cc: Ritesh Harjani (IBM) >>>>> Cc: Shivang Upadhyay >>>>> Cc: kexec at lists.infradead.org >>>>> Signed-off-by: Sourabh Jain >>>>> --- >>>>> Changlog: >>>>> >>>>> v3 -> v4 >>>>> ? - Removed repeated initialization to tmem in >>>>> ??? crash_exclude_mem_range_guarded() >>>>> ? - Call crash_exclude_mem_range() with right crashk ranges >>>>> >>>>> v4 -> v5: >>>>> ? - Document CMA-based crashkernel support for ppc64 in >>>>> kernel-parameters.txt >>>>> --- >>>>> ? .../admin-guide/kernel-parameters.txt???????? |? 2 +- >>>>> ? arch/powerpc/include/asm/kexec.h????????????? |? 2 + >>>>> ? arch/powerpc/kernel/setup-common.c??????????? |? 4 +- >>>>> ? arch/powerpc/kexec/core.c???????????????????? | 10 ++++- >>>>> ? arch/powerpc/kexec/ranges.c?????????????????? | 43 >>>>> ++++++++++++++----- >>>>> ? 5 files changed, 47 insertions(+), 14 deletions(-) >>>>> >>>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt >>>>> b/Documentation/admin-guide/kernel-parameters.txt >>>>> index 6c42061ca20e..0f386b546cec 100644 >>>>> --- a/Documentation/admin-guide/kernel-parameters.txt >>>>> +++ b/Documentation/admin-guide/kernel-parameters.txt >>>>> @@ -1013,7 +1013,7 @@ >>>>> ????????????? It will be ignored when crashkernel=X,high is not used >>>>> ????????????? or memory reserved is below 4G. >>>>> ????? crashkernel=size[KMG],cma >>>>> -??????????? 
[KNL, X86] Reserve additional crash kernel memory from >>>>> +??????????? [KNL, X86, ppc64] Reserve additional crash kernel >>>>> memory from >>>> Shouldn't this be PPC and not ppc64? >>>> >>>> If I see the crash_dump support... >>>> >>>> config ARCH_SUPPORTS_CRASH_DUMP >>>> ????def_bool PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP) >>>> >>>> The changes below aren't specific to ppc64 correct? >>> The thing is this feature is only supported with KEXEC_FILE and which >>> only supported on PPC64: >>> >>> config ARCH_SUPPORTS_KEXEC_FILE >>> ??? def_bool PPC64 >>> >>> Hence I kept it as ppc64. >>> > I am not much familiar with the kexec_load v/s kexec_file_load > internals. Maybe because of that I am unable to clearly understand your > above points. > > But let me try and explain what I think you meant :) > > We first call "get_usable_memory_ranges(&umem)" which updates the usable > memory ranges in "umem". We then call "update_usable_mem_fdt(fdt, umem)" > which updates the FDT for the kdump kernel's fdt to inform about these > usable memory ranges to the kdump kernel. > > Now since your patch only does that in get_usable_memory_range(), this > extra CMA reservation is mainly only useful when the kdump load happens > via kexec_file_load(), (because get_usable_memory_range() only gets > called from kexec_file_load() path) > > Is this what you meant here? Yeah, for kexec_file_load, the FDT for the kdump kernel is prepared in the Linux kernel (using the functions you mentioned), whereas for kexec_load, it is prepared in the kexec tool (userspace). Hence, these changes are not sufficient to support this feature with the kexec_load syscall. The kexec tool must be updated to ensure that the FDT is prepared in a way that marks the crashkernel CMA reservation as usable in the kdump FDT for the kexec_load system call. 
Anyway, it makes more sense to say that crashkernel=xM,cma support is available on ppc rather than ppc64, since restricting crashkernel CMA reservation to KEXEC_FILE does not help. The details are explained below. > > >>> I think I should update that in the commit message. >>> >>> Also do you think is it good to restrict this feature to KEXEC_FILE? >> Putting this under KEXEC_FILE may not help much because KEXEC_FILE is >> enabled >> by default in most configurations. Once it is enabled, the CMA >> reservation will >> happen regardless of which system call is used to load the kdump kernel >> (kexec_load or kexec_file_load). >> > What I understood from the feature was that, on the normal production > kernel this feature crashkernel=xM,cma allows to reserve an extra xMB of > memory as a CMA region for kdump kernel's memory allocations. But this > CMA reservation would happen in the normal kernel itself during > setup_arch() -> kdump_cma_reserve().. > > And this CMA reservation happens irrespective of whether the kdump > kernel will get loaded via whichever system call. Yeah that's right. > >> However, not restricting this feature to KEXEC_FILE will allow the kexec >> tool to >> independently add support for this feature in the future for the kexec_load >> system call. > Sure. > >> With that logic, I think if we do not restrict this feature to >> KEXEC_FILE, the support >> will be available for ppc and not limited to ppc64. >> > Yes, that make sense. > > If one doesn't want to make the CMA reservation, then we need not pass > the extra cmdline argument and no reservation will be made. So, no need > to restrict this to PPC64 by making it available only for KEXEC_FILE. Agree. 
From sourabhjain at linux.ibm.com Tue Nov 4 05:28:18 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Tue, 4 Nov 2025 18:58:18 +0530 Subject: [PATCH v6] powerpc/kdump: Add support for crashkernel CMA reservation Message-ID: <20251104132818.1724562-1-sourabhjain@linux.ibm.com> Commit 35c18f2933c5 ("Add a new optional ",cma" suffix to the crashkernel= command line option") and commit ab475510e042 ("kdump: implement reserve_crashkernel_cma") added CMA support for kdump crashkernel reservation. Extend crashkernel CMA reservation support to powerpc. The following changes are made to enable CMA reservation on powerpc: - Parse and obtain the CMA reservation size along with other crashkernel parameters - Call reserve_crashkernel_cma() to allocate the CMA region for kdump - Include the CMA-reserved ranges in the usable memory ranges for the kdump kernel to use. - Exclude the CMA-reserved ranges from the crash kernel memory to prevent them from being exported through /proc/vmcore. With the introduction of the CMA crashkernel regions, crash_exclude_mem_range() needs to be called multiple times to exclude both crashk_res and crashk_cma_ranges from the crash memory ranges. To avoid repetitive logic for validating mem_ranges size and handling reallocation when required, this functionality is moved to a new wrapper function crash_exclude_mem_range_guarded(). To ensure proper CMA reservation, reserve_crashkernel_cma() is called after pageblock_order is initialized. Update kernel-parameters.txt to document CMA support for crashkernel on powerpc architecture. 
Cc: Baoquan he Cc: Jiri Bohac Cc: Hari Bathini Cc: Madhavan Srinivasan Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- v3 -> v4 - Removed repeated initialization to tmem in crash_exclude_mem_range_guarded() - Call crash_exclude_mem_range() with right crashk ranges v4 -> v5: - Document CMA-based crashkernel support for ppc64 in kernel-parameters.txt v5 -> v6 - Change variable name, cma_size -> crashk_cma_size - Update support for this feature to ppc instead of ppc64 --- .../admin-guide/kernel-parameters.txt | 2 +- arch/powerpc/include/asm/kexec.h | 2 + arch/powerpc/kernel/setup-common.c | 4 +- arch/powerpc/kexec/core.c | 10 ++++- arch/powerpc/kexec/ranges.c | 43 ++++++++++++++----- 5 files changed, 47 insertions(+), 14 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 6c42061ca20e..1c10190d583d 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1013,7 +1013,7 @@ It will be ignored when crashkernel=X,high is not used or memory reserved is below 4G. crashkernel=size[KMG],cma - [KNL, X86] Reserve additional crash kernel memory from + [KNL, X86, ppc] Reserve additional crash kernel memory from CMA. This reservation is usable by the first system's userspace memory and kernel movable allocations (memory balloon, zswap). 
Pages allocated from this memory range diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index 4bbf9f699aaa..bd4a6c42a5f3 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -115,9 +115,11 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt, struct crash_mem #ifdef CONFIG_CRASH_RESERVE int __init overlaps_crashkernel(unsigned long start, unsigned long size); extern void arch_reserve_crashkernel(void); +extern void kdump_cma_reserve(void); #else static inline void arch_reserve_crashkernel(void) {} static inline int overlaps_crashkernel(unsigned long start, unsigned long size) { return 0; } +static inline void kdump_cma_reserve(void) { } #endif #if defined(CONFIG_CRASH_DUMP) diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c index 68d47c53876c..c8c42b419742 100644 --- a/arch/powerpc/kernel/setup-common.c +++ b/arch/powerpc/kernel/setup-common.c @@ -35,6 +35,7 @@ #include #include #include +#include #include #include #include @@ -995,11 +996,12 @@ void __init setup_arch(char **cmdline_p) initmem_init(); /* - * Reserve large chunks of memory for use by CMA for fadump, KVM and + * Reserve large chunks of memory for use by CMA for kdump, fadump, KVM and * hugetlb. These must be called after initmem_init(), so that * pageblock_order is initialised. 
*/ fadump_cma_init(); + kdump_cma_reserve(); kvm_cma_reserve(); gigantic_hugetlb_cma_reserve(); diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c index d1a2d755381c..d0b8d6300f84 100644 --- a/arch/powerpc/kexec/core.c +++ b/arch/powerpc/kexec/core.c @@ -59,6 +59,8 @@ void machine_kexec(struct kimage *image) #ifdef CONFIG_CRASH_RESERVE +unsigned long long crashk_cma_size; + static unsigned long long __init get_crash_base(unsigned long long crash_base) { @@ -110,7 +112,7 @@ void __init arch_reserve_crashkernel(void) /* use common parsing */ ret = parse_crashkernel(boot_command_line, total_mem_sz, &crash_size, - &crash_base, NULL, NULL, NULL); + &crash_base, NULL, &crashk_cma_size, NULL); if (ret) return; @@ -130,6 +132,12 @@ void __init arch_reserve_crashkernel(void) reserve_crashkernel_generic(crash_size, crash_base, 0, false); } +void __init kdump_cma_reserve(void) +{ + if (crashk_cma_size) + reserve_crashkernel_cma(crashk_cma_size); +} + int __init overlaps_crashkernel(unsigned long start, unsigned long size) { return (start + size) > crashk_res.start && start <= crashk_res.end; diff --git a/arch/powerpc/kexec/ranges.c b/arch/powerpc/kexec/ranges.c index 3702b0bdab14..3bd27c38726b 100644 --- a/arch/powerpc/kexec/ranges.c +++ b/arch/powerpc/kexec/ranges.c @@ -515,7 +515,7 @@ int get_exclude_memory_ranges(struct crash_mem **mem_ranges) */ int get_usable_memory_ranges(struct crash_mem **mem_ranges) { - int ret; + int ret, i; /* * Early boot failure observed on guests when low memory (first memory @@ -528,6 +528,13 @@ int get_usable_memory_ranges(struct crash_mem **mem_ranges) if (ret) goto out; + for (i = 0; i < crashk_cma_cnt; i++) { + ret = add_mem_range(mem_ranges, crashk_cma_ranges[i].start, + crashk_cma_ranges[i].end - crashk_cma_ranges[i].start + 1); + if (ret) + goto out; + } + ret = add_rtas_mem_range(mem_ranges); if (ret) goto out; @@ -546,6 +553,22 @@ int get_usable_memory_ranges(struct crash_mem **mem_ranges) #endif /* CONFIG_KEXEC_FILE */ 
#ifdef CONFIG_CRASH_DUMP +static int crash_exclude_mem_range_guarded(struct crash_mem **mem_ranges, + unsigned long long mstart, + unsigned long long mend) +{ + struct crash_mem *tmem = *mem_ranges; + + /* Reallocate memory ranges if there is no space to split ranges */ + if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) { + tmem = realloc_mem_ranges(mem_ranges); + if (!tmem) + return -ENOMEM; + } + + return crash_exclude_mem_range(tmem, mstart, mend); +} + /** * get_crash_memory_ranges - Get crash memory ranges. This list includes * first/crashing kernel's memory regions that @@ -557,7 +580,6 @@ int get_usable_memory_ranges(struct crash_mem **mem_ranges) int get_crash_memory_ranges(struct crash_mem **mem_ranges) { phys_addr_t base, end; - struct crash_mem *tmem; u64 i; int ret; @@ -582,19 +604,18 @@ int get_crash_memory_ranges(struct crash_mem **mem_ranges) sort_memory_ranges(*mem_ranges, true); } - /* Reallocate memory ranges if there is no space to split ranges */ - tmem = *mem_ranges; - if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) { - tmem = realloc_mem_ranges(mem_ranges); - if (!tmem) - goto out; - } - /* Exclude crashkernel region */ - ret = crash_exclude_mem_range(tmem, crashk_res.start, crashk_res.end); + ret = crash_exclude_mem_range_guarded(mem_ranges, crashk_res.start, crashk_res.end); if (ret) goto out; + for (i = 0; i < crashk_cma_cnt; ++i) { + ret = crash_exclude_mem_range_guarded(mem_ranges, crashk_cma_ranges[i].start, + crashk_cma_ranges[i].end); + if (ret) + goto out; + } + /* * FIXME: For now, stay in parity with kexec-tools but if RTAS/OPAL * regions are exported to save their context at the time of -- 2.51.0 From rppt at kernel.org Tue Nov 4 06:31:54 2025 From: rppt at kernel.org (Mike Rapoport) Date: Tue, 4 Nov 2025 16:31:54 +0200 Subject: [PATCH 1/2] kho: fix unpreservation of higher-order vmalloc preservations In-Reply-To: <20251103180235.71409-2-pratyush@kernel.org> References: <20251103180235.71409-1-pratyush@kernel.org> 
<20251103180235.71409-2-pratyush@kernel.org> Message-ID: On Mon, Nov 03, 2025 at 07:02:31PM +0100, Pratyush Yadav wrote: > kho_vmalloc_unpreserve_chunk() calls __kho_unpreserve() with end_pfn as > pfn + 1. This happens to work for 0-order pages, but leaks higher order > pages. > > For example, say order 2 pages back the allocation. During preservation, > they get preserved in the order 2 bitmaps, but > kho_vmalloc_unpreserve_chunk() would try to unpreserve them from the > order 0 bitmaps, which should not have these bits set anyway, leaving > the order 2 bitmaps untouched. This results in the pages being carried > over to the next kernel. Nothing will free those pages in the next boot, > leaking them. > > Fix this by taking the order into account when calculating the end PFN > for __kho_unpreserve(). > > Fixes: a667300bd53f2 ("kho: add support for preserving vmalloc allocations") > Signed-off-by: Pratyush Yadav Reviewed-by: Mike Rapoport (Microsoft) > --- > > Notes: > When Pasha's patch [0] to add kho_unpreserve_pages() is merged, maybe it > would be a better idea to use kho_unpreserve_pages() here? But that is > something for later I suppose. 
> > [0] https://lore.kernel.org/linux-mm/20251101142325.1326536-4-pasha.tatashin at soleen.com/ > > kernel/kexec_handover.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c > index cc5aaa738bc50..c2bcbb10918ce 100644 > --- a/kernel/kexec_handover.c > +++ b/kernel/kexec_handover.c > @@ -862,7 +862,8 @@ static struct kho_vmalloc_chunk *new_vmalloc_chunk(struct kho_vmalloc_chunk *cur > return NULL; > } > > -static void kho_vmalloc_unpreserve_chunk(struct kho_vmalloc_chunk *chunk) > +static void kho_vmalloc_unpreserve_chunk(struct kho_vmalloc_chunk *chunk, > + unsigned short order) > { > struct kho_mem_track *track = &kho_out.ser.track; > unsigned long pfn = PHYS_PFN(virt_to_phys(chunk)); > @@ -871,7 +872,7 @@ static void kho_vmalloc_unpreserve_chunk(struct kho_vmalloc_chunk *chunk) > > for (int i = 0; i < ARRAY_SIZE(chunk->phys) && chunk->phys[i]; i++) { > pfn = PHYS_PFN(chunk->phys[i]); > - __kho_unpreserve(track, pfn, pfn + 1); > + __kho_unpreserve(track, pfn, pfn + (1 << order)); > } > } > > @@ -882,7 +883,7 @@ static void kho_vmalloc_free_chunks(struct kho_vmalloc *kho_vmalloc) > while (chunk) { > struct kho_vmalloc_chunk *tmp = chunk; > > - kho_vmalloc_unpreserve_chunk(chunk); > + kho_vmalloc_unpreserve_chunk(chunk, kho_vmalloc->order); > > chunk = KHOSER_LOAD_PTR(chunk->hdr.next); > free_page((unsigned long)tmp); > -- > 2.47.3 > -- Sincerely yours, Mike. From rppt at kernel.org Tue Nov 4 06:32:42 2025 From: rppt at kernel.org (Mike Rapoport) Date: Tue, 4 Nov 2025 16:32:42 +0200 Subject: [PATCH 2/2] kho: warn and exit when unpreserved page wasn't preserved In-Reply-To: <20251103180235.71409-3-pratyush@kernel.org> References: <20251103180235.71409-1-pratyush@kernel.org> <20251103180235.71409-3-pratyush@kernel.org> Message-ID: On Mon, Nov 03, 2025 at 07:02:32PM +0100, Pratyush Yadav wrote: > Calling __kho_unpreserve() on a pair of (pfn, end_pfn) that wasn't > preserved is a bug. 
Currently, if that is done, the physxa or bits can > be NULL. This results in a soft lockup since a NULL physxa or bits > results in redoing the loop without ever making any progress. > > Return when physxa or bits are not found, but WARN first to loudly > indicate invalid behaviour. > > Fixes: fc33e4b44b271 ("kexec: enable KHO support for memory preservation") > Signed-off-by: Pratyush Yadav Reviewed-by: Mike Rapoport (Microsoft) > --- > kernel/kexec_handover.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c > index c2bcbb10918ce..e5fd833726226 100644 > --- a/kernel/kexec_handover.c > +++ b/kernel/kexec_handover.c > @@ -167,12 +167,12 @@ static void __kho_unpreserve(struct kho_mem_track *track, unsigned long pfn, > const unsigned long pfn_high = pfn >> order; > > physxa = xa_load(&track->orders, order); > - if (!physxa) > - continue; > + if (WARN_ON_ONCE(!physxa)) > + return; > > bits = xa_load(&physxa->phys_bits, pfn_high / PRESERVE_BITS); > - if (!bits) > - continue; > + if (WARN_ON_ONCE(!bits)) > + return; > > clear_bit(pfn_high % PRESERVE_BITS, bits->preserve); > > -- > 2.47.3 > -- Sincerely yours, Mike. From lbulwahn at redhat.com Tue Nov 4 06:32:38 2025 From: lbulwahn at redhat.com (Lukas Bulwahn) Date: Tue, 4 Nov 2025 15:32:38 +0100 Subject: [PATCH] MAINTAINERS: extend file entry in KHO to include subdirectories Message-ID: <20251104143238.119803-1-lukas.bulwahn@redhat.com> From: Lukas Bulwahn Commit 3498209ff64e ("Documentation: add documentation for KHO") adds the file entry for 'Documentation/core-api/kho/*'. The asterisk in the end means that all files in kho are included, but not files in its subdirectories below. Hence, the files under Documentation/core-api/kho/bindings/ are not considered part of KHO, and get_maintainers.pl does not necessarily add the KHO maintainers to the recipients of patches to those files. 
Probably, this is not intended, though, and it was simply an oversight of the detailed semantics of such file entries. Make the file entry to include the subdirectories of Documentation/core-api/kho/. Signed-off-by: Lukas Bulwahn --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 06ff926c5331..499b52d7793f 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -13836,7 +13836,7 @@ L: kexec at lists.infradead.org L: linux-mm at kvack.org S: Maintained F: Documentation/admin-guide/mm/kho.rst -F: Documentation/core-api/kho/* +F: Documentation/core-api/kho/ F: include/linux/kexec_handover.h F: kernel/kexec_handover.c F: tools/testing/selftests/kho/ -- 2.51.1 From helgaas at kernel.org Tue Nov 4 07:19:36 2025 From: helgaas at kernel.org (Bjorn Helgaas) Date: Tue, 4 Nov 2025 09:19:36 -0600 Subject: [PATCH] MAINTAINERS: extend file entry in KHO to include subdirectories In-Reply-To: <20251104143238.119803-1-lukas.bulwahn@redhat.com> Message-ID: <20251104151936.GA1857569@bhelgaas> On Tue, Nov 04, 2025 at 03:32:38PM +0100, Lukas Bulwahn wrote: > From: Lukas Bulwahn > > Commit 3498209ff64e ("Documentation: add documentation for KHO") adds the > file entry for 'Documentation/core-api/kho/*'. The asterisk in the end > means that all files in kho are included, but not files in its > subdirectories below. Add blank line between paragraphs as you did below. > Hence, the files under Documentation/core-api/kho/bindings/ are not > considered part of KHO, and get_maintainers.pl does not necessarily add the > KHO maintainers to the recipients of patches to those files. Probably, this > is not intended, though, and it was simply an oversight of the detailed > semantics of such file entries. > > Make the file entry to include the subdirectories of > Documentation/core-api/kho/. 
> > Signed-off-by: Lukas Bulwahn > --- > MAINTAINERS | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/MAINTAINERS b/MAINTAINERS > index 06ff926c5331..499b52d7793f 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -13836,7 +13836,7 @@ L: kexec at lists.infradead.org > L: linux-mm at kvack.org > S: Maintained > F: Documentation/admin-guide/mm/kho.rst > -F: Documentation/core-api/kho/* > +F: Documentation/core-api/kho/ > F: include/linux/kexec_handover.h > F: kernel/kexec_handover.c > F: tools/testing/selftests/kho/ > -- > 2.51.1 > From bhe at redhat.com Tue Nov 4 19:01:02 2025 From: bhe at redhat.com (Baoquan He) Date: Wed, 5 Nov 2025 11:01:02 +0800 Subject: [PATCH v2 3/4] kexec: print out debugging message if required for kexec_load In-Reply-To: <20251103063440.1681657-4-maqianga@uniontech.com> References: <20251103063440.1681657-1-maqianga@uniontech.com> <20251103063440.1681657-4-maqianga@uniontech.com> Message-ID: On 11/03/25 at 02:34pm, Qiang Ma wrote: > The commit a85ee18c7900 ("kexec_file: print out debugging message > if required") has added general code printing in kexec_file_load(), > but not in kexec_load(). > > Especially in the RISC-V architecture, kexec_image_info() has been > removed(commit eb7622d908a0 ("kexec_file, riscv: print out debugging > message if required")). As a result, when using '-d' for the kexec_load > interface, print nothing in the kernel space. This might be helpful for > verifying the accuracy of the data passed to the kernel. Therefore, > refer to this commit a85ee18c7900 ("kexec_file: print out debugging > message if required"), debug print information has been added. 
> > Signed-off-by: Qiang Ma > Reported-by: kernel test robot > Closes: https://lore.kernel.org/oe-kbuild-all/202510310332.6XrLe70K-lkp at intel.com/ > --- > kernel/kexec.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/kernel/kexec.c b/kernel/kexec.c > index c7a869d32f87..9b433b972cc1 100644 > --- a/kernel/kexec.c > +++ b/kernel/kexec.c > @@ -154,7 +154,15 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, > if (ret) > goto out; > > + kexec_dprintk("nr_segments = %lu\n", nr_segments); > for (i = 0; i < nr_segments; i++) { > + struct kexec_segment *ksegment; > + > + ksegment = &image->segment[i]; > + kexec_dprintk("segment[%lu]: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n", > + i, ksegment->buf, ksegment->bufsz, ksegment->mem, > + ksegment->memsz); There has already been a print_segments() in kexec-tools/kexec/kexec.c, you will get duplicated printing. That sounds not good. Have you tested this? > + > ret = kimage_load_segment(image, i); > if (ret) > goto out; > @@ -166,6 +174,9 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, > if (ret) > goto out; > > + kexec_dprintk("kexec_load: type:%u, start:0x%lx head:0x%lx flags:0x%lx\n", > + image->type, image->start, image->head, flags); > + > /* Install the new kernel and uninstall the old */ > image = xchg(dest_image, image); > > -- > 2.20.1 > From bhe at redhat.com Tue Nov 4 19:05:44 2025 From: bhe at redhat.com (Baoquan He) Date: Wed, 5 Nov 2025 11:05:44 +0800 Subject: [PATCH v2 4/4] kexec_file: Fix the issue of mismatch between loop variable types In-Reply-To: <20251103063440.1681657-5-maqianga@uniontech.com> References: <20251103063440.1681657-1-maqianga@uniontech.com> <20251103063440.1681657-5-maqianga@uniontech.com> Message-ID: On 11/03/25 at 02:34pm, Qiang Ma wrote: > The type of the struct kimage member variable nr_segments is unsigned long. > Correct the loop variable i and the print format specifier type. 
I can't see what's meaningful with this change. nr_segments is unsigned long, but it's the range 'i' will loop. If so, we need change all for loop of the int iterator. > > Signed-off-by: Qiang Ma > --- > kernel/kexec_file.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c > index 4a24aadbad02..7afdaa0efc50 100644 > --- a/kernel/kexec_file.c > +++ b/kernel/kexec_file.c > @@ -366,7 +366,8 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd, > int image_type = (flags & KEXEC_FILE_ON_CRASH) ? > KEXEC_TYPE_CRASH : KEXEC_TYPE_DEFAULT; > struct kimage **dest_image, *image; > - int ret = 0, i; > + int ret = 0; > + unsigned long i; > > /* We only trust the superuser with rebooting the system. */ > if (!kexec_load_permitted(image_type)) > @@ -432,7 +433,7 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd, > struct kexec_segment *ksegment; > > ksegment = &image->segment[i]; > - kexec_dprintk("segment[%d]: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n", > + kexec_dprintk("segment[%lu]: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n", > i, ksegment->buf, ksegment->bufsz, ksegment->mem, > ksegment->memsz); > > -- > 2.20.1 > From bhe at redhat.com Tue Nov 4 19:09:13 2025 From: bhe at redhat.com (Baoquan He) Date: Wed, 5 Nov 2025 11:09:13 +0800 Subject: [PATCH v2 2/4] kexec: add kexec_core flag to control debug printing In-Reply-To: <20251103063440.1681657-3-maqianga@uniontech.com> References: <20251103063440.1681657-1-maqianga@uniontech.com> <20251103063440.1681657-3-maqianga@uniontech.com> Message-ID: On 11/03/25 at 02:34pm, Qiang Ma wrote: > The commit a85ee18c7900 ("kexec_file: print out debugging message > if required") has added general code printing in kexec_file_load(), > but not in kexec_load(). > > Since kexec_load and kexec_file_load are not triggered > simultaneously, we can unify the debug flag of kexec and kexec_file > as kexec_core_dbg_print. 
After reconsidering this, I regret calling it kexec_core_dbg_print. That sounds a printing only happening in kexec_core. Maybe kexec_dbg_print is better. Because here kexec refers to a generic concept, but not limited to kexec_load interface only. Just my personal thinking. Other than the naming, the whole patch looks good to me. Thanks. > > Next, we need to do four things: > > 1. rename kexec_file_dbg_print to kexec_core_dbg_print > 2. Add KEXEC_DEBUG > 3. Initialize kexec_core_dbg_print for kexec > 4. Set the reset of kexec_file_dbg_print to kimage_free > > Signed-off-by: Qiang Ma > --- > include/linux/kexec.h | 9 +++++---- > include/uapi/linux/kexec.h | 1 + > kernel/kexec.c | 1 + > kernel/kexec_core.c | 4 +++- > kernel/kexec_file.c | 4 +--- > 5 files changed, 11 insertions(+), 8 deletions(-) > > diff --git a/include/linux/kexec.h b/include/linux/kexec.h > index ff7e231b0485..cad8b5c362af 100644 > --- a/include/linux/kexec.h > +++ b/include/linux/kexec.h > @@ -455,10 +455,11 @@ bool kexec_load_permitted(int kexec_image_type); > > /* List of defined/legal kexec flags */ > #ifndef CONFIG_KEXEC_JUMP > -#define KEXEC_FLAGS (KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR | KEXEC_CRASH_HOTPLUG_SUPPORT) > +#define KEXEC_FLAGS (KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR | KEXEC_CRASH_HOTPLUG_SUPPORT | \ > + KEXEC_DEBUG) > #else > #define KEXEC_FLAGS (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_UPDATE_ELFCOREHDR | \ > - KEXEC_CRASH_HOTPLUG_SUPPORT) > + KEXEC_CRASH_HOTPLUG_SUPPORT | KEXEC_DEBUG) > #endif > > /* List of defined/legal kexec file flags */ > @@ -525,10 +526,10 @@ static inline int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, g > static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) { } > #endif > > -extern bool kexec_file_dbg_print; > +extern bool kexec_core_dbg_print; > > #define kexec_dprintk(fmt, arg...) 
\ > - do { if (kexec_file_dbg_print) pr_info(fmt, ##arg); } while (0) > + do { if (kexec_core_dbg_print) pr_info(fmt, ##arg); } while (0) > > extern void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size); > extern void kimage_unmap_segment(void *buffer); > diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h > index 55749cb0b81d..819c600af125 100644 > --- a/include/uapi/linux/kexec.h > +++ b/include/uapi/linux/kexec.h > @@ -14,6 +14,7 @@ > #define KEXEC_PRESERVE_CONTEXT 0x00000002 > #define KEXEC_UPDATE_ELFCOREHDR 0x00000004 > #define KEXEC_CRASH_HOTPLUG_SUPPORT 0x00000008 > +#define KEXEC_DEBUG 0x00000010 > #define KEXEC_ARCH_MASK 0xffff0000 > > /* > diff --git a/kernel/kexec.c b/kernel/kexec.c > index 9bb1f2b6b268..c7a869d32f87 100644 > --- a/kernel/kexec.c > +++ b/kernel/kexec.c > @@ -42,6 +42,7 @@ static int kimage_alloc_init(struct kimage **rimage, unsigned long entry, > if (!image) > return -ENOMEM; > > + kexec_core_dbg_print = !!(flags & KEXEC_DEBUG); > image->start = entry; > image->nr_segments = nr_segments; > memcpy(image->segment, segments, nr_segments * sizeof(*segments)); > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > index fa00b239c5d9..865f2b14f23b 100644 > --- a/kernel/kexec_core.c > +++ b/kernel/kexec_core.c > @@ -53,7 +53,7 @@ atomic_t __kexec_lock = ATOMIC_INIT(0); > /* Flag to indicate we are going to kexec a new kernel */ > bool kexec_in_progress = false; > > -bool kexec_file_dbg_print; > +bool kexec_core_dbg_print; > > /* > * When kexec transitions to the new kernel there is a one-to-one > @@ -576,6 +576,8 @@ void kimage_free(struct kimage *image) > kimage_entry_t *ptr, entry; > kimage_entry_t ind = 0; > > + kexec_core_dbg_print = false; > + > if (!image) > return; > > diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c > index eb62a9794242..4a24aadbad02 100644 > --- a/kernel/kexec_file.c > +++ b/kernel/kexec_file.c > @@ -138,8 +138,6 @@ void kimage_file_post_load_cleanup(struct 
kimage *image) > */ > kfree(image->image_loader_data); > image->image_loader_data = NULL; > - > - kexec_file_dbg_print = false; > } > > #ifdef CONFIG_KEXEC_SIG > @@ -314,7 +312,7 @@ kimage_file_alloc_init(struct kimage **rimage, int kernel_fd, > if (!image) > return -ENOMEM; > > - kexec_file_dbg_print = !!(flags & KEXEC_FILE_DEBUG); > + kexec_core_dbg_print = !!(flags & KEXEC_FILE_DEBUG); > image->file_mode = 1; > > #ifdef CONFIG_CRASH_DUMP > -- > 2.20.1 > From bhe at redhat.com Tue Nov 4 19:15:57 2025 From: bhe at redhat.com (Baoquan he) Date: Wed, 5 Nov 2025 11:15:57 +0800 Subject: [PATCH 0/2] Export kdump crashkernel CMA ranges In-Reply-To: <20251103035859.1267318-1-sourabhjain@linux.ibm.com> References: <20251103035859.1267318-1-sourabhjain@linux.ibm.com> Message-ID: On 11/03/25 at 09:28am, Sourabh Jain wrote: > /sys/kernel/kexec_crash_cma_ranges to export all CMA regions reserved > for the crashkernel to user-space. This enables user-space tools > configuring kdump to determine the amount of memory reserved for the > crashkernel. When CMA is used for crashkernel allocation, tools can use > this information to warn users that attempting to capture user pages > while CMA reservation is active may lead to unreliable or incomplete > dump capture. > > While adding documentation for the new sysfs interface, I realized that > there was no ABI document for the existing kexec and kdump sysfs > interfaces, so I added one. > > The first patch adds the ABI documentation for the existing kexec and > kdump sysfs interfaces, and the second patch adds the > /sys/kernel/kexec_crash_cma_ranges sysfs interface along with its > corresponding ABI documentation. > > *Seeking opinions* > There are already four kexec/kdump sysfs entries under /sys/kernel/, > and this patch series adds one more. Should we consider moving them to > a separate directory, such as /sys/kernel/kexec, to avoid polluting > /sys/kernel/? 
For backward compatibility, we can create symlinks at > the old locations for sometime and remove them in the future. That sounds a good idea, will you do it in v2? Because otherwise the kexec_crash_cma_ranges need be moved too. > > Cc: Andrew Morton > Cc: Baoquan he > Cc: Jiri Bohac > Cc: Shivang Upadhyay > Cc: linuxppc-dev at lists.ozlabs.org > Cc: kexec at lists.infradead.org > > Sourabh Jain (2): > Documentation/ABI: add kexec and kdump sysfs interface > crash: export crashkernel CMA reservation to userspace > > .../ABI/testing/sysfs-kernel-kexec-kdump | 53 +++++++++++++++++++ > kernel/ksysfs.c | 17 ++++++ > 2 files changed, 70 insertions(+) > create mode 100644 Documentation/ABI/testing/sysfs-kernel-kexec-kdump > > -- > 2.51.0 > From sourabhjain at linux.ibm.com Tue Nov 4 19:33:43 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Wed, 5 Nov 2025 09:03:43 +0530 Subject: [PATCH 0/2] Export kdump crashkernel CMA ranges In-Reply-To: References: <20251103035859.1267318-1-sourabhjain@linux.ibm.com> Message-ID: On 05/11/25 08:45, Baoquan he wrote: > On 11/03/25 at 09:28am, Sourabh Jain wrote: >> /sys/kernel/kexec_crash_cma_ranges to export all CMA regions reserved >> for the crashkernel to user-space. This enables user-space tools >> configuring kdump to determine the amount of memory reserved for the >> crashkernel. When CMA is used for crashkernel allocation, tools can use >> this information to warn users that attempting to capture user pages >> while CMA reservation is active may lead to unreliable or incomplete >> dump capture. >> >> While adding documentation for the new sysfs interface, I realized that >> there was no ABI document for the existing kexec and kdump sysfs >> interfaces, so I added one. >> >> The first patch adds the ABI documentation for the existing kexec and >> kdump sysfs interfaces, and the second patch adds the >> /sys/kernel/kexec_crash_cma_ranges sysfs interface along with its >> corresponding ABI documentation. 
>> >> *Seeking opinions* >> There are already four kexec/kdump sysfs entries under /sys/kernel/, >> and this patch series adds one more. Should we consider moving them to >> a separate directory, such as /sys/kernel/kexec, to avoid polluting >> /sys/kernel/? For backward compatibility, we can create symlinks at >> the old locations for sometime and remove them in the future. > That sounds a good idea, will you do it in v2? Because otherwise the > kexec_crash_cma_ranges need be moved too. Yes I will include it in v2. Thanks, Sourabh Jain From maqianga at uniontech.com Tue Nov 4 19:41:09 2025 From: maqianga at uniontech.com (Qiang Ma) Date: Wed, 5 Nov 2025 11:41:09 +0800 Subject: [PATCH v2 3/4] kexec: print out debugging message if required for kexec_load In-Reply-To: References: <20251103063440.1681657-1-maqianga@uniontech.com> <20251103063440.1681657-4-maqianga@uniontech.com> Message-ID: <5FC4A8D79744B238+97288be4-6c1a-4c0d-ae7d-be2029ec87f3@uniontech.com> On 2025/11/5 11:01, Baoquan He wrote: > On 11/03/25 at 02:34pm, Qiang Ma wrote: >> The commit a85ee18c7900 ("kexec_file: print out debugging message >> if required") has added general code printing in kexec_file_load(), >> but not in kexec_load(). >> >> Especially in the RISC-V architecture, kexec_image_info() has been >> removed(commit eb7622d908a0 ("kexec_file, riscv: print out debugging >> message if required")). As a result, when using '-d' for the kexec_load >> interface, print nothing in the kernel space. This might be helpful for >> verifying the accuracy of the data passed to the kernel. Therefore, >> refer to this commit a85ee18c7900 ("kexec_file: print out debugging >> message if required"), debug print information has been added.
>> >> Signed-off-by: Qiang Ma >> Reported-by: kernel test robot >> Closes: https://lore.kernel.org/oe-kbuild-all/202510310332.6XrLe70K-lkp at intel.com/ >> --- >> kernel/kexec.c | 11 +++++++++++ >> 1 file changed, 11 insertions(+) >> >> diff --git a/kernel/kexec.c b/kernel/kexec.c >> index c7a869d32f87..9b433b972cc1 100644 >> --- a/kernel/kexec.c >> +++ b/kernel/kexec.c >> @@ -154,7 +154,15 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, >> if (ret) >> goto out; >> >> + kexec_dprintk("nr_segments = %lu\n", nr_segments); >> for (i = 0; i < nr_segments; i++) { >> + struct kexec_segment *ksegment; >> + >> + ksegment = &image->segment[i]; >> + kexec_dprintk("segment[%lu]: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n", >> + i, ksegment->buf, ksegment->bufsz, ksegment->mem, >> + ksegment->memsz); > There has already been a print_segments() in kexec-tools/kexec/kexec.c, > you will get duplicated printing. That sounds not good. Have you tested > this? I have tested it, kexec-tools is the debug message printed in user space, while kexec_dprintk is printed in kernel space. This might be helpful for verifying the accuracy of the data passed to the kernel. 
>> + >> ret = kimage_load_segment(image, i); >> if (ret) >> goto out; >> @@ -166,6 +174,9 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, >> if (ret) >> goto out; >> >> + kexec_dprintk("kexec_load: type:%u, start:0x%lx head:0x%lx flags:0x%lx\n", >> + image->type, image->start, image->head, flags); >> + >> /* Install the new kernel and uninstall the old */ >> image = xchg(dest_image, image); >> >> -- >> 2.20.1 >> > From maqianga at uniontech.com Tue Nov 4 19:47:44 2025 From: maqianga at uniontech.com (Qiang Ma) Date: Wed, 5 Nov 2025 11:47:44 +0800 Subject: [PATCH v2 4/4] kexec_file: Fix the issue of mismatch between loop variable types In-Reply-To: References: <20251103063440.1681657-1-maqianga@uniontech.com> <20251103063440.1681657-5-maqianga@uniontech.com> Message-ID: <0C92443D3E2100AF+c669d240-1ee8-4897-a30d-3efefe161085@uniontech.com> On 2025/11/5 11:05, Baoquan He wrote: > On 11/03/25 at 02:34pm, Qiang Ma wrote: >> The type of the struct kimage member variable nr_segments is unsigned long. >> Correct the loop variable i and the print format specifier type. > I can't see what's meaningful with this change. nr_segments is unsigned > long, but it's the range 'i' will loop. If so, we need change all for > loop of the int iterator. If image->nr_segments is large enough, 'i' overflow causes an infinite loop. >> Signed-off-by: Qiang Ma >> --- >> kernel/kexec_file.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c >> index 4a24aadbad02..7afdaa0efc50 100644 >> --- a/kernel/kexec_file.c >> +++ b/kernel/kexec_file.c >> @@ -366,7 +366,8 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd, >> int image_type = (flags & KEXEC_FILE_ON_CRASH) ? >> KEXEC_TYPE_CRASH : KEXEC_TYPE_DEFAULT; >> struct kimage **dest_image, *image; >> - int ret = 0, i; >> + int ret = 0; >> + unsigned long i; >> >> /* We only trust the superuser with rebooting the system. 
*/ >> if (!kexec_load_permitted(image_type)) >> @@ -432,7 +433,7 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd, >> struct kexec_segment *ksegment; >> >> ksegment = &image->segment[i]; >> - kexec_dprintk("segment[%d]: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n", >> + kexec_dprintk("segment[%lu]: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n", >> i, ksegment->buf, ksegment->bufsz, ksegment->mem, >> ksegment->memsz); >> >> -- >> 2.20.1 >> > From maqianga at uniontech.com Tue Nov 4 20:31:00 2025 From: maqianga at uniontech.com (Qiang Ma) Date: Wed, 5 Nov 2025 12:31:00 +0800 Subject: [PATCH v2 4/4] kexec_file: Fix the issue of mismatch between loop variable types In-Reply-To: References: <20251103063440.1681657-1-maqianga@uniontech.com> <20251103063440.1681657-5-maqianga@uniontech.com> Message-ID: On 2025/11/5 11:47, Qiang Ma wrote: > > On 2025/11/5 11:05, Baoquan He wrote: >> On 11/03/25 at 02:34pm, Qiang Ma wrote: >>> The type of the struct kimage member variable nr_segments is >>> unsigned long. >>> Correct the loop variable i and the print format specifier type. >> I can't see what's meaningful with this change. nr_segments is unsigned >> long, but it's the range 'i' will loop. If so, we need change all for >> loop of the int iterator. > If image->nr_segments is large enough, 'i' overflow causes an infinite > loop. Meanwhile, the do_kexec_load() was checked and also defined as 'unsigned long i'. >>> Signed-off-by: Qiang Ma >>> --- >>> kernel/kexec_file.c | 5 +++-- >>> 1 file changed, 3 insertions(+), 2 deletions(-) >>> >>> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c >>> index 4a24aadbad02..7afdaa0efc50 100644 >>> --- a/kernel/kexec_file.c >>> +++ b/kernel/kexec_file.c >>> @@ -366,7 +366,8 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, >>> int, initrd_fd, >>> int image_type = (flags & KEXEC_FILE_ON_CRASH) ? >>> KEXEC_TYPE_CRASH : KEXEC_TYPE_DEFAULT; >>> struct kimage **dest_image, *image; >>> - 
int ret = 0, i; >>> + int ret = 0; >>> + unsigned long i; >>> /* We only trust the superuser with rebooting the system. */ >>> if (!kexec_load_permitted(image_type)) >>> @@ -432,7 +433,7 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, >>> int, initrd_fd, >>> struct kexec_segment *ksegment; >>> ksegment = &image->segment[i]; >>> - kexec_dprintk("segment[%d]: buf=0x%p bufsz=0x%zx mem=0x%lx >>> memsz=0x%zx\n", >>> + kexec_dprintk("segment[%lu]: buf=0x%p bufsz=0x%zx mem=0x%lx >>> memsz=0x%zx\n", >>> i, ksegment->buf, ksegment->bufsz, ksegment->mem, >>> ksegment->memsz); >>> -- >>> 2.20.1 >>> >> From maqianga at uniontech.com Tue Nov 4 20:32:42 2025 From: maqianga at uniontech.com (Qiang Ma) Date: Wed, 5 Nov 2025 12:32:42 +0800 Subject: [PATCH v2 2/4] kexec: add kexec_core flag to control debug printing In-Reply-To: References: <20251103063440.1681657-1-maqianga@uniontech.com> <20251103063440.1681657-3-maqianga@uniontech.com> Message-ID: On 2025/11/5 11:09, Baoquan He wrote: > On 11/03/25 at 02:34pm, Qiang Ma wrote: >> The commit a85ee18c7900 ("kexec_file: print out debugging message >> if required") has added general code printing in kexec_file_load(), >> but not in kexec_load(). >> >> Since kexec_load and kexec_file_load are not triggered >> simultaneously, we can unify the debug flag of kexec and kexec_file >> as kexec_core_dbg_print. > After reconsidering this, I regret calling it kexec_core_dbg_print. > That sounds a printing only happening in kexec_core. Maybe > kexec_dbg_print is better. Because here kexec refers to a generic > concept, but not limited to kexec_load interface only. Just my personal > thinking. This sounds reasonable. The next version will be renamed kexec_dbg_print. > > Other than the naming, the whole patch looks good to me. Thanks. > >> Next, we need to do four things: >> >> 1. rename kexec_file_dbg_print to kexec_core_dbg_print >> 2. 
Add KEXEC_DEBUG >> 3. Initialize kexec_core_dbg_print for kexec >> 4. Set the reset of kexec_file_dbg_print to kimage_free >> >> Signed-off-by: Qiang Ma >> --- >> include/linux/kexec.h | 9 +++++---- >> include/uapi/linux/kexec.h | 1 + >> kernel/kexec.c | 1 + >> kernel/kexec_core.c | 4 +++- >> kernel/kexec_file.c | 4 +--- >> 5 files changed, 11 insertions(+), 8 deletions(-) >> >> diff --git a/include/linux/kexec.h b/include/linux/kexec.h >> index ff7e231b0485..cad8b5c362af 100644 >> --- a/include/linux/kexec.h >> +++ b/include/linux/kexec.h >> @@ -455,10 +455,11 @@ bool kexec_load_permitted(int kexec_image_type); >> >> /* List of defined/legal kexec flags */ >> #ifndef CONFIG_KEXEC_JUMP >> -#define KEXEC_FLAGS (KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR | KEXEC_CRASH_HOTPLUG_SUPPORT) >> +#define KEXEC_FLAGS (KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR | KEXEC_CRASH_HOTPLUG_SUPPORT | \ >> + KEXEC_DEBUG) >> #else >> #define KEXEC_FLAGS (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_UPDATE_ELFCOREHDR | \ >> - KEXEC_CRASH_HOTPLUG_SUPPORT) >> + KEXEC_CRASH_HOTPLUG_SUPPORT | KEXEC_DEBUG) >> #endif >> >> /* List of defined/legal kexec file flags */ >> @@ -525,10 +526,10 @@ static inline int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, g >> static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) { } >> #endif >> >> -extern bool kexec_file_dbg_print; >> +extern bool kexec_core_dbg_print; >> >> #define kexec_dprintk(fmt, arg...) 
\ >> - do { if (kexec_file_dbg_print) pr_info(fmt, ##arg); } while (0) >> + do { if (kexec_core_dbg_print) pr_info(fmt, ##arg); } while (0) >> >> extern void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size); >> extern void kimage_unmap_segment(void *buffer); >> diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h >> index 55749cb0b81d..819c600af125 100644 >> --- a/include/uapi/linux/kexec.h >> +++ b/include/uapi/linux/kexec.h >> @@ -14,6 +14,7 @@ >> #define KEXEC_PRESERVE_CONTEXT 0x00000002 >> #define KEXEC_UPDATE_ELFCOREHDR 0x00000004 >> #define KEXEC_CRASH_HOTPLUG_SUPPORT 0x00000008 >> +#define KEXEC_DEBUG 0x00000010 >> #define KEXEC_ARCH_MASK 0xffff0000 >> >> /* >> diff --git a/kernel/kexec.c b/kernel/kexec.c >> index 9bb1f2b6b268..c7a869d32f87 100644 >> --- a/kernel/kexec.c >> +++ b/kernel/kexec.c >> @@ -42,6 +42,7 @@ static int kimage_alloc_init(struct kimage **rimage, unsigned long entry, >> if (!image) >> return -ENOMEM; >> >> + kexec_core_dbg_print = !!(flags & KEXEC_DEBUG); >> image->start = entry; >> image->nr_segments = nr_segments; >> memcpy(image->segment, segments, nr_segments * sizeof(*segments)); >> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c >> index fa00b239c5d9..865f2b14f23b 100644 >> --- a/kernel/kexec_core.c >> +++ b/kernel/kexec_core.c >> @@ -53,7 +53,7 @@ atomic_t __kexec_lock = ATOMIC_INIT(0); >> /* Flag to indicate we are going to kexec a new kernel */ >> bool kexec_in_progress = false; >> >> -bool kexec_file_dbg_print; >> +bool kexec_core_dbg_print; >> >> /* >> * When kexec transitions to the new kernel there is a one-to-one >> @@ -576,6 +576,8 @@ void kimage_free(struct kimage *image) >> kimage_entry_t *ptr, entry; >> kimage_entry_t ind = 0; >> >> + kexec_core_dbg_print = false; >> + >> if (!image) >> return; >> >> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c >> index eb62a9794242..4a24aadbad02 100644 >> --- a/kernel/kexec_file.c >> +++ b/kernel/kexec_file.c >> @@ 
-138,8 +138,6 @@ void kimage_file_post_load_cleanup(struct kimage *image) >> */ >> kfree(image->image_loader_data); >> image->image_loader_data = NULL; >> - >> - kexec_file_dbg_print = false; >> } >> >> #ifdef CONFIG_KEXEC_SIG >> @@ -314,7 +312,7 @@ kimage_file_alloc_init(struct kimage **rimage, int kernel_fd, >> if (!image) >> return -ENOMEM; >> >> - kexec_file_dbg_print = !!(flags & KEXEC_FILE_DEBUG); >> + kexec_core_dbg_print = !!(flags & KEXEC_FILE_DEBUG); >> image->file_mode = 1; >> >> #ifdef CONFIG_CRASH_DUMP >> -- >> 2.20.1 >> > From bhe at redhat.com Tue Nov 4 22:56:43 2025 From: bhe at redhat.com (Baoquan He) Date: Wed, 5 Nov 2025 14:56:43 +0800 Subject: [PATCH v2 4/4] kexec_file: Fix the issue of mismatch between loop variable types In-Reply-To: <0C92443D3E2100AF+c669d240-1ee8-4897-a30d-3efefe161085@uniontech.com> References: <20251103063440.1681657-1-maqianga@uniontech.com> <20251103063440.1681657-5-maqianga@uniontech.com> <0C92443D3E2100AF+c669d240-1ee8-4897-a30d-3efefe161085@uniontech.com> Message-ID: On 11/05/25 at 11:47am, Qiang Ma wrote: > > ? 2025/11/5 11:05, Baoquan He ??: > > On 11/03/25 at 02:34pm, Qiang Ma wrote: > > > The type of the struct kimage member variable nr_segments is unsigned long. > > > Correct the loop variable i and the print format specifier type. > > I can't see what's meaningful with this change. nr_segments is unsigned > > long, but it's the range 'i' will loop. If so, we need change all for > > loop of the int iterator. > If image->nr_segments is large enough, 'i' overflow causes an infinite loop. Please check kexec_add_buffer(), there's checking for the value which upper limit is restricted to 16. 
if (kbuf->image->nr_segments >= KEXEC_SEGMENT_MAX) return -EINVAL; > > > Signed-off-by: Qiang Ma > > > --- > > > kernel/kexec_file.c | 5 +++-- > > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > > > diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c > > > index 4a24aadbad02..7afdaa0efc50 100644 > > > --- a/kernel/kexec_file.c > > > +++ b/kernel/kexec_file.c > > > @@ -366,7 +366,8 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd, > > > int image_type = (flags & KEXEC_FILE_ON_CRASH) ? > > > KEXEC_TYPE_CRASH : KEXEC_TYPE_DEFAULT; > > > struct kimage **dest_image, *image; > > > - int ret = 0, i; > > > + int ret = 0; > > > + unsigned long i; > > > /* We only trust the superuser with rebooting the system. */ > > > if (!kexec_load_permitted(image_type)) > > > @@ -432,7 +433,7 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd, > > > struct kexec_segment *ksegment; > > > ksegment = &image->segment[i]; > > > - kexec_dprintk("segment[%d]: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n", > > > + kexec_dprintk("segment[%lu]: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n", > > > i, ksegment->buf, ksegment->bufsz, ksegment->mem, > > > ksegment->memsz); > > > -- > > > 2.20.1 > > > > > > From maqianga at uniontech.com Tue Nov 4 23:06:49 2025 From: maqianga at uniontech.com (Qiang Ma) Date: Wed, 5 Nov 2025 15:06:49 +0800 Subject: [PATCH v2 4/4] kexec_file: Fix the issue of mismatch between loop variable types In-Reply-To: References: <20251103063440.1681657-1-maqianga@uniontech.com> <20251103063440.1681657-5-maqianga@uniontech.com> <0C92443D3E2100AF+c669d240-1ee8-4897-a30d-3efefe161085@uniontech.com> Message-ID: On 2025/11/5 14:56, Baoquan He wrote: > On 11/05/25 at 11:47am, Qiang Ma wrote: >> On 2025/11/5 11:05, Baoquan He wrote: >>> On 11/03/25 at 02:34pm, Qiang Ma wrote: >>>> The type of the struct kimage member variable nr_segments is unsigned long. >>>> Correct the loop variable i and the print format specifier type. 
>>> I can't see what's meaningful with this change. nr_segments is unsigned >>> long, but it's the range 'i' will loop. If so, we need change all for >>> loop of the int iterator. >> If image->nr_segments is large enough, 'i' overflow causes an infinite loop. > Please check kexec_add_buffer(), there's checking for the value which > upper limit is restricted to 16. > > if (kbuf->image->nr_segments >= KEXEC_SEGMENT_MAX) > return -EINVAL; Oh, then this patch is really not necessary. >>>> Signed-off-by: Qiang Ma >>>> --- >>>> kernel/kexec_file.c | 5 +++-- >>>> 1 file changed, 3 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c >>>> index 4a24aadbad02..7afdaa0efc50 100644 >>>> --- a/kernel/kexec_file.c >>>> +++ b/kernel/kexec_file.c >>>> @@ -366,7 +366,8 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd, >>>> int image_type = (flags & KEXEC_FILE_ON_CRASH) ? >>>> KEXEC_TYPE_CRASH : KEXEC_TYPE_DEFAULT; >>>> struct kimage **dest_image, *image; >>>> - int ret = 0, i; >>>> + int ret = 0; >>>> + unsigned long i; >>>> /* We only trust the superuser with rebooting the system. 
*/ >>>> if (!kexec_load_permitted(image_type)) >>>> @@ -432,7 +433,7 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd, >>>> struct kexec_segment *ksegment; >>>> ksegment = &image->segment[i]; >>>> - kexec_dprintk("segment[%d]: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n", >>>> + kexec_dprintk("segment[%lu]: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n", >>>> i, ksegment->buf, ksegment->bufsz, ksegment->mem, >>>> ksegment->memsz); >>>> -- >>>> 2.20.1 >>>> > From bhe at redhat.com Tue Nov 4 23:53:13 2025 From: bhe at redhat.com (Baoquan He) Date: Wed, 5 Nov 2025 15:53:13 +0800 Subject: [PATCH v2 3/4] kexec: print out debugging message if required for kexec_load In-Reply-To: <5FC4A8D79744B238+97288be4-6c1a-4c0d-ae7d-be2029ec87f3@uniontech.com> References: <20251103063440.1681657-1-maqianga@uniontech.com> <20251103063440.1681657-4-maqianga@uniontech.com> <5FC4A8D79744B238+97288be4-6c1a-4c0d-ae7d-be2029ec87f3@uniontech.com> Message-ID: On 11/05/25 at 11:41am, Qiang Ma wrote: > > On 2025/11/5 11:01, Baoquan He wrote: > > On 11/03/25 at 02:34pm, Qiang Ma wrote: > > > The commit a85ee18c7900 ("kexec_file: print out debugging message > > > if required") has added general code printing in kexec_file_load(), > > > but not in kexec_load(). > > > > > > Especially in the RISC-V architecture, kexec_image_info() has been > > > removed(commit eb7622d908a0 ("kexec_file, riscv: print out debugging > > > message if required")). As a result, when using '-d' for the kexec_load > > > interface, print nothing in the kernel space. This might be helpful for > > > verifying the accuracy of the data passed to the kernel. Therefore, > > > refer to this commit a85ee18c7900 ("kexec_file: print out debugging > > > message if required"), debug print information has been added. 
> > > > > > Signed-off-by: Qiang Ma > > > Reported-by: kernel test robot > > > Closes: https://lore.kernel.org/oe-kbuild-all/202510310332.6XrLe70K-lkp at intel.com/ > > > --- > > > kernel/kexec.c | 11 +++++++++++ > > > 1 file changed, 11 insertions(+) > > > > > > diff --git a/kernel/kexec.c b/kernel/kexec.c > > > index c7a869d32f87..9b433b972cc1 100644 > > > --- a/kernel/kexec.c > > > +++ b/kernel/kexec.c > > > @@ -154,7 +154,15 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, > > > if (ret) > > > goto out; > > > + kexec_dprintk("nr_segments = %lu\n", nr_segments); > > > for (i = 0; i < nr_segments; i++) { > > > + struct kexec_segment *ksegment; > > > + > > > + ksegment = &image->segment[i]; > > > + kexec_dprintk("segment[%lu]: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n", > > > + i, ksegment->buf, ksegment->bufsz, ksegment->mem, > > > + ksegment->memsz); > > There has already been a print_segments() in kexec-tools/kexec/kexec.c, > > you will get duplicated printing. That sounds not good. Have you tested > > this? > I have tested it, kexec-tools is the debug message printed > in user space, while kexec_dprintk is printed > in kernel space. > > This might be helpful for verifying the accuracy of > the data passed to the kernel. Hmm, a debug print just to verify the values passed into the kernel is not necessary. We should only add debug printing where we need it but lack it. I didn't check it carefully; if you add the debug printing only for verifying accuracy, that doesn't justify the code change.
> > > + > > > ret = kimage_load_segment(image, i); > > > if (ret) > > > goto out; > > > @@ -166,6 +174,9 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, > > > if (ret) > > > goto out; > > > + kexec_dprintk("kexec_load: type:%u, start:0x%lx head:0x%lx flags:0x%lx\n", > > > + image->type, image->start, image->head, flags); > > > + > > > /* Install the new kernel and uninstall the old */ > > > image = xchg(dest_image, image); > > > -- > > > 2.20.1 > > > > > > From maqianga at uniontech.com Wed Nov 5 00:35:06 2025 From: maqianga at uniontech.com (Qiang Ma) Date: Wed, 5 Nov 2025 16:35:06 +0800 Subject: [PATCH v2 3/4] kexec: print out debugging message if required for kexec_load In-Reply-To: References: <20251103063440.1681657-1-maqianga@uniontech.com> <20251103063440.1681657-4-maqianga@uniontech.com> <5FC4A8D79744B238+97288be4-6c1a-4c0d-ae7d-be2029ec87f3@uniontech.com> Message-ID: <2331A9F3E09581FC+4ab7e9ba-8776-47d2-868f-cb01ca9cd909@uniontech.com> ? 2025/11/5 15:53, Baoquan He ??: > On 11/05/25 at 11:41am, Qiang Ma wrote: >> ? 2025/11/5 11:01, Baoquan He ??: >>> On 11/03/25 at 02:34pm, Qiang Ma wrote: >>>> The commit a85ee18c7900 ("kexec_file: print out debugging message >>>> if required") has added general code printing in kexec_file_load(), >>>> but not in kexec_load(). >>>> >>>> Especially in the RISC-V architecture, kexec_image_info() has been >>>> removed(commit eb7622d908a0 ("kexec_file, riscv: print out debugging >>>> message if required")). As a result, when using '-d' for the kexec_load >>>> interface, print nothing in the kernel space. This might be helpful for >>>> verifying the accuracy of the data passed to the kernel. Therefore, >>>> refer to this commit a85ee18c7900 ("kexec_file: print out debugging >>>> message if required"), debug print information has been added. 
>>>> >>>> Signed-off-by: Qiang Ma >>>> Reported-by: kernel test robot >>>> Closes: https://lore.kernel.org/oe-kbuild-all/202510310332.6XrLe70K-lkp at intel.com/ >>>> --- >>>> kernel/kexec.c | 11 +++++++++++ >>>> 1 file changed, 11 insertions(+) >>>> >>>> diff --git a/kernel/kexec.c b/kernel/kexec.c >>>> index c7a869d32f87..9b433b972cc1 100644 >>>> --- a/kernel/kexec.c >>>> +++ b/kernel/kexec.c >>>> @@ -154,7 +154,15 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, >>>> if (ret) >>>> goto out; >>>> + kexec_dprintk("nr_segments = %lu\n", nr_segments); >>>> for (i = 0; i < nr_segments; i++) { >>>> + struct kexec_segment *ksegment; >>>> + >>>> + ksegment = &image->segment[i]; >>>> + kexec_dprintk("segment[%lu]: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n", >>>> + i, ksegment->buf, ksegment->bufsz, ksegment->mem, >>>> + ksegment->memsz); >>> There has already been a print_segments() in kexec-tools/kexec/kexec.c, >>> you will get duplicated printing. That sounds not good. Have you tested >>> this? >> I have tested it, kexec-tools is the debug message printed >> in user space, while kexec_dprintk is printed >> in kernel space. >> >> This might be helpful for verifying the accuracy of >> the data passed to the kernel. > Hmm, that's not necessary with a debug printing to verify value passed > in kernel. We should only add debug pringing when we need but lack it. > I didn't check it carefully, if you add the debug printing only for > verifying accuracy, that doesn't justify the code change. It's not entirely because of it. Another reason is that for RISC-V, for kexec_file_load interface, kexec_image_info() was deleted at that time because the content has been printed out in generic code. However, these contents were not printed in kexec_load because kexec_image_info was deleted. So now it has been added. 
>>>> + >>>> ret = kimage_load_segment(image, i); >>>> if (ret) >>>> goto out; >>>> @@ -166,6 +174,9 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, >>>> if (ret) >>>> goto out; >>>> + kexec_dprintk("kexec_load: type:%u, start:0x%lx head:0x%lx flags:0x%lx\n", >>>> + image->type, image->start, image->head, flags); >>>> + >>>> /* Install the new kernel and uninstall the old */ >>>> image = xchg(dest_image, image); >>>> -- >>>> 2.20.1 >>>> > From maqianga at uniontech.com Wed Nov 5 00:48:59 2025 From: maqianga at uniontech.com (Qiang Ma) Date: Wed, 5 Nov 2025 16:48:59 +0800 Subject: [PATCH v2 3/4] kexec: print out debugging message if required for kexec_load In-Reply-To: References: <20251103063440.1681657-1-maqianga@uniontech.com> <20251103063440.1681657-4-maqianga@uniontech.com> <5FC4A8D79744B238+97288be4-6c1a-4c0d-ae7d-be2029ec87f3@uniontech.com> Message-ID: <02A386F1B9701FED+a0b3ab16-3f23-4d69-9fb8-ab4d9f918bad@uniontech.com> ? 2025/11/5 15:53, Baoquan He ??: > On 11/05/25 at 11:41am, Qiang Ma wrote: >> ? 2025/11/5 11:01, Baoquan He ??: >>> On 11/03/25 at 02:34pm, Qiang Ma wrote: >>>> The commit a85ee18c7900 ("kexec_file: print out debugging message >>>> if required") has added general code printing in kexec_file_load(), >>>> but not in kexec_load(). >>>> >>>> Especially in the RISC-V architecture, kexec_image_info() has been >>>> removed(commit eb7622d908a0 ("kexec_file, riscv: print out debugging >>>> message if required")). As a result, when using '-d' for the kexec_load >>>> interface, print nothing in the kernel space. This might be helpful for >>>> verifying the accuracy of the data passed to the kernel. Therefore, >>>> refer to this commit a85ee18c7900 ("kexec_file: print out debugging >>>> message if required"), debug print information has been added. 
>>>> >>>> Signed-off-by: Qiang Ma >>>> Reported-by: kernel test robot >>>> Closes: https://lore.kernel.org/oe-kbuild-all/202510310332.6XrLe70K-lkp at intel.com/ >>>> --- >>>> kernel/kexec.c | 11 +++++++++++ >>>> 1 file changed, 11 insertions(+) >>>> >>>> diff --git a/kernel/kexec.c b/kernel/kexec.c >>>> index c7a869d32f87..9b433b972cc1 100644 >>>> --- a/kernel/kexec.c >>>> +++ b/kernel/kexec.c >>>> @@ -154,7 +154,15 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, >>>> if (ret) >>>> goto out; >>>> + kexec_dprintk("nr_segments = %lu\n", nr_segments); >>>> for (i = 0; i < nr_segments; i++) { >>>> + struct kexec_segment *ksegment; >>>> + >>>> + ksegment = &image->segment[i]; >>>> + kexec_dprintk("segment[%lu]: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n", >>>> + i, ksegment->buf, ksegment->bufsz, ksegment->mem, >>>> + ksegment->memsz); >>> There has already been a print_segments() in kexec-tools/kexec/kexec.c, >>> you will get duplicated printing. That sounds not good. Have you tested >>> this? >> I have tested it, kexec-tools is the debug message printed >> in user space, while kexec_dprintk is printed >> in kernel space. >> >> This might be helpful for verifying the accuracy of >> the data passed to the kernel. > Hmm, that's not necessary with a debug printing to verify value passed > in kernel. We should only add debug pringing when we need but lack it. > I didn't check it carefully, if you add the debug printing only for > verifying accuracy, that doesn't justify the code change. > Also, adding these prints here is helpful for debugging the kimage_load_segment(). 
>>>> + >>>> ret = kimage_load_segment(image, i); >>>> if (ret) >>>> goto out; >>>> @@ -166,6 +174,9 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, >>>> if (ret) >>>> goto out; >>>> + kexec_dprintk("kexec_load: type:%u, start:0x%lx head:0x%lx flags:0x%lx\n", >>>> + image->type, image->start, image->head, flags); >>>> + >>>> /* Install the new kernel and uninstall the old */ >>>> image = xchg(dest_image, image); >>>> -- >>>> 2.20.1 >>>> > From bhe at redhat.com Wed Nov 5 00:55:28 2025 From: bhe at redhat.com (Baoquan He) Date: Wed, 5 Nov 2025 16:55:28 +0800 Subject: [PATCH v2 3/4] kexec: print out debugging message if required for kexec_load In-Reply-To: <2331A9F3E09581FC+4ab7e9ba-8776-47d2-868f-cb01ca9cd909@uniontech.com> References: <20251103063440.1681657-1-maqianga@uniontech.com> <20251103063440.1681657-4-maqianga@uniontech.com> <5FC4A8D79744B238+97288be4-6c1a-4c0d-ae7d-be2029ec87f3@uniontech.com> <2331A9F3E09581FC+4ab7e9ba-8776-47d2-868f-cb01ca9cd909@uniontech.com> Message-ID: On 11/05/25 at 04:35pm, Qiang Ma wrote: > > ? 2025/11/5 15:53, Baoquan He ??: > > On 11/05/25 at 11:41am, Qiang Ma wrote: > > > ? 2025/11/5 11:01, Baoquan He ??: > > > > On 11/03/25 at 02:34pm, Qiang Ma wrote: > > > > > The commit a85ee18c7900 ("kexec_file: print out debugging message > > > > > if required") has added general code printing in kexec_file_load(), > > > > > but not in kexec_load(). > > > > > > > > > > Especially in the RISC-V architecture, kexec_image_info() has been > > > > > removed(commit eb7622d908a0 ("kexec_file, riscv: print out debugging > > > > > message if required")). As a result, when using '-d' for the kexec_load > > > > > interface, print nothing in the kernel space. This might be helpful for > > > > > verifying the accuracy of the data passed to the kernel. Therefore, > > > > > refer to this commit a85ee18c7900 ("kexec_file: print out debugging > > > > > message if required"), debug print information has been added. 
> > > > > > > > > > Signed-off-by: Qiang Ma > > > > > Reported-by: kernel test robot > > > > > Closes: https://lore.kernel.org/oe-kbuild-all/202510310332.6XrLe70K-lkp at intel.com/ > > > > > --- > > > > > kernel/kexec.c | 11 +++++++++++ > > > > > 1 file changed, 11 insertions(+) > > > > > > > > > > diff --git a/kernel/kexec.c b/kernel/kexec.c > > > > > index c7a869d32f87..9b433b972cc1 100644 > > > > > --- a/kernel/kexec.c > > > > > +++ b/kernel/kexec.c > > > > > @@ -154,7 +154,15 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, > > > > > if (ret) > > > > > goto out; > > > > > + kexec_dprintk("nr_segments = %lu\n", nr_segments); > > > > > for (i = 0; i < nr_segments; i++) { > > > > > + struct kexec_segment *ksegment; > > > > > + > > > > > + ksegment = &image->segment[i]; > > > > > + kexec_dprintk("segment[%lu]: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n", > > > > > + i, ksegment->buf, ksegment->bufsz, ksegment->mem, > > > > > + ksegment->memsz); > > > > There has already been a print_segments() in kexec-tools/kexec/kexec.c, > > > > you will get duplicated printing. That sounds not good. Have you tested > > > > this? > > > I have tested it, kexec-tools is the debug message printed > > > in user space, while kexec_dprintk is printed > > > in kernel space. > > > > > > This might be helpful for verifying the accuracy of > > > the data passed to the kernel. > > Hmm, that's not necessary with a debug printing to verify value passed > > in kernel. We should only add debug pringing when we need but lack it. > > I didn't check it carefully, if you add the debug printing only for > > verifying accuracy, that doesn't justify the code change. > It's not entirely because of it. > > Another reason is that for RISC-V, for kexec_file_load interface, > kexec_image_info() was deleted at that time because the content > has been printed out in generic code. > > However, these contents were not printed in kexec_load because > kexec_image_info was deleted. 
So now it has been added. print_segments() in kexec-tools/kexec/kexec.c is a generic function; shouldn't it be called in kexec-tools for risc-v? I am confused by the purpose of this patchset. > > > > > + > > > > > ret = kimage_load_segment(image, i); > > > > > if (ret) > > > > > goto out; > > > > > @@ -166,6 +174,9 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, > > > > > if (ret) > > > > > goto out; > > > > > + kexec_dprintk("kexec_load: type:%u, start:0x%lx head:0x%lx flags:0x%lx\n", > > > > > + image->type, image->start, image->head, flags); > > > > > + > > > > > /* Install the new kernel and uninstall the old */ > > > > > image = xchg(dest_image, image); > > > > > -- > > > > > 2.20.1 > > > > > > > > From pratyush at kernel.org Wed Nov 5 01:44:24 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Wed, 05 Nov 2025 10:44:24 +0100 Subject: [PATCH] MAINTAINERS: extend file entry in KHO to include subdirectories In-Reply-To: <20251104143238.119803-1-lukas.bulwahn@redhat.com> (Lukas Bulwahn's message of "Tue, 4 Nov 2025 15:32:38 +0100") References: <20251104143238.119803-1-lukas.bulwahn@redhat.com> Message-ID: On Tue, Nov 04 2025, Lukas Bulwahn wrote: > From: Lukas Bulwahn > > Commit 3498209ff64e ("Documentation: add documentation for KHO") adds the > file entry for 'Documentation/core-api/kho/*'. The asterisk at the end > means that all files in kho are included, but not files in its > subdirectories below. > Hence, the files under Documentation/core-api/kho/bindings/ are not > considered part of KHO, and get_maintainers.pl does not necessarily add the > KHO maintainers to the recipients of patches to those files. Probably this > is not intended, and it was simply an oversight of the detailed > semantics of such file entries. > > Make the file entry include the subdirectories of > Documentation/core-api/kho/. > > Signed-off-by: Lukas Bulwahn Reviewed-by: Pratyush Yadav [...]
-- Regards, Pratyush Yadav From pratyush at kernel.org Wed Nov 5 02:06:13 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Wed, 05 Nov 2025 11:06:13 +0100 Subject: [PATCH 0/2] kho: misc fixes In-Reply-To: <20251103172321.689294e48c2fae795e114ce6@linux-foundation.org> (Andrew Morton's message of "Mon, 3 Nov 2025 17:23:21 -0800") References: <20251103180235.71409-1-pratyush@kernel.org> <20251103162020.ac696dbc695f9341e7a267f7@linux-foundation.org> <20251103172321.689294e48c2fae795e114ce6@linux-foundation.org> Message-ID: On Mon, Nov 03 2025, Andrew Morton wrote: > On Mon, 3 Nov 2025 16:20:20 -0800 Andrew Morton wrote: > >> On Mon, 3 Nov 2025 19:02:30 +0100 Pratyush Yadav wrote: >> >> > This series has a couple of misc fixes for KHO I discovered during code >> > review and testing. >> > >> > The series is based on top of [0] which has another fix for the function >> > touched by patch 1. I spotted these two after sending the patch. If that >> > one needs a reroll, I can combine the three into a series. >> > >> >> Things appear to be misordered here. >> >> [1/2] "kho: fix unpreservation of higher-order vmalloc preservations" >> fixes a667300bd53f2, so it's wanted in 6.18-rcX >> >> [2/2] "kho: warn and exit when unpreserved page wasn't preserved" >> fixes fc33e4b44b271, so it's wanted in 6.16+ >> >> So can we please have [2/2] as a standalone fix against latest -linus, >> with a cc:stable? >> >> And then [1/2] as a standalone fix against latest -linus without a >> cc:stable. >> > > OK, I think I figured it out. > > In mm-hotfixes-unstable I have > > kho-fix-out-of-bounds-access-of-vmalloc-chunk.patch > kho-fix-unpreservation-of-higher-order-vmalloc-preservations.patch > kho-warn-and-exit-when-unpreserved-page-wasnt-preserved.patch > > The first two are applicable to 6.18-rcX and the third is applicable to > 6.18-rcX, with a cc:stable for backporting. Right. Sorry for the confusion. 
I see that on mm-hotfixes-unstable you already updated the third patch with Cc: stable. Thanks. -- Regards, Pratyush Yadav From leitao at debian.org Wed Nov 5 02:18:11 2025 From: leitao at debian.org (Breno Leitao) Date: Wed, 5 Nov 2025 02:18:11 -0800 Subject: [PATCH v8 01/17] memblock: add MEMBLOCK_RSRV_KERN flag In-Reply-To: References: <20250509074635.3187114-1-changyuanl@google.com> <20250509074635.3187114-2-changyuanl@google.com> <2ege2jfbevtunhxsnutbzde7cqwgu5qbj4bbuw2umw7ke7ogcn@5wtskk4exzsi> Message-ID: Hello Pratyush, On Tue, Oct 14, 2025 at 03:10:37PM +0200, Pratyush Yadav wrote: > On Tue, Oct 14 2025, Breno Leitao wrote: > > On Mon, Oct 13, 2025 at 06:40:09PM +0200, Pratyush Yadav wrote: > >> On Mon, Oct 13 2025, Pratyush Yadav wrote: > >> > > >> > I suppose this would be useful. I think enabling memblock debug prints > >> > would also be helpful (using the "memblock=debug" commandline parameter) > >> > if it doesn't impact your production environment too much. > >> > >> Actually, I think "memblock=debug" is going to be the more useful thing > >> since it would also show what function allocated the overlapping range > >> and the flags it was allocated with. > >> > >> On my qemu VM with KVM, this results in around 70 prints from memblock. > >> So it adds a bit of extra prints but nothing that should be too > >> disrupting I think. Plus, only at boot so the worst thing you get is > >> slightly slower boot times. > > > > Unfortunately this issue is happening on production systems, and I don't > > have an easy way to reproduce it _yet_. > > > > At the same time, "memblock=debug" has two problems: > > > > 1) It slows the boot time as you suggested. Boot time at large > > environments is SUPER critical and time sensitive. It is a bit > > weird, but it is common for machines in production to kexec > > _thousands_ of times, and kexecing is considered downtime. 
> > I don't know if it would make a real enough difference on boot times, > only that it should theoretically affect it, mainly if you are using > serial for dmesg logs. Anyway, that's your production environment so you > know best. > > > > > This would be useful if I find some hosts getting this issue, and > > then I can easily enable the extra information to collect what > > I need, but, this didn't pan out because the hosts I got > > `memblock=debug` didn't collaborate. > > > > 2) "memblock=debug" is verbose for all cases, which also not necessary > > the desired behaviour. I am more interested in only being verbose > > when there is a known problem. I am still interested in this problem, and I finally found a host that constantly reproduce the issue and I was able to get `memblock=debug` cmdline. I am running 6.18-rc4 with some debug options enabled. DMA-API: exceeded 7 overlapping mappings of cacheline 0x0000000006d6e400 WARNING: CPU: 58 PID: 828 at kernel/dma/debug.c:463 add_dma_entry+0x2e4/0x330 pc : add_dma_entry+0x2e4/0x330 lr : add_dma_entry+0x2e4/0x330 sp : ffff8000b036f7f0 x29: ffff8000b036f800 x28: 0000000000000001 x27: 0000000000000008 x26: ffff8000835f7fb8 x25: ffff8000835f7000 x24: ffff8000835f7ee0 x23: 0000000000000000 x22: 0000000006d6e400 x21: 0000000000000000 x20: 0000000006d6e400 x19: ffff0003f70c1100 x18: 00000000ffffffff x17: ffff80008019a2d8 x16: ffff80008019a08c x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000820 x12: ffff00011faeaf00 x11: 0000000000000000 x10: ffff8000834633d8 x9 : ffff8000801979d4 x8 : 00000000fffeffff x7 : ffff8000834633d8 x6 : 0000000000000000 x5 : 00000000000bfff4 x4 : 0000000000000000 x3 : ffff0001075eb7c0 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0001075eb7c0 Call trace: add_dma_entry+0x2e4/0x330 (P) debug_dma_map_phys+0xc4/0xf0 dma_map_phys (/home/leit/Devel/upstream/./include/linux/dma-direct.h:138 /home/leit/Devel/upstream/kernel/dma/direct.h:102 
/home/leit/Devel/upstream/kernel/dma/mapping.c:169) dma_map_page_attrs (/home/leit/Devel/upstream/kernel/dma/mapping.c:387) blk_dma_map_direct.isra.0 (/home/leit/Devel/upstream/block/blk-mq-dma.c:102) blk_dma_map_iter_start (/home/leit/Devel/upstream/block/blk-mq-dma.c:123 /home/leit/Devel/upstream/block/blk-mq-dma.c:196) blk_rq_dma_map_iter_start (/home/leit/Devel/upstream/block/blk-mq-dma.c:228) nvme_prep_rq+0xb8/0x9b8 nvme_queue_rq+0x44/0x1b0 blk_mq_dispatch_rq_list (/home/leit/Devel/upstream/block/blk-mq.c:2129) __blk_mq_sched_dispatch_requests (/home/leit/Devel/upstream/block/blk-mq-sched.c:314) blk_mq_sched_dispatch_requests (/home/leit/Devel/upstream/block/blk-mq-sched.c:329) blk_mq_run_work_fn (/home/leit/Devel/upstream/block/blk-mq.c:219 /home/leit/Devel/upstream/block/blk-mq.c:231) process_one_work (/home/leit/Devel/upstream/kernel/workqueue.c:991 /home/leit/Devel/upstream/kernel/workqueue.c:3213) worker_thread (/home/leit/Devel/upstream/./include/linux/list.h:163 /home/leit/Devel/upstream/./include/linux/list.h:191 /home/leit/Devel/upstream/./include/linux/list.h:319 /home/leit/Devel/upstream/kernel/workqueue.c:1153 /home/leit/Devel/upstream/kernel/workqueue.c:1205 /home/leit/Devel/upstream/kernel/workqueue.c:3426) kthread (/home/leit/Devel/upstream/kernel/kthread.c:386 /home/leit/Devel/upstream/kernel/kthread.c:457) ret_from_fork (/home/leit/Devel/upstream/entry.S:861) Looking at memblock debug logs, I haven't seen anything related to 0x0000000006d6e400. 
I put the output of `dmesg | grep memblock` here, in case you are curious: https://github.com/leitao/debug/blob/main/pastebin/memblock/dmesg_grep_memblock.txt Thanks --breno From pratyush at kernel.org Wed Nov 5 02:20:19 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Wed, 5 Nov 2025 11:20:19 +0100 Subject: [PATCH] MAINTAINERS: add myself as a reviewer for KHO Message-ID: <20251105102022.18798-1-pratyush@kernel.org> I have been reviewing most patches for KHO already, and it is easier to spot them if I am directly in Cc. Signed-off-by: Pratyush Yadav --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index 8ee7cb5fe838f..3c85bb0e381fc 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -13789,6 +13789,7 @@ KEXEC HANDOVER (KHO) M: Alexander Graf M: Mike Rapoport M: Pasha Tatashin +R: Pratyush Yadav L: kexec at lists.infradead.org L: linux-mm at kvack.org S: Maintained base-commit: d25eefc46daf21bd1ebbc699f0ffd7fe11d92296 -- 2.47.3 From maqianga at uniontech.com Wed Nov 5 03:28:10 2025 From: maqianga at uniontech.com (Qiang Ma) Date: Wed, 5 Nov 2025 19:28:10 +0800 Subject: [PATCH v2 3/4] kexec: print out debugging message if required for kexec_load In-Reply-To: References: <20251103063440.1681657-1-maqianga@uniontech.com> <20251103063440.1681657-4-maqianga@uniontech.com> <5FC4A8D79744B238+97288be4-6c1a-4c0d-ae7d-be2029ec87f3@uniontech.com> <2331A9F3E09581FC+4ab7e9ba-8776-47d2-868f-cb01ca9cd909@uniontech.com> Message-ID: <44308A6B6D8BEB61+c143d52e-03dd-48bf-aadd-8a0d9196b280@uniontech.com> On 2025/11/5 16:55, Baoquan He wrote: > On 11/05/25 at 04:35pm, Qiang Ma wrote: >> On 2025/11/5 15:53, Baoquan He wrote: >>> On 11/05/25 at 11:41am, Qiang Ma wrote: >>>> On 2025/11/5 11:01, Baoquan He wrote: >>>>> On 11/03/25 at 02:34pm, Qiang Ma wrote: >>>>>> The commit a85ee18c7900 ("kexec_file: print out debugging message >>>>>> if required") has added general code printing in kexec_file_load(), >>>>>> but not in kexec_load().
>>>>>> >>>>>> Especially in the RISC-V architecture, kexec_image_info() has been >>>>>> removed(commit eb7622d908a0 ("kexec_file, riscv: print out debugging >>>>>> message if required")). As a result, when using '-d' for the kexec_load >>>>>> interface, print nothing in the kernel space. This might be helpful for >>>>>> verifying the accuracy of the data passed to the kernel. Therefore, >>>>>> refer to this commit a85ee18c7900 ("kexec_file: print out debugging >>>>>> message if required"), debug print information has been added. >>>>>> >>>>>> Signed-off-by: Qiang Ma >>>>>> Reported-by: kernel test robot >>>>>> Closes: https://lore.kernel.org/oe-kbuild-all/202510310332.6XrLe70K-lkp at intel.com/ >>>>>> --- >>>>>> kernel/kexec.c | 11 +++++++++++ >>>>>> 1 file changed, 11 insertions(+) >>>>>> >>>>>> diff --git a/kernel/kexec.c b/kernel/kexec.c >>>>>> index c7a869d32f87..9b433b972cc1 100644 >>>>>> --- a/kernel/kexec.c >>>>>> +++ b/kernel/kexec.c >>>>>> @@ -154,7 +154,15 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, >>>>>> if (ret) >>>>>> goto out; >>>>>> + kexec_dprintk("nr_segments = %lu\n", nr_segments); >>>>>> for (i = 0; i < nr_segments; i++) { >>>>>> + struct kexec_segment *ksegment; >>>>>> + >>>>>> + ksegment = &image->segment[i]; >>>>>> + kexec_dprintk("segment[%lu]: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n", >>>>>> + i, ksegment->buf, ksegment->bufsz, ksegment->mem, >>>>>> + ksegment->memsz); >>>>> There has already been a print_segments() in kexec-tools/kexec/kexec.c, >>>>> you will get duplicated printing. That sounds not good. Have you tested >>>>> this? >>>> I have tested it, kexec-tools is the debug message printed >>>> in user space, while kexec_dprintk is printed >>>> in kernel space. >>>> >>>> This might be helpful for verifying the accuracy of >>>> the data passed to the kernel. >>> Hmm, that's not necessary with a debug printing to verify value passed >>> in kernel. 
We should only add debug printing where we need it but lack it. >>> I didn't check it carefully; if you add the debug printing only for >>> verifying accuracy, that doesn't justify the code change. >> It's not entirely because of it. >> >> Another reason is that for RISC-V, for kexec_file_load interface, >> kexec_image_info() was deleted at that time because the content >> has been printed out in generic code. >> >> However, these contents were not printed in kexec_load because >> kexec_image_info was deleted. So now it has been added. > print_segments() in kexec-tools/kexec/kexec.c is a generic function; > shouldn't it be called in kexec-tools for risc-v? I am confused by > the purpose of this patchset. I expressed that poorly. I don't want to add print_segments() to riscv. I want to add some debugging messages (ksegment, kimage, flags) for kexec_load. Although the segment information is already printed by kexec-tools, it is still helpful for debugging the kernel-space code path.
> >>>>>> + >>>>>> ret = kimage_load_segment(image, i); >>>>>> if (ret) >>>>>> goto out; >>>>>> @@ -166,6 +174,9 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, >>>>>> if (ret) >>>>>> goto out; >>>>>> + kexec_dprintk("kexec_load: type:%u, start:0x%lx head:0x%lx flags:0x%lx\n", >>>>>> + image->type, image->start, image->head, flags); >>>>>> + >>>>>> /* Install the new kernel and uninstall the old */ >>>>>> image = xchg(dest_image, image); >>>>>> -- >>>>>> 2.20.1 >>>>>> > From bhe at redhat.com Wed Nov 5 05:01:12 2025 From: bhe at redhat.com (Baoquan He) Date: Wed, 5 Nov 2025 21:01:12 +0800 Subject: [PATCH v2 3/4] kexec: print out debugging message if required for kexec_load In-Reply-To: <44308A6B6D8BEB61+c143d52e-03dd-48bf-aadd-8a0d9196b280@uniontech.com> References: <20251103063440.1681657-1-maqianga@uniontech.com> <20251103063440.1681657-4-maqianga@uniontech.com> <5FC4A8D79744B238+97288be4-6c1a-4c0d-ae7d-be2029ec87f3@uniontech.com> <2331A9F3E09581FC+4ab7e9ba-8776-47d2-868f-cb01ca9cd909@uniontech.com> <44308A6B6D8BEB61+c143d52e-03dd-48bf-aadd-8a0d9196b280@uniontech.com> Message-ID: On 11/05/25 at 07:28pm, Qiang Ma wrote: > > ? 2025/11/5 16:55, Baoquan He ??: > > On 11/05/25 at 04:35pm, Qiang Ma wrote: > > > ? 2025/11/5 15:53, Baoquan He ??: > > > > On 11/05/25 at 11:41am, Qiang Ma wrote: > > > > > ? 2025/11/5 11:01, Baoquan He ??: > > > > > > On 11/03/25 at 02:34pm, Qiang Ma wrote: > > > > > > > The commit a85ee18c7900 ("kexec_file: print out debugging message > > > > > > > if required") has added general code printing in kexec_file_load(), > > > > > > > but not in kexec_load(). > > > > > > > > > > > > > > Especially in the RISC-V architecture, kexec_image_info() has been > > > > > > > removed(commit eb7622d908a0 ("kexec_file, riscv: print out debugging > > > > > > > message if required")). As a result, when using '-d' for the kexec_load > > > > > > > interface, print nothing in the kernel space. 
This might be helpful for > > > > > > > verifying the accuracy of the data passed to the kernel. Therefore, > > > > > > > refer to this commit a85ee18c7900 ("kexec_file: print out debugging > > > > > > > message if required"), debug print information has been added. > > > > > > > > > > > > > > Signed-off-by: Qiang Ma > > > > > > > Reported-by: kernel test robot > > > > > > > Closes: https://lore.kernel.org/oe-kbuild-all/202510310332.6XrLe70K-lkp at intel.com/ > > > > > > > --- > > > > > > > kernel/kexec.c | 11 +++++++++++ > > > > > > > 1 file changed, 11 insertions(+) > > > > > > > > > > > > > > diff --git a/kernel/kexec.c b/kernel/kexec.c > > > > > > > index c7a869d32f87..9b433b972cc1 100644 > > > > > > > --- a/kernel/kexec.c > > > > > > > +++ b/kernel/kexec.c > > > > > > > @@ -154,7 +154,15 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, > > > > > > > if (ret) > > > > > > > goto out; > > > > > > > + kexec_dprintk("nr_segments = %lu\n", nr_segments); > > > > > > > for (i = 0; i < nr_segments; i++) { > > > > > > > + struct kexec_segment *ksegment; > > > > > > > + > > > > > > > + ksegment = &image->segment[i]; > > > > > > > + kexec_dprintk("segment[%lu]: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n", > > > > > > > + i, ksegment->buf, ksegment->bufsz, ksegment->mem, > > > > > > > + ksegment->memsz); > > > > > > There has already been a print_segments() in kexec-tools/kexec/kexec.c, > > > > > > you will get duplicated printing. That sounds not good. Have you tested > > > > > > this? > > > > > I have tested it, kexec-tools is the debug message printed > > > > > in user space, while kexec_dprintk is printed > > > > > in kernel space. > > > > > > > > > > This might be helpful for verifying the accuracy of > > > > > the data passed to the kernel. > > > > Hmm, that's not necessary with a debug printing to verify value passed > > > > in kernel. We should only add debug pringing when we need but lack it. 
> > > > I didn't check it carefully, if you add the debug printing only for > > > > verifying accuracy, that doesn't justify the code change. > > > It's not entirely because of it. > > > > > > Another reason is that for RISC-V, for kexec_file_load interface, > > > kexec_image_info() was deleted at that time because the content > > > has been printed out in generic code. > > > > > > However, these contents were not printed in kexec_load because > > > kexec_image_info was deleted. So now it has been added. > > print_segments() in kexec-tools/kexec/kexec.c is a generic function, > > shouldn't you make it called in kexec-tools for risc-v? I am confused by > > the purpose of this patchset. > There is a problem with what I expressed. > I don't want to add print_segments to riscv. > I want to add some debugging message(ksegment,kimage,flag) for kexec_load. > > Although ksegment debugging message has been printed in kexec-tools, > it is still helpful for debugging the kernel space function. Sorry, I can't support that. With the kexec_load interface, we already prepare the loading segments for the future jump in kexec-tools, and calling print_segments() to print that loading information there is natural. Why do we need to print it twice just to verify that the printing is accurate? Could you explain why risc-v is special?
> > > > > > > > > + > > > > > > > ret = kimage_load_segment(image, i); > > > > > > > if (ret) > > > > > > > goto out; > > > > > > > @@ -166,6 +174,9 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, > > > > > > > if (ret) > > > > > > > goto out; > > > > > > > + kexec_dprintk("kexec_load: type:%u, start:0x%lx head:0x%lx flags:0x%lx\n", > > > > > > > + image->type, image->start, image->head, flags); > > > > > > > + > > > > > > > /* Install the new kernel and uninstall the old */ > > > > > > > image = xchg(dest_image, image); > > > > > > > -- > > > > > > > 2.20.1 > > > > > > > > > > From piliu at redhat.com Wed Nov 5 05:09:21 2025 From: piliu at redhat.com (Pingfan Liu) Date: Wed, 5 Nov 2025 21:09:21 +0800 Subject: [PATCH 1/2] kernel/kexec: Change the prototype of kimage_map_segment() Message-ID: <20251105130922.13321-1-piliu@redhat.com> The kexec segment index will be required to extract the corresponding information for that segment in kimage_map_segment(). Additionally, kexec_segment already holds the kexec relocation destination address and size. Therefore, the prototype of kimage_map_segment() can be changed. Signed-off-by: Pingfan Liu Cc: Andrew Morton Cc: Baoquan He Cc: Mimi Zohar Cc: Roberto Sassu Cc: Alexander Graf Cc: Steven Chen To: kexec at lists.infradead.org To: linux-integrity at vger.kernel.org --- include/linux/kexec.h | 4 ++-- kernel/kexec_core.c | 9 ++++++--- security/integrity/ima/ima_kexec.c | 4 +--- 3 files changed, 9 insertions(+), 8 deletions(-) diff --git a/include/linux/kexec.h b/include/linux/kexec.h index ff7e231b0485..8a22bc9b8c6c 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -530,7 +530,7 @@ extern bool kexec_file_dbg_print; #define kexec_dprintk(fmt, arg...) 
\ do { if (kexec_file_dbg_print) pr_info(fmt, ##arg); } while (0) -extern void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size); +extern void *kimage_map_segment(struct kimage *image, int idx); extern void kimage_unmap_segment(void *buffer); #else /* !CONFIG_KEXEC_CORE */ struct pt_regs; @@ -540,7 +540,7 @@ static inline void __crash_kexec(struct pt_regs *regs) { } static inline void crash_kexec(struct pt_regs *regs) { } static inline int kexec_should_crash(struct task_struct *p) { return 0; } static inline int kexec_crash_loaded(void) { return 0; } -static inline void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size) +static inline void *kimage_map_segment(struct kimage *image, int idx) { return NULL; } static inline void kimage_unmap_segment(void *buffer) { } #define kexec_in_progress false diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index fa00b239c5d9..9a1966207041 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -960,17 +960,20 @@ int kimage_load_segment(struct kimage *image, int idx) return result; } -void *kimage_map_segment(struct kimage *image, - unsigned long addr, unsigned long size) +void *kimage_map_segment(struct kimage *image, int idx) { + unsigned long addr, size, eaddr; unsigned long src_page_addr, dest_page_addr = 0; - unsigned long eaddr = addr + size; kimage_entry_t *ptr, entry; struct page **src_pages; unsigned int npages; void *vaddr = NULL; int i; + addr = image->segment[idx].mem; + size = image->segment[idx].memsz; + eaddr = addr + size; + /* * Collect the source pages and map them in a contiguous VA range. 
*/ diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c index 7362f68f2d8b..5beb69edd12f 100644 --- a/security/integrity/ima/ima_kexec.c +++ b/security/integrity/ima/ima_kexec.c @@ -250,9 +250,7 @@ void ima_kexec_post_load(struct kimage *image) if (!image->ima_buffer_addr) return; - ima_kexec_buffer = kimage_map_segment(image, - image->ima_buffer_addr, - image->ima_buffer_size); + ima_kexec_buffer = kimage_map_segment(image, image->ima_segment_index); if (!ima_kexec_buffer) { pr_err("Could not map measurements buffer.\n"); return; -- 2.49.0 From piliu at redhat.com Wed Nov 5 05:09:22 2025 From: piliu at redhat.com (Pingfan Liu) Date: Wed, 5 Nov 2025 21:09:22 +0800 Subject: [PATCH 2/2] kernel/kexec: Fix IMA when allocation happens in CMA area In-Reply-To: <20251105130922.13321-1-piliu@redhat.com> References: <20251105130922.13321-1-piliu@redhat.com> Message-ID: <20251105130922.13321-2-piliu@redhat.com> When I tested kexec with the latest kernel, I ran into the following warning: [ 40.712410] ------------[ cut here ]------------ [ 40.712576] WARNING: CPU: 2 PID: 1562 at kernel/kexec_core.c:1001 kimage_map_segment+0x144/0x198 [...] [ 40.816047] Call trace: [ 40.818498] kimage_map_segment+0x144/0x198 (P) [ 40.823221] ima_kexec_post_load+0x58/0xc0 [ 40.827246] __do_sys_kexec_file_load+0x29c/0x368 [...] [ 40.855423] ---[ end trace 0000000000000000 ]--- This is caused by the fact that kexec allocates the destination directly in the CMA area. In that case, the CMA kernel address should be exported directly to the IMA component, instead of using the vmalloc'd address. 
Signed-off-by: Pingfan Liu Cc: Andrew Morton Cc: Baoquan He Cc: Alexander Graf Cc: Steven Chen Cc: linux-integrity at vger.kernel.org To: kexec at lists.infradead.org --- kernel/kexec_core.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 9a1966207041..abe40286a02c 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -967,6 +967,7 @@ void *kimage_map_segment(struct kimage *image, int idx) kimage_entry_t *ptr, entry; struct page **src_pages; unsigned int npages; + struct page *cma; void *vaddr = NULL; int i; @@ -974,6 +975,9 @@ void *kimage_map_segment(struct kimage *image, int idx) size = image->segment[idx].memsz; eaddr = addr + size; + cma = image->segment_cma[idx]; + if (cma) + return cma; /* * Collect the source pages and map them in a contiguous VA range. */ @@ -1014,7 +1018,8 @@ void *kimage_map_segment(struct kimage *image, int idx) void kimage_unmap_segment(void *segment_buffer) { - vunmap(segment_buffer); + if (is_vmalloc_addr(segment_buffer)) + vunmap(segment_buffer); } struct kexec_load_limit { -- 2.49.0 From maqianga at uniontech.com Wed Nov 5 07:05:01 2025 From: maqianga at uniontech.com (Qiang Ma) Date: Wed, 5 Nov 2025 23:05:01 +0800 Subject: [PATCH v2 3/4] kexec: print out debugging message if required for kexec_load In-Reply-To: References: <20251103063440.1681657-1-maqianga@uniontech.com> <20251103063440.1681657-4-maqianga@uniontech.com> <5FC4A8D79744B238+97288be4-6c1a-4c0d-ae7d-be2029ec87f3@uniontech.com> <2331A9F3E09581FC+4ab7e9ba-8776-47d2-868f-cb01ca9cd909@uniontech.com> <44308A6B6D8BEB61+c143d52e-03dd-48bf-aadd-8a0d9196b280@uniontech.com> Message-ID: <1E51DC0D8C72320F+8ad85e3e-1f03-4ca9-ba29-f2ff8a4cb831@uniontech.com> On 2025/11/5 9:01 PM, Baoquan He wrote: > On 11/05/25 at 07:28pm, Qiang Ma wrote: >> On 2025/11/5 16:55, Baoquan He wrote: >>> On 11/05/25 at 04:35pm, Qiang Ma wrote: >>>> On 2025/11/5 15:53, Baoquan He wrote: >>>>> On 11/05/25 at 11:41am, Qiang Ma wrote: >>>>>> On 2025/11/5 11:01, Baoquan He wrote: >>>>>>> On 11/03/25 at 02:34pm, Qiang Ma wrote: >>>>>>>> The commit a85ee18c7900 ("kexec_file: print out debugging message >>>>>>>> if required") has added general code printing in kexec_file_load(), >>>>>>>> but not in kexec_load(). >>>>>>>> >>>>>>>> Especially in the RISC-V architecture, kexec_image_info() has been >>>>>>>> removed(commit eb7622d908a0 ("kexec_file, riscv: print out debugging >>>>>>>> message if required")). As a result, when using '-d' for the kexec_load >>>>>>>> interface, print nothing in the kernel space. This might be helpful for >>>>>>>> verifying the accuracy of the data passed to the kernel. Therefore, >>>>>>>> refer to this commit a85ee18c7900 ("kexec_file: print out debugging >>>>>>>> message if required"), debug print information has been added. >>>>>>>> >>>>>>>> Signed-off-by: Qiang Ma >>>>>>>> Reported-by: kernel test robot >>>>>>>> Closes: https://lore.kernel.org/oe-kbuild-all/202510310332.6XrLe70K-lkp at intel.com/ >>>>>>>> --- >>>>>>>> kernel/kexec.c | 11 +++++++++++ >>>>>>>> 1 file changed, 11 insertions(+) >>>>>>>> >>>>>>>> diff --git a/kernel/kexec.c b/kernel/kexec.c >>>>>>>> index c7a869d32f87..9b433b972cc1 100644 >>>>>>>> --- a/kernel/kexec.c >>>>>>>> +++ b/kernel/kexec.c >>>>>>>> @@ -154,7 +154,15 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, >>>>>>>> if (ret) >>>>>>>> goto out; >>>>>>>> + kexec_dprintk("nr_segments = %lu\n", nr_segments); >>>>>>>> for (i = 0; i < nr_segments; i++) { >>>>>>>> + struct kexec_segment *ksegment; >>>>>>>> + >>>>>>>> + ksegment = &image->segment[i]; >>>>>>>> + kexec_dprintk("segment[%lu]: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n", >>>>>>>> + i, ksegment->buf, ksegment->bufsz, ksegment->mem, >>>>>>>> + ksegment->memsz); >>>>>>> There has already been a print_segments() in kexec-tools/kexec/kexec.c, >>>>>>> you will get duplicated printing.
That sounds not good. Have you tested >>>>>>> this? >>>>>> I have tested it; kexec-tools prints the debug messages >>>>>> in user space, while kexec_dprintk prints >>>>>> in kernel space. >>>>>> >>>>>> This might be helpful for verifying the accuracy of >>>>>> the data passed to the kernel. >>>>> Hmm, that's not necessary with a debug printing to verify values passed >>>>> into the kernel. We should only add debug printing when we need but lack it. >>>>> I didn't check it carefully, if you add the debug printing only for >>>>> verifying accuracy, that doesn't justify the code change. >>>> It's not entirely because of it. >>>> >>>> Another reason is that for RISC-V, for kexec_file_load interface, >>>> kexec_image_info() was deleted at that time because the content >>>> has been printed out in generic code. >>>> >>>> However, these contents were not printed in kexec_load because >>>> kexec_image_info was deleted. So now it has been added. >>> print_segments() in kexec-tools/kexec/kexec.c is a generic function, >>> shouldn't it be called in kexec-tools for risc-v? I am confused by >>> the purpose of this patchset. >> There is a problem with what I expressed. >> I don't want to add print_segments to riscv. >> I want to add some debugging messages (ksegment, kimage, flags) for kexec_load. >> >> Although the ksegment debugging messages have been printed in kexec-tools, >> they are still helpful for debugging the kernel-space function. > Sorry, I can't support that. With the kexec_load interface, the loading > segments are all prepared in kexec-tools for the future jump. And calling > print_segments() to print that loading information is natural. Why do we > need to print them twice to verify that the printing is accurate? Is it necessary to verify the user-space data after it is passed to the kernel space? > Could you explain why risc-v is special?
At first, when I saw that kexec_image_info() had been removed for the RISC-V architecture by commit eb7622d908a0 ("kexec_file, riscv: print out debugging message if required"), I thought only kexec_file_load had been taken into consideration: that commit did not account for the fact that kexec_load also used to call kexec_image_info() to print the segments and other debugging messages, which were lost once it was deleted. So I referred to kexec_file_load and added these debugging messages to the generic code of kexec_load. In this way, all architectures can print these generic debugging messages. >>>>>>>> + >>>>>>>> ret = kimage_load_segment(image, i); >>>>>>>> if (ret) >>>>>>>> goto out; >>>>>>>> @@ -166,6 +174,9 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, >>>>>>>> if (ret) >>>>>>>> goto out; >>>>>>>> + kexec_dprintk("kexec_load: type:%u, start:0x%lx head:0x%lx flags:0x%lx\n", >>>>>>>> + image->type, image->start, image->head, flags); >>>>>>>> + >>>>>>>> /* Install the new kernel and uninstall the old */ >>>>>>>> image = xchg(dest_image, image); >>>>>>>> -- >>>>>>>> 2.20.1 >>>>>>>> > From rppt at kernel.org Wed Nov 5 09:37:12 2025 From: rppt at kernel.org (Mike Rapoport) Date: Wed, 5 Nov 2025 19:37:12 +0200 Subject: [PATCH] MAINTAINERS: add myself as a reviewer for KHO In-Reply-To: <20251105102022.18798-1-pratyush@kernel.org> References: <20251105102022.18798-1-pratyush@kernel.org> Message-ID: On Wed, Nov 05, 2025 at 11:20:19AM +0100, Pratyush Yadav wrote: > I have been reviewing most patches for KHO already, and it is easier to > spot them if I am directly in Cc.
> > Signed-off-by: Pratyush Yadav Acked-by: Mike Rapoport (Microsoft) > --- > MAINTAINERS | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/MAINTAINERS b/MAINTAINERS > index 8ee7cb5fe838f..3c85bb0e381fc 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -13789,6 +13789,7 @@ KEXEC HANDOVER (KHO) > M: Alexander Graf > M: Mike Rapoport > M: Pasha Tatashin > +R: Pratyush Yadav > L: kexec at lists.infradead.org > L: linux-mm at kvack.org > S: Maintained > > base-commit: d25eefc46daf21bd1ebbc699f0ffd7fe11d92296 > -- > 2.47.3 > -- Sincerely yours, Mike. From rppt at kernel.org Wed Nov 5 09:39:09 2025 From: rppt at kernel.org (Mike Rapoport) Date: Wed, 5 Nov 2025 19:39:09 +0200 Subject: [PATCH] MAINTAINERS: extend file entry in KHO to include subdirectories In-Reply-To: <20251104143238.119803-1-lukas.bulwahn@redhat.com> References: <20251104143238.119803-1-lukas.bulwahn@redhat.com> Message-ID: On Tue, Nov 04, 2025 at 03:32:38PM +0100, Lukas Bulwahn wrote: > From: Lukas Bulwahn > > Commit 3498209ff64e ("Documentation: add documentation for KHO") adds the > file entry for 'Documentation/core-api/kho/*'. The asterisk at the end > means that all files in kho are included, but not files in its > subdirectories below. > Hence, the files under Documentation/core-api/kho/bindings/ are not > considered part of KHO, and get_maintainers.pl does not necessarily add the > KHO maintainers to the recipients of patches to those files. Probably, this > is not intended, though, and it was simply an oversight of the detailed > semantics of such file entries. > > Make the file entry include the subdirectories of > Documentation/core-api/kho/.
> > Signed-off-by: Lukas Bulwahn Acked-by: Mike Rapoport (Microsoft) > --- > MAINTAINERS | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/MAINTAINERS b/MAINTAINERS > index 06ff926c5331..499b52d7793f 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -13836,7 +13836,7 @@ L: kexec at lists.infradead.org > L: linux-mm at kvack.org > S: Maintained > F: Documentation/admin-guide/mm/kho.rst > -F: Documentation/core-api/kho/* > +F: Documentation/core-api/kho/ > F: include/linux/kexec_handover.h > F: kernel/kexec_handover.c > F: tools/testing/selftests/kho/ > -- > 2.51.1 > -- Sincerely yours, Mike. From pasha.tatashin at soleen.com Wed Nov 5 12:07:25 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Wed, 5 Nov 2025 15:07:25 -0500 Subject: [PATCH] MAINTAINERS: add myself as a reviewer for KHO In-Reply-To: <20251105102022.18798-1-pratyush@kernel.org> References: <20251105102022.18798-1-pratyush@kernel.org> Message-ID: Reviewed-by: Pasha Tatashin On Wed, Nov 5, 2025 at 5:20?AM Pratyush Yadav wrote: > > I have been reviewing most patches for KHO already, and it is easier to > spot them if I am directly in Cc. 
> > Signed-off-by: Pratyush Yadav > --- > MAINTAINERS | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/MAINTAINERS b/MAINTAINERS > index 8ee7cb5fe838f..3c85bb0e381fc 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -13789,6 +13789,7 @@ KEXEC HANDOVER (KHO) > M: Alexander Graf > M: Mike Rapoport > M: Pasha Tatashin > +R: Pratyush Yadav > L: kexec at lists.infradead.org > L: linux-mm at kvack.org > S: Maintained > > base-commit: d25eefc46daf21bd1ebbc699f0ffd7fe11d92296 > -- > 2.47.3 > From akpm at linux-foundation.org Wed Nov 5 16:14:32 2025 From: akpm at linux-foundation.org (Andrew Morton) Date: Wed, 5 Nov 2025 16:14:32 -0800 Subject: [PATCH 2/2] kernel/kexec: Fix IMA when allocation happens in CMA area In-Reply-To: <20251105130922.13321-2-piliu@redhat.com> References: <20251105130922.13321-1-piliu@redhat.com> <20251105130922.13321-2-piliu@redhat.com> Message-ID: <20251105161432.98eb69f87f30627a9067e78e@linux-foundation.org> On Wed, 5 Nov 2025 21:09:22 +0800 Pingfan Liu wrote: > When I tested kexec with the latest kernel, I ran into the following warning: > > [ 40.712410] ------------[ cut here ]------------ > [ 40.712576] WARNING: CPU: 2 PID: 1562 at kernel/kexec_core.c:1001 kimage_map_segment+0x144/0x198 > [...] > [ 40.816047] Call trace: > [ 40.818498] kimage_map_segment+0x144/0x198 (P) > [ 40.823221] ima_kexec_post_load+0x58/0xc0 > [ 40.827246] __do_sys_kexec_file_load+0x29c/0x368 > [...] > [ 40.855423] ---[ end trace 0000000000000000 ]--- > > This is caused by the fact that kexec allocates the destination directly > in the CMA area. In that case, the CMA kernel address should be exported > directly to the IMA component, instead of using the vmalloc'd address. This is something we should backport into earlier kernels.
> Signed-off-by: Pingfan Liu > Cc: Andrew Morton > Cc: Baoquan He > Cc: Alexander Graf > Cc: Steven Chen > Cc: linux-integrity at vger.kernel.org > To: kexec at lists.infradead.org So I'm thinking we should add Fixes: 0091d9241ea2 ("kexec: define functions to map and unmap segments") Cc: yes? From piliu at redhat.com Wed Nov 5 17:15:28 2025 From: piliu at redhat.com (Pingfan Liu) Date: Thu, 6 Nov 2025 09:15:28 +0800 Subject: [PATCH 2/2] kernel/kexec: Fix IMA when allocation happens in CMA area In-Reply-To: <20251105161432.98eb69f87f30627a9067e78e@linux-foundation.org> References: <20251105130922.13321-1-piliu@redhat.com> <20251105130922.13321-2-piliu@redhat.com> <20251105161432.98eb69f87f30627a9067e78e@linux-foundation.org> Message-ID: On Thu, Nov 6, 2025 at 8:14?AM Andrew Morton wrote: > > On Wed, 5 Nov 2025 21:09:22 +0800 Pingfan Liu wrote: > > > When I tested kexec with the latest kernel, I ran into the following warning: > > > > [ 40.712410] ------------[ cut here ]------------ > > [ 40.712576] WARNING: CPU: 2 PID: 1562 at kernel/kexec_core.c:1001 kimage_map_segment+0x144/0x198 > > [...] > > [ 40.816047] Call trace: > > [ 40.818498] kimage_map_segment+0x144/0x198 (P) > > [ 40.823221] ima_kexec_post_load+0x58/0xc0 > > [ 40.827246] __do_sys_kexec_file_load+0x29c/0x368 > > [...] > > [ 40.855423] ---[ end trace 0000000000000000 ]--- > > > > This is caused by the fact that kexec allocates the destination directly > > in the CMA area. In that case, the CMA kernel address should be exported > > directly to the IMA component, instead of using the vmalloc'd address. > > This is something we should backport into tearlier kernels. > > > Signed-off-by: Pingfan Liu > > Cc: Andrew Morton > > Cc: Baoquan He > > Cc: Alexander Graf > > Cc: Steven Chen > > Cc: linux-integrity at vger.kernel.org > > To: kexec at lists.infradead.org > > So I'm thinking we should add > > Fixes: 0091d9241ea2 ("kexec: define functions to map and unmap segments") > Cc: > > yes? > Yes, it should be. 
Thanks for your help! Best Regards, Pingfan From bhe at redhat.com Wed Nov 5 18:03:56 2025 From: bhe at redhat.com (Baoquan He) Date: Thu, 6 Nov 2025 10:03:56 +0800 Subject: [PATCH 2/2] kernel/kexec: Fix IMA when allocation happens in CMA area In-Reply-To: <20251105130922.13321-2-piliu@redhat.com> References: <20251105130922.13321-1-piliu@redhat.com> <20251105130922.13321-2-piliu@redhat.com> Message-ID: Hi Pingfan, On 11/05/25 at 09:09pm, Pingfan Liu wrote: > When I tested kexec with the latest kernel, I ran into the following warning: > > [ 40.712410] ------------[ cut here ]------------ > [ 40.712576] WARNING: CPU: 2 PID: 1562 at kernel/kexec_core.c:1001 kimage_map_segment+0x144/0x198 > [...] > [ 40.816047] Call trace: > [ 40.818498] kimage_map_segment+0x144/0x198 (P) > [ 40.823221] ima_kexec_post_load+0x58/0xc0 > [ 40.827246] __do_sys_kexec_file_load+0x29c/0x368 > [...] > [ 40.855423] ---[ end trace 0000000000000000 ]--- > > This is caused by the fact that kexec allocates the destination directly > in the CMA area. In that case, the CMA kernel address should be exported > directly to the IMA component, instead of using the vmalloc'd address. > > Signed-off-by: Pingfan Liu > Cc: Andrew Morton > Cc: Baoquan He > Cc: Alexander Graf > Cc: Steven Chen > Cc: linux-integrity at vger.kernel.org > To: kexec at lists.infradead.org > --- > kernel/kexec_core.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > index 9a1966207041..abe40286a02c 100644 > --- a/kernel/kexec_core.c > +++ b/kernel/kexec_core.c > @@ -967,6 +967,7 @@ void *kimage_map_segment(struct kimage *image, int idx) > kimage_entry_t *ptr, entry; > struct page **src_pages; > unsigned int npages; > + struct page *cma; > void *vaddr = NULL; > int i; > > @@ -974,6 +975,9 @@ void *kimage_map_segment(struct kimage *image, int idx) > size = image->segment[idx].memsz; > eaddr = addr + size; > > + cma = image->segment_cma[idx]; Thanks for your fix. 
But I totally can't get what you are doing. The idx passed into kimage_map_segment() could index image->segment[], and can index image->segment_cma[], could you reconsider and make the code more reasonable? > + if (cma) > + return cma; > /* > * Collect the source pages and map them in a contiguous VA range. > */ > @@ -1014,7 +1018,8 @@ void *kimage_map_segment(struct kimage *image, int idx) > > void kimage_unmap_segment(void *segment_buffer) > { > - vunmap(segment_buffer); > + if (is_vmalloc_addr(segment_buffer)) > + vunmap(segment_buffer); > } > > struct kexec_load_limit { > -- > 2.49.0 > From piliu at redhat.com Wed Nov 5 18:33:17 2025 From: piliu at redhat.com (Pingfan Liu) Date: Thu, 6 Nov 2025 10:33:17 +0800 Subject: [PATCH 2/2] kernel/kexec: Fix IMA when allocation happens in CMA area In-Reply-To: References: <20251105130922.13321-1-piliu@redhat.com> <20251105130922.13321-2-piliu@redhat.com> Message-ID: Hi Baoquan, Thanks for your review. Please see the comment below. On Thu, Nov 6, 2025 at 10:04?AM Baoquan He wrote: > > Hi Pingfan, > > On 11/05/25 at 09:09pm, Pingfan Liu wrote: > > When I tested kexec with the latest kernel, I ran into the following warning: > > > > [ 40.712410] ------------[ cut here ]------------ > > [ 40.712576] WARNING: CPU: 2 PID: 1562 at kernel/kexec_core.c:1001 kimage_map_segment+0x144/0x198 > > [...] > > [ 40.816047] Call trace: > > [ 40.818498] kimage_map_segment+0x144/0x198 (P) > > [ 40.823221] ima_kexec_post_load+0x58/0xc0 > > [ 40.827246] __do_sys_kexec_file_load+0x29c/0x368 > > [...] > > [ 40.855423] ---[ end trace 0000000000000000 ]--- > > > > This is caused by the fact that kexec allocates the destination directly > > in the CMA area. In that case, the CMA kernel address should be exported > > directly to the IMA component, instead of using the vmalloc'd address. 
> > > > Signed-off-by: Pingfan Liu > > Cc: Andrew Morton > > Cc: Baoquan He > > Cc: Alexander Graf > > Cc: Steven Chen > > Cc: linux-integrity at vger.kernel.org > > To: kexec at lists.infradead.org > > --- > > kernel/kexec_core.c | 7 ++++++- > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > > index 9a1966207041..abe40286a02c 100644 > > --- a/kernel/kexec_core.c > > +++ b/kernel/kexec_core.c > > @@ -967,6 +967,7 @@ void *kimage_map_segment(struct kimage *image, int idx) > > kimage_entry_t *ptr, entry; > > struct page **src_pages; > > unsigned int npages; > > + struct page *cma; > > void *vaddr = NULL; > > int i; > > > > @@ -974,6 +975,9 @@ void *kimage_map_segment(struct kimage *image, int idx) > > size = image->segment[idx].memsz; > > eaddr = addr + size; > > > > + cma = image->segment_cma[idx]; > > Thanks for your fix. But I totally can't get what you are doing. The idx > passed into kimage_map_segment() could index image->segment[], and can > index image->segment_cma[], could you reconsider and make the code more > reasonable? > Since idx can index both image->segment[] and segment_cma[], the behavior differs based on whether segment_cma[idx] is NULL: - If segment_cma[idx] is not NULL, it points directly to the final target location, eliminating the need for data copying that traditional kexec relocation requires. - If segment_cma[idx] is NULL, the segment relies on the traditional kexec relocation code to copy its data. Thanks, Pingfan > > + if (cma) > > + return cma; > > /* > > * Collect the source pages and map them in a contiguous VA range. 
> > */ > > @@ -1014,7 +1018,8 @@ void *kimage_map_segment(struct kimage *image, int idx) > > > > void kimage_unmap_segment(void *segment_buffer) > > { > > - vunmap(segment_buffer); > > + if (is_vmalloc_addr(segment_buffer)) > > + vunmap(segment_buffer); > > } > > > > struct kexec_load_limit { > > -- > > 2.49.0 > > > From piliu at redhat.com Wed Nov 5 18:57:33 2025 From: piliu at redhat.com (Pingfan Liu) Date: Thu, 6 Nov 2025 10:57:33 +0800 Subject: [PATCH 2/2] kernel/kexec: Fix IMA when allocation happens in CMA area In-Reply-To: <20251105161432.98eb69f87f30627a9067e78e@linux-foundation.org> References: <20251105130922.13321-1-piliu@redhat.com> <20251105130922.13321-2-piliu@redhat.com> <20251105161432.98eb69f87f30627a9067e78e@linux-foundation.org> Message-ID: Hi Andrew, Thanks for your help, but on second thought, I think the Fixes commit is wrong. On Thu, Nov 6, 2025 at 8:14?AM Andrew Morton wrote: > > On Wed, 5 Nov 2025 21:09:22 +0800 Pingfan Liu wrote: > > > When I tested kexec with the latest kernel, I ran into the following warning: > > > > [ 40.712410] ------------[ cut here ]------------ > > [ 40.712576] WARNING: CPU: 2 PID: 1562 at kernel/kexec_core.c:1001 kimage_map_segment+0x144/0x198 > > [...] > > [ 40.816047] Call trace: > > [ 40.818498] kimage_map_segment+0x144/0x198 (P) > > [ 40.823221] ima_kexec_post_load+0x58/0xc0 > > [ 40.827246] __do_sys_kexec_file_load+0x29c/0x368 > > [...] > > [ 40.855423] ---[ end trace 0000000000000000 ]--- > > > > This is caused by the fact that kexec allocates the destination directly > > in the CMA area. In that case, the CMA kernel address should be exported > > directly to the IMA component, instead of using the vmalloc'd address. > > This is something we should backport into tearlier kernels. 
> > > Signed-off-by: Pingfan Liu > > Cc: Andrew Morton > > Cc: Baoquan He > > Cc: Alexander Graf > > Cc: Steven Chen > > Cc: linux-integrity at vger.kernel.org > > To: kexec at lists.infradead.org > > So I'm thinking we should add > > Fixes: 0091d9241ea2 ("kexec: define functions to map and unmap segments") Should be: Fixes: 07d24902977e ("kexec: enable CMA based contiguous allocation") Because 07d24902977e came after 0091d9241ea2 and introduced this issue. Thanks, Pingfan > Cc: > > yes? > From bhe at redhat.com Wed Nov 5 19:21:54 2025 From: bhe at redhat.com (Baoquan He) Date: Thu, 6 Nov 2025 11:21:54 +0800 Subject: [PATCH 2/2] kernel/kexec: Fix IMA when allocation happens in CMA area In-Reply-To: References: <20251105130922.13321-1-piliu@redhat.com> <20251105130922.13321-2-piliu@redhat.com> Message-ID: On 11/06/25 at 10:33am, Pingfan Liu wrote: > Hi Baoquan, > > Thanks for your review. Please see the comment below. > > On Thu, Nov 6, 2025 at 10:04?AM Baoquan He wrote: > > > > Hi Pingfan, > > > > On 11/05/25 at 09:09pm, Pingfan Liu wrote: > > > When I tested kexec with the latest kernel, I ran into the following warning: > > > > > > [ 40.712410] ------------[ cut here ]------------ > > > [ 40.712576] WARNING: CPU: 2 PID: 1562 at kernel/kexec_core.c:1001 kimage_map_segment+0x144/0x198 > > > [...] > > > [ 40.816047] Call trace: > > > [ 40.818498] kimage_map_segment+0x144/0x198 (P) > > > [ 40.823221] ima_kexec_post_load+0x58/0xc0 > > > [ 40.827246] __do_sys_kexec_file_load+0x29c/0x368 > > > [...] > > > [ 40.855423] ---[ end trace 0000000000000000 ]--- > > > > > > This is caused by the fact that kexec allocates the destination directly > > > in the CMA area. In that case, the CMA kernel address should be exported > > > directly to the IMA component, instead of using the vmalloc'd address. 
> > > > > > Signed-off-by: Pingfan Liu > > > Cc: Andrew Morton > > > Cc: Baoquan He > > > Cc: Alexander Graf > > > Cc: Steven Chen > > > Cc: linux-integrity at vger.kernel.org > > > To: kexec at lists.infradead.org > > > --- > > > kernel/kexec_core.c | 7 ++++++- > > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > > > index 9a1966207041..abe40286a02c 100644 > > > --- a/kernel/kexec_core.c > > > +++ b/kernel/kexec_core.c > > > @@ -967,6 +967,7 @@ void *kimage_map_segment(struct kimage *image, int idx) > > > kimage_entry_t *ptr, entry; > > > struct page **src_pages; > > > unsigned int npages; > > > + struct page *cma; > > > void *vaddr = NULL; > > > int i; > > > > > > @@ -974,6 +975,9 @@ void *kimage_map_segment(struct kimage *image, int idx) > > > size = image->segment[idx].memsz; > > > eaddr = addr + size; > > > > > > + cma = image->segment_cma[idx]; > > > > Thanks for your fix. But I totally can't get what you are doing. The idx > > passed into kimage_map_segment() could index image->segment[], and can > > index image->segment_cma[], could you reconsider and make the code more > > reasonable? > > > > Since idx can index both image->segment[] and segment_cma[], the > behavior differs based on whether segment_cma[idx] is NULL: > > - If segment_cma[idx] is not NULL, it points directly to the final > target location, eliminating the need for data copying that > traditional kexec relocation requires. > - If segment_cma[idx] is NULL, the segment relies on the traditional > kexec relocation code to copy its data. I see, thanks. While image->segment_cma[idx] records the struct page of the relevant cma area, but not virtual address. Is it OK for IMA later to update? ima_kexec_buffer is supposed to be a virtual address, wondering how IMA behaved in this case. 
From sourabhjain at linux.ibm.com Wed Nov 5 20:51:02 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Thu, 6 Nov 2025 10:21:02 +0530 Subject: [PATCH v2 0/5] kexec: reorganize sysfs interface and add new kexec sysfs Message-ID: <20251106045107.17813-1-sourabhjain@linux.ibm.com> All existing kexec and kdump sysfs entries are moved to a new location, /sys/kernel/kexec, to keep /sys/kernel/ clean and better organized. Symlinks are created at the old locations for backward compatibility and can be removed in the future [02/05]. While doing this cleanup, missing ABI documentation for the old sysfs interfaces is added, and those entries are marked as deprecated [01/05 and 03/05]. New ABI documentation is also added for the reorganized interfaces. [04/05] Along with this reorganization, a new sysfs file, /sys/kernel/kexec/crash_cma_ranges, is introduced to export crashkernel CMA reservation details to user space [05/05]. This helps tools determine the total crashkernel reserved memory and warn users that capturing user pages while CMA is reserved may cause incomplete or unreliable dumps. 
Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Sourabh Jain (5): Documentation/ABI: add kexec and kdump sysfs interface kexec: move sysfs entries to /sys/kernel/kexec Documentation/ABI: mark old kexec sysfs deprecated kexec: document new kexec and kdump sysfs ABIs crash: export crashkernel CMA reservation to userspace .../ABI/obsolete/sysfs-kernel-kexec-kdump | 59 +++++++++ .../ABI/testing/sysfs-kernel-kexec-kdump | 61 +++++++++ kernel/kexec_core.c | 118 ++++++++++++++++++ kernel/ksysfs.c | 68 +--------- 4 files changed, 239 insertions(+), 67 deletions(-) create mode 100644 Documentation/ABI/obsolete/sysfs-kernel-kexec-kdump create mode 100644 Documentation/ABI/testing/sysfs-kernel-kexec-kdump -- 2.51.0 From sourabhjain at linux.ibm.com Wed Nov 5 20:51:03 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Thu, 6 Nov 2025 10:21:03 +0530 Subject: [PATCH v2 1/5] Documentation/ABI: add kexec and kdump sysfs interface In-Reply-To: <20251106045107.17813-1-sourabhjain@linux.ibm.com> References: <20251106045107.17813-1-sourabhjain@linux.ibm.com> Message-ID: <20251106045107.17813-2-sourabhjain@linux.ibm.com> Add an ABI document for following kexec and kdump sysfs interface: - /sys/kernel/kexec_loaded - /sys/kernel/kexec_crash_loaded - /sys/kernel/kexec_crash_size - /sys/kernel/crash_elfcorehdr_size Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- .../ABI/testing/sysfs-kernel-kexec-kdump | 43 +++++++++++++++++++ 1 file changed, 43 insertions(+) create 
mode 100644 Documentation/ABI/testing/sysfs-kernel-kexec-kdump diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump new file mode 100644 index 000000000000..96b24565b68e --- /dev/null +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump @@ -0,0 +1,43 @@ +What: /sys/kernel/kexec_loaded +Date: Jun 2006 +Contact: kexec at lists.infradead.org +Description: read only + Indicates whether a new kernel image has been loaded + into memory using the kexec system call. It shows 1 if + a kexec image is present and ready to boot, or 0 if none + is loaded. +User: kexec tools, kdump service + +What: /sys/kernel/kexec_crash_loaded +Date: Jun 2006 +Contact: kexec at lists.infradead.org +Description: read only + Indicates whether a crash (kdump) kernel is currently + loaded into memory. It shows 1 if a crash kernel has been + successfully loaded for panic handling, or 0 if no crash + kernel is present. +User: Kexec tools, Kdump service + +What: /sys/kernel/kexec_crash_size +Date: Dec 2009 +Contact: kexec at lists.infradead.org +Description: read/write + Shows the amount of memory reserved for loading the crash + (kdump) kernel. It reports the size, in bytes, of the + crash kernel area defined by the crashkernel= parameter. + This interface also allows reducing the crashkernel + reservation by writing a smaller value, and the reclaimed + space is added back to the system RAM. +User: Kdump service + +What: /sys/kernel/crash_elfcorehdr_size +Date: Aug 2023 +Contact: kexec at lists.infradead.org +Description: read only + Indicates the preferred size of the memory buffer for the + ELF core header used by the crash (kdump) kernel. It defines + how much space is needed to hold metadata about the crashed + system, including CPU and memory information. This information + is used by the user space utility kexec to support updating the + in-kernel kdump image during hotplug operations. 
+User: Kexec tools -- 2.51.0 From sourabhjain at linux.ibm.com Wed Nov 5 20:51:04 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Thu, 6 Nov 2025 10:21:04 +0530 Subject: [PATCH v2 2/5] kexec: move sysfs entries to /sys/kernel/kexec In-Reply-To: <20251106045107.17813-1-sourabhjain@linux.ibm.com> References: <20251106045107.17813-1-sourabhjain@linux.ibm.com> Message-ID: <20251106045107.17813-3-sourabhjain@linux.ibm.com> Several kexec and kdump sysfs entries are currently placed directly under /sys/kernel/, which clutters the directory and makes the kexec-related entries harder to pick out among unrelated ones. To improve organization and readability, these entries are now moved under a dedicated directory, /sys/kernel/kexec. For backward compatibility, symlinks are created at the old locations in /sys/kernel/, pointing to the new locations under /sys/kernel/kexec/, so that existing tools and scripts continue to work. These symlinks can be removed in the future once users have switched to the new path. If an error occurs while adding a symlink, it is logged but does not stop creation of the remaining kexec sysfs symlinks. The crash_elfcorehdr_size entry is now controlled by CONFIG_CRASH_DUMP instead of CONFIG_VMCORE_INFO, as CONFIG_CRASH_DUMP also enables CONFIG_VMCORE_INFO. 
Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- kernel/kexec_core.c | 118 ++++++++++++++++++++++++++++++++++++++++++++ kernel/ksysfs.c | 68 +------------------------ 2 files changed, 119 insertions(+), 67 deletions(-) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index fa00b239c5d9..2e12a164e870 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -41,6 +41,7 @@ #include #include #include +#include #include #include @@ -1229,3 +1230,120 @@ int kernel_kexec(void) kexec_unlock(); return error; } + +static ssize_t loaded_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%d\n", !!kexec_image); +} +static struct kobj_attribute loaded_attr = __ATTR_RO(loaded); + +#ifdef CONFIG_CRASH_DUMP +static ssize_t crash_loaded_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%d\n", kexec_crash_loaded()); +} +static struct kobj_attribute crash_loaded_attr = __ATTR_RO(crash_loaded); + +static ssize_t crash_size_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + ssize_t size = crash_get_memory_size(); + + if (size < 0) + return size; + + return sysfs_emit(buf, "%zd\n", size); +} +static ssize_t crash_size_store(struct kobject *kobj, + struct kobj_attribute *attr, + const char *buf, size_t count) +{ + unsigned long cnt; + int ret; + + if (kstrtoul(buf, 0, &cnt)) + return -EINVAL; + + ret = crash_shrink_memory(cnt); + return ret < 0 ? 
ret : count; +} +static struct kobj_attribute crash_size_attr = __ATTR_RW(crash_size); + +#ifdef CONFIG_CRASH_HOTPLUG +static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + unsigned int sz = crash_get_elfcorehdr_size(); + + return sysfs_emit(buf, "%u\n", sz); +} +static struct kobj_attribute crash_elfcorehdr_size_attr = __ATTR_RO(crash_elfcorehdr_size); + +#endif /* CONFIG_CRASH_HOTPLUG */ +#endif /* CONFIG_CRASH_DUMP */ + +static struct attribute *kexec_attrs[] = { + &loaded_attr.attr, +#ifdef CONFIG_CRASH_DUMP + &crash_loaded_attr.attr, + &crash_size_attr.attr, +#ifdef CONFIG_CRASH_HOTPLUG + &crash_elfcorehdr_size_attr.attr, +#endif +#endif + NULL +}; + +struct kexec_link_entry { + const char *target; + const char *name; +}; + +static struct kexec_link_entry kexec_links[] = { + { "loaded", "kexec_loaded" }, +#ifdef CONFIG_CRASH_DUMP + { "crash_loaded", "kexec_crash_loaded" }, + { "crash_size", "kexec_crash_size" }, +#ifdef CONFIG_CRASH_HOTPLUG + { "crash_elfcorehdr_size", "crash_elfcorehdr_size" }, +#endif +#endif + +}; + +struct kobject *kexec_kobj; +ATTRIBUTE_GROUPS(kexec); + +static int __init init_kexec_sysctl(void) +{ + int error; + int i; + + kexec_kobj = kobject_create_and_add("kexec", kernel_kobj); + if (!kexec_kobj) { + pr_err("failed to create kexec kobject\n"); + return -ENOMEM; + } + + error = sysfs_create_groups(kexec_kobj, kexec_groups); + if (error) + goto kset_exit; + + for (i = 0; i < ARRAY_SIZE(kexec_links); i++) { + error = compat_only_sysfs_link_entry_to_kobj(kernel_kobj, kexec_kobj, + kexec_links[i].target, + kexec_links[i].name); + if (error) + pr_err("Unable to create %s symlink (%d)", kexec_links[i].name, error); + } + + return 0; + +kset_exit: + kobject_put(kexec_kobj); + return error; +} + +subsys_initcall(init_kexec_sysctl); diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c index eefb67d9883c..a9e6354d9e25 100644 --- a/kernel/ksysfs.c +++ b/kernel/ksysfs.c @@ -12,7 +12,7 @@ #include 
#include #include -#include +#include #include #include #include @@ -119,50 +119,6 @@ static ssize_t profiling_store(struct kobject *kobj, KERNEL_ATTR_RW(profiling); #endif -#ifdef CONFIG_KEXEC_CORE -static ssize_t kexec_loaded_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) -{ - return sysfs_emit(buf, "%d\n", !!kexec_image); -} -KERNEL_ATTR_RO(kexec_loaded); - -#ifdef CONFIG_CRASH_DUMP -static ssize_t kexec_crash_loaded_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) -{ - return sysfs_emit(buf, "%d\n", kexec_crash_loaded()); -} -KERNEL_ATTR_RO(kexec_crash_loaded); - -static ssize_t kexec_crash_size_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) -{ - ssize_t size = crash_get_memory_size(); - - if (size < 0) - return size; - - return sysfs_emit(buf, "%zd\n", size); -} -static ssize_t kexec_crash_size_store(struct kobject *kobj, - struct kobj_attribute *attr, - const char *buf, size_t count) -{ - unsigned long cnt; - int ret; - - if (kstrtoul(buf, 0, &cnt)) - return -EINVAL; - - ret = crash_shrink_memory(cnt); - return ret < 0 ? 
ret : count; -} -KERNEL_ATTR_RW(kexec_crash_size); - -#endif /* CONFIG_CRASH_DUMP*/ -#endif /* CONFIG_KEXEC_CORE */ - #ifdef CONFIG_VMCORE_INFO static ssize_t vmcoreinfo_show(struct kobject *kobj, @@ -174,18 +130,6 @@ static ssize_t vmcoreinfo_show(struct kobject *kobj, } KERNEL_ATTR_RO(vmcoreinfo); -#ifdef CONFIG_CRASH_HOTPLUG -static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) -{ - unsigned int sz = crash_get_elfcorehdr_size(); - - return sysfs_emit(buf, "%u\n", sz); -} -KERNEL_ATTR_RO(crash_elfcorehdr_size); - -#endif - #endif /* CONFIG_VMCORE_INFO */ /* whether file capabilities are enabled */ @@ -255,18 +199,8 @@ static struct attribute * kernel_attrs[] = { #ifdef CONFIG_PROFILING &profiling_attr.attr, #endif -#ifdef CONFIG_KEXEC_CORE - &kexec_loaded_attr.attr, -#ifdef CONFIG_CRASH_DUMP - &kexec_crash_loaded_attr.attr, - &kexec_crash_size_attr.attr, -#endif -#endif #ifdef CONFIG_VMCORE_INFO &vmcoreinfo_attr.attr, -#ifdef CONFIG_CRASH_HOTPLUG - &crash_elfcorehdr_size_attr.attr, -#endif #endif #ifndef CONFIG_TINY_RCU &rcu_expedited_attr.attr, -- 2.51.0 From sourabhjain at linux.ibm.com Wed Nov 5 20:51:05 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Thu, 6 Nov 2025 10:21:05 +0530 Subject: [PATCH v2 3/5] Documentation/ABI: mark old kexec sysfs deprecated In-Reply-To: <20251106045107.17813-1-sourabhjain@linux.ibm.com> References: <20251106045107.17813-1-sourabhjain@linux.ibm.com> Message-ID: <20251106045107.17813-4-sourabhjain@linux.ibm.com> The previous commit ("kexec: move sysfs entries to /sys/kernel/kexec") moved all existing kexec sysfs entries to a new location. The ABI document is updated to include a note about the deprecation of the old kexec sysfs entries. 
The following kexec sysfs entries are deprecated: - /sys/kernel/kexec_loaded - /sys/kernel/kexec_crash_loaded - /sys/kernel/kexec_crash_size - /sys/kernel/crash_elfcorehdr_size Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- .../sysfs-kernel-kexec-kdump | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) rename Documentation/ABI/{testing => obsolete}/sysfs-kernel-kexec-kdump (61%) diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/obsolete/sysfs-kernel-kexec-kdump similarity index 61% rename from Documentation/ABI/testing/sysfs-kernel-kexec-kdump rename to Documentation/ABI/obsolete/sysfs-kernel-kexec-kdump index 96b24565b68e..96b4d41721cc 100644 --- a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump +++ b/Documentation/ABI/obsolete/sysfs-kernel-kexec-kdump @@ -1,3 +1,19 @@ +NOTE: all the ABIs listed in this file are deprecated and will be removed after 2028. 
+ +Here are the alternative ABIs: ++------------------------------------+-----------------------------------------+ +| Deprecated | Alternative | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/kexec_loaded | /sys/kernel/kexec/loaded | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/kexec_crash_loaded | /sys/kernel/kexec/crash_loaded | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/kexec_crash_size | /sys/kernel/kexec/crash_size | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/crash_elfcorehdr_size | /sys/kernel/kexec/crash_elfcorehdr_size | ++------------------------------------+-----------------------------------------+ + + What: /sys/kernel/kexec_loaded Date: Jun 2006 Contact: kexec at lists.infradead.org -- 2.51.0 From sourabhjain at linux.ibm.com Wed Nov 5 20:51:06 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Thu, 6 Nov 2025 10:21:06 +0530 Subject: [PATCH v2 4/5] kexec: document new kexec and kdump sysfs ABIs In-Reply-To: <20251106045107.17813-1-sourabhjain@linux.ibm.com> References: <20251106045107.17813-1-sourabhjain@linux.ibm.com> Message-ID: <20251106045107.17813-5-sourabhjain@linux.ibm.com> Add an ABI document for following kexec and kdump sysfs interface: - /sys/kernel/kexec/loaded - /sys/kernel/kexec/crash_loaded - /sys/kernel/kexec/crash_size - /sys/kernel/kexec/crash_elfcorehdr_size Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- .../ABI/testing/sysfs-kernel-kexec-kdump | 51 +++++++++++++++++++ 1 file changed, 51 insertions(+) create mode 100644 
Documentation/ABI/testing/sysfs-kernel-kexec-kdump diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump new file mode 100644 index 000000000000..00c00f380fea --- /dev/null +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump @@ -0,0 +1,51 @@ +What: /sys/kernel/kexec/* +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: + The /sys/kernel/kexec/* directory contains sysfs files + that provide information about the configuration status + of kexec and kdump. + +What: /sys/kernel/kexec/loaded +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read only + Indicates whether a new kernel image has been loaded + into memory using the kexec system call. It shows 1 if + a kexec image is present and ready to boot, or 0 if none + is loaded. +User: kexec tools, kdump service + +What: /sys/kernel/kexec/crash_loaded +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read only + Indicates whether a crash (kdump) kernel is currently + loaded into memory. It shows 1 if a crash kernel has been + successfully loaded for panic handling, or 0 if no crash + kernel is present. +User: Kexec tools, Kdump service + +What: /sys/kernel/kexec/crash_size +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read/write + Shows the amount of memory reserved for loading the crash + (kdump) kernel. It reports the size, in bytes, of the + crash kernel area defined by the crashkernel= parameter. + This interface also allows reducing the crashkernel + reservation by writing a smaller value, and the reclaimed + space is added back to the system RAM. +User: Kdump service + +What: /sys/kernel/kexec/crash_elfcorehdr_size +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read only + Indicates the preferred size of the memory buffer for the + ELF core header used by the crash (kdump) kernel. 
It defines + how much space is needed to hold metadata about the crashed + system, including CPU and memory information. This information + is used by the user space utility kexec to support updating the + in-kernel kdump image during hotplug operations. +User: Kexec tools -- 2.51.0 From sourabhjain at linux.ibm.com Wed Nov 5 20:51:07 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Thu, 6 Nov 2025 10:21:07 +0530 Subject: [PATCH v2 5/5] crash: export crashkernel CMA reservation to userspace In-Reply-To: <20251106045107.17813-1-sourabhjain@linux.ibm.com> References: <20251106045107.17813-1-sourabhjain@linux.ibm.com> Message-ID: <20251106045107.17813-6-sourabhjain@linux.ibm.com> Add a sysfs entry /sys/kernel/kexec/crash_cma_ranges to expose all CMA crashkernel ranges. This allows userspace tools configuring kdump to determine how much memory is reserved for the crashkernel. If CMA is used, tools can warn users when they attempt to capture user pages while a CMA reservation is in place. The new sysfs file holds the CMA ranges in the following format: cat /sys/kernel/kexec/crash_cma_ranges 100000000-10c7fffff Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- Documentation/ABI/testing/sysfs-kernel-kexec-kdump | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump index 00c00f380fea..f59051b5d96d 100644 --- a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump @@ -49,3 +49,13 @@ Description: read only is used by the user space utility kexec to support updating the in-kernel kdump image during hotplug operations. 
User: Kexec tools + +What: /sys/kernel/kexec/crash_cma_ranges +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read only + Provides information about the memory ranges reserved from + the Contiguous Memory Allocator (CMA) area that are allocated + to the crash (kdump) kernel. It lists the start and end physical + addresses of CMA regions assigned for crashkernel use. +User: kdump service -- 2.51.0 From piliu at redhat.com Wed Nov 5 22:56:27 2025 From: piliu at redhat.com (Pingfan Liu) Date: Thu, 6 Nov 2025 14:56:27 +0800 Subject: [PATCH 2/2] kernel/kexec: Fix IMA when allocation happens in CMA area In-Reply-To: References: <20251105130922.13321-1-piliu@redhat.com> <20251105130922.13321-2-piliu@redhat.com> Message-ID: On Thu, Nov 6, 2025 at 11:22?AM Baoquan He wrote: > > On 11/06/25 at 10:33am, Pingfan Liu wrote: > > Hi Baoquan, > > > > Thanks for your review. Please see the comment below. > > > > On Thu, Nov 6, 2025 at 10:04?AM Baoquan He wrote: > > > > > > Hi Pingfan, > > > > > > On 11/05/25 at 09:09pm, Pingfan Liu wrote: > > > > When I tested kexec with the latest kernel, I ran into the following warning: > > > > > > > > [ 40.712410] ------------[ cut here ]------------ > > > > [ 40.712576] WARNING: CPU: 2 PID: 1562 at kernel/kexec_core.c:1001 kimage_map_segment+0x144/0x198 > > > > [...] > > > > [ 40.816047] Call trace: > > > > [ 40.818498] kimage_map_segment+0x144/0x198 (P) > > > > [ 40.823221] ima_kexec_post_load+0x58/0xc0 > > > > [ 40.827246] __do_sys_kexec_file_load+0x29c/0x368 > > > > [...] > > > > [ 40.855423] ---[ end trace 0000000000000000 ]--- > > > > > > > > This is caused by the fact that kexec allocates the destination directly > > > > in the CMA area. In that case, the CMA kernel address should be exported > > > > directly to the IMA component, instead of using the vmalloc'd address. 
> > > > > > > > Signed-off-by: Pingfan Liu > > > > Cc: Andrew Morton > > > > Cc: Baoquan He > > > > Cc: Alexander Graf > > > > Cc: Steven Chen > > > > Cc: linux-integrity at vger.kernel.org > > > > To: kexec at lists.infradead.org > > > > --- > > > > kernel/kexec_core.c | 7 ++++++- > > > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > > > > index 9a1966207041..abe40286a02c 100644 > > > > --- a/kernel/kexec_core.c > > > > +++ b/kernel/kexec_core.c > > > > @@ -967,6 +967,7 @@ void *kimage_map_segment(struct kimage *image, int idx) > > > > kimage_entry_t *ptr, entry; > > > > struct page **src_pages; > > > > unsigned int npages; > > > > + struct page *cma; > > > > void *vaddr = NULL; > > > > int i; > > > > > > > > @@ -974,6 +975,9 @@ void *kimage_map_segment(struct kimage *image, int idx) > > > > size = image->segment[idx].memsz; > > > > eaddr = addr + size; > > > > > > > > + cma = image->segment_cma[idx]; > > > > > > Thanks for your fix. But I totally can't get what you are doing. The idx > > > passed into kimage_map_segment() could index image->segment[], and can > > > index image->segment_cma[], could you reconsider and make the code more > > > reasonable? > > > > > > > Since idx can index both image->segment[] and segment_cma[], the > > behavior differs based on whether segment_cma[idx] is NULL: > > > > - If segment_cma[idx] is not NULL, it points directly to the final > > target location, eliminating the need for data copying that > > traditional kexec relocation requires. > > - If segment_cma[idx] is NULL, the segment relies on the traditional > > kexec relocation code to copy its data. > > I see, thanks. While image->segment_cma[idx] records the struct page of > the relevant cma area, but not virtual address. Is it OK for IMA later Oops. It requires page_address(page) to convert the address. I will send out V2 to fix it. 
Thanks, Pingfan From piliu at redhat.com Wed Nov 5 22:59:03 2025 From: piliu at redhat.com (Pingfan Liu) Date: Thu, 6 Nov 2025 14:59:03 +0800 Subject: [PATCHv2 1/2] kernel/kexec: Change the prototype of kimage_map_segment() Message-ID: <20251106065904.10772-1-piliu@redhat.com> The kexec segment index will be required to extract the corresponding information for that segment in kimage_map_segment(). Additionally, kexec_segment already holds the kexec relocation destination address and size. Therefore, the prototype of kimage_map_segment() can be changed. Fixes: 07d24902977e ("kexec: enable CMA based contiguous allocation") Signed-off-by: Pingfan Liu Cc: Andrew Morton Cc: Baoquan He Cc: Mimi Zohar Cc: Roberto Sassu Cc: Alexander Graf Cc: Steven Chen Cc: To: kexec at lists.infradead.org To: linux-integrity at vger.kernel.org --- include/linux/kexec.h | 4 ++-- kernel/kexec_core.c | 9 ++++++--- security/integrity/ima/ima_kexec.c | 4 +--- 3 files changed, 9 insertions(+), 8 deletions(-) diff --git a/include/linux/kexec.h b/include/linux/kexec.h index ff7e231b0485..8a22bc9b8c6c 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -530,7 +530,7 @@ extern bool kexec_file_dbg_print; #define kexec_dprintk(fmt, arg...) 
\ do { if (kexec_file_dbg_print) pr_info(fmt, ##arg); } while (0) -extern void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size); +extern void *kimage_map_segment(struct kimage *image, int idx); extern void kimage_unmap_segment(void *buffer); #else /* !CONFIG_KEXEC_CORE */ struct pt_regs; @@ -540,7 +540,7 @@ static inline void __crash_kexec(struct pt_regs *regs) { } static inline void crash_kexec(struct pt_regs *regs) { } static inline int kexec_should_crash(struct task_struct *p) { return 0; } static inline int kexec_crash_loaded(void) { return 0; } -static inline void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size) +static inline void *kimage_map_segment(struct kimage *image, int idx) { return NULL; } static inline void kimage_unmap_segment(void *buffer) { } #define kexec_in_progress false diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index fa00b239c5d9..9a1966207041 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -960,17 +960,20 @@ int kimage_load_segment(struct kimage *image, int idx) return result; } -void *kimage_map_segment(struct kimage *image, - unsigned long addr, unsigned long size) +void *kimage_map_segment(struct kimage *image, int idx) { + unsigned long addr, size, eaddr; unsigned long src_page_addr, dest_page_addr = 0; - unsigned long eaddr = addr + size; kimage_entry_t *ptr, entry; struct page **src_pages; unsigned int npages; void *vaddr = NULL; int i; + addr = image->segment[idx].mem; + size = image->segment[idx].memsz; + eaddr = addr + size; + /* * Collect the source pages and map them in a contiguous VA range. 
*/ diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c index 7362f68f2d8b..5beb69edd12f 100644 --- a/security/integrity/ima/ima_kexec.c +++ b/security/integrity/ima/ima_kexec.c @@ -250,9 +250,7 @@ void ima_kexec_post_load(struct kimage *image) if (!image->ima_buffer_addr) return; - ima_kexec_buffer = kimage_map_segment(image, - image->ima_buffer_addr, - image->ima_buffer_size); + ima_kexec_buffer = kimage_map_segment(image, image->ima_segment_index); if (!ima_kexec_buffer) { pr_err("Could not map measurements buffer.\n"); return; -- 2.49.0 From piliu at redhat.com Wed Nov 5 22:59:04 2025 From: piliu at redhat.com (Pingfan Liu) Date: Thu, 6 Nov 2025 14:59:04 +0800 Subject: [PATCHv2 2/2] kernel/kexec: Fix IMA when allocation happens in CMA area In-Reply-To: <20251106065904.10772-1-piliu@redhat.com> References: <20251106065904.10772-1-piliu@redhat.com> Message-ID: <20251106065904.10772-2-piliu@redhat.com> When I tested kexec with the latest kernel, I ran into the following warning: [ 40.712410] ------------[ cut here ]------------ [ 40.712576] WARNING: CPU: 2 PID: 1562 at kernel/kexec_core.c:1001 kimage_map_segment+0x144/0x198 [...] [ 40.816047] Call trace: [ 40.818498] kimage_map_segment+0x144/0x198 (P) [ 40.823221] ima_kexec_post_load+0x58/0xc0 [ 40.827246] __do_sys_kexec_file_load+0x29c/0x368 [...] [ 40.855423] ---[ end trace 0000000000000000 ]--- This is caused by the fact that kexec allocates the destination directly in the CMA area. In that case, the CMA kernel address should be exported directly to the IMA component, instead of using the vmalloc'd address. 
Fixes: 07d24902977e ("kexec: enable CMA based contiguous allocation") Signed-off-by: Pingfan Liu Cc: Andrew Morton Cc: Baoquan He Cc: Alexander Graf Cc: Steven Chen Cc: linux-integrity at vger.kernel.org Cc: To: kexec at lists.infradead.org --- v1 -> v2: return page_address(page) instead of *page kernel/kexec_core.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 9a1966207041..332204204e53 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -967,6 +967,7 @@ void *kimage_map_segment(struct kimage *image, int idx) kimage_entry_t *ptr, entry; struct page **src_pages; unsigned int npages; + struct page *cma; void *vaddr = NULL; int i; @@ -974,6 +975,9 @@ void *kimage_map_segment(struct kimage *image, int idx) size = image->segment[idx].memsz; eaddr = addr + size; + cma = image->segment_cma[idx]; + if (cma) + return page_address(cma); /* * Collect the source pages and map them in a contiguous VA range. */ @@ -1014,7 +1018,8 @@ void *kimage_map_segment(struct kimage *image, int idx) void kimage_unmap_segment(void *segment_buffer) { - vunmap(segment_buffer); + if (is_vmalloc_addr(segment_buffer)) + vunmap(segment_buffer); } struct kexec_load_limit { -- 2.49.0 From oliver.sang at intel.com Wed Nov 5 23:21:35 2025 From: oliver.sang at intel.com (kernel test robot) Date: Thu, 6 Nov 2025 15:21:35 +0800 Subject: [PATCH v9 7/9] liveupdate: kho: move to kernel/liveupdate In-Reply-To: <20251101142325.1326536-8-pasha.tatashin@soleen.com> Message-ID: <202511061443.64dd159-lkp@intel.com> Hello, as we understand, this commit is not the root cause of the WARNING. but just changes the stats as below table [1] we just report FYI there is a WARNING caused by related code in our tests, in case anybody think it's worth to look further. 
thanks kernel test robot noticed "WARNING:at_kernel/liveupdate/kexec_handover.c:#kho_add_subtree" on: commit: 91cb1aaea4b8276323b3814d35f6e62133f64c1b ("[PATCH v9 7/9] liveupdate: kho: move to kernel/liveupdate") url: https://github.com/intel-lab-lkp/linux/commits/Pasha-Tatashin/kho-make-debugfs-interface-optional/20251101-222610 patch link: https://lore.kernel.org/all/20251101142325.1326536-8-pasha.tatashin at soleen.com/ patch subject: [PATCH v9 7/9] liveupdate: kho: move to kernel/liveupdate in testcase: boot config: x86_64-randconfig-001-20251015 compiler: gcc-14 test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G (please refer to attached dmesg/kmsg for entire log/backtrace) [1] +--------------------------------------------------------------------------------------+------------+------------+ | | dc74e80622 | 91cb1aaea4 | +--------------------------------------------------------------------------------------+------------+------------+ | WARNING:at_kernel/kexec_handover.c:#kho_add_subtree | 8 | | | WARNING:at_kernel/liveupdate/kexec_handover.c:#kho_add_subtree | 0 | 11 | +--------------------------------------------------------------------------------------+------------+------------+ If you fix the issue in a separate patch/commit (i.e. 
not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot | Closes: https://lore.kernel.org/oe-lkp/202511061443.64dd159-lkp at intel.com [ 12.679864][ T1] ------------[ cut here ]------------ [ 12.680514][ T1] WARNING: CPU: 0 PID: 1 at kernel/liveupdate/kexec_handover.c:711 kho_add_subtree (kernel/liveupdate/kexec_handover.c:711) [ 12.681526][ T1] Modules linked in: [ 12.681957][ T1] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.18.0-rc3-00216-g91cb1aaea4b8 #1 VOLUNTARY [ 12.682956][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 12.683951][ T1] RIP: 0010:kho_add_subtree (kernel/liveupdate/kexec_handover.c:711) [ 12.684514][ T1] Code: c7 58 2e a7 85 31 ed e8 31 1a 00 00 48 c7 c7 c0 12 c9 86 85 c0 89 c3 40 0f 95 c5 31 c9 31 d2 89 ee e8 57 a0 13 00 85 db 74 02 <0f> 0b b9 01 00 00 00 31 d2 89 ee 48 c7 c7 90 12 c9 86 e8 3c a0 13 All code ======== 0: c7 (bad) 1: 58 pop %rax 2: 2e a7 cmpsl %es:(%rdi),%ds:(%rsi) 4: 85 31 test %esi,(%rcx) 6: ed in (%dx),%eax 7: e8 31 1a 00 00 call 0x1a3d c: 48 c7 c7 c0 12 c9 86 mov $0xffffffff86c912c0,%rdi 13: 85 c0 test %eax,%eax 15: 89 c3 mov %eax,%ebx 17: 40 0f 95 c5 setne %bpl 1b: 31 c9 xor %ecx,%ecx 1d: 31 d2 xor %edx,%edx 1f: 89 ee mov %ebp,%esi 21: e8 57 a0 13 00 call 0x13a07d 26: 85 db test %ebx,%ebx 28: 74 02 je 0x2c 2a:* 0f 0b ud2 <-- trapping instruction 2c: b9 01 00 00 00 mov $0x1,%ecx 31: 31 d2 xor %edx,%edx 33: 89 ee mov %ebp,%esi 35: 48 c7 c7 90 12 c9 86 mov $0xffffffff86c91290,%rdi 3c: e8 .byte 0xe8 3d: 3c a0 cmp $0xa0,%al 3f: 13 .byte 0x13 Code starting with the faulting instruction =========================================== 0: 0f 0b ud2 2: b9 01 00 00 00 mov $0x1,%ecx 7: 31 d2 xor %edx,%edx 9: 89 ee mov %ebp,%esi b: 48 c7 c7 90 12 c9 86 mov $0xffffffff86c91290,%rdi 12: e8 .byte 0xe8 13: 3c a0 cmp $0xa0,%al 15: 13 .byte 0x13 [ 12.686315][ T1] RSP: 0018:ffffc9000001fc58 EFLAGS: 00010286 [ 12.687184][ T1] RAX: 
dffffc0000000000 RBX: 00000000ffffffff RCX: 0000000000000000 [ 12.688370][ T1] RDX: 1ffffffff0d9225c RSI: 0000000000000001 RDI: ffffffff86c912e0 [ 12.689572][ T1] RBP: 0000000000000001 R08: 0000000000000008 R09: fffffbfff0dfac6c [ 12.690762][ T1] R10: 0000000000000000 R11: ffffffff86fd6367 R12: ffff888133ce6000 [ 12.691996][ T1] R13: ffffffff85a72d60 R14: ffff88810ce59888 R15: dffffc0000000000 [ 12.693231][ T1] FS: 0000000000000000(0000) GS:ffff888426da0000(0000) knlGS:0000000000000000 [ 12.694585][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 12.695569][ T1] CR2: 00007f6d5dc8f0ac CR3: 00000000054ea000 CR4: 00000000000406f0 [ 12.696832][ T1] Call Trace: [ 12.697400][ T1] [ 12.697922][ T1] kho_test_preserve+0x2fa/0x360 [ 12.698835][ T1] ? folio_order (arch/x86/kvm/../../../virt/kvm/guest_memfd.c:181 (discriminator 3)) [ 12.699556][ T1] ? kho_test_generate_data+0x107/0x180 [ 12.700561][ T1] kho_test_init (lib/test_kho.c:222 lib/test_kho.c:327) [ 12.701312][ T1] ? vmalloc_test_init (lib/test_kho.c:314) [ 12.702100][ T1] ? add_device_randomness (drivers/char/random.c:944) [ 12.702924][ T1] ? mix_pool_bytes (drivers/char/random.c:944) [ 12.703646][ T1] ? trace_initcall_start (include/trace/events/initcall.h:27 (discriminator 3)) [ 12.704499][ T1] ? vmalloc_test_init (lib/test_kho.c:314) [ 12.705291][ T1] do_one_initcall (init/main.c:1284) [ 12.706047][ T1] ? trace_initcall_start (init/main.c:1274) [ 12.706897][ T1] ? parse_one (kernel/params.c:143) [ 12.707623][ T1] ? kasan_save_track (mm/kasan/common.c:69 (discriminator 1) mm/kasan/common.c:78 (discriminator 1)) [ 12.708394][ T1] ? __kmalloc_noprof (mm/slub.c:5659) [ 12.709218][ T1] do_initcalls (init/main.c:1344 (discriminator 3) init/main.c:1361 (discriminator 3)) [ 12.709976][ T1] kernel_init_freeable (init/main.c:1595) [ 12.710752][ T1] ? rest_init (init/main.c:1475) [ 12.711473][ T1] kernel_init (init/main.c:1485) [ 12.712165][ T1] ? 
rest_init (init/main.c:1475) [ 12.712871][ T1] ret_from_fork (arch/x86/kernel/process.c:164) [ 12.713609][ T1] ? rest_init (init/main.c:1475) [ 12.714326][ T1] ret_from_fork_asm (arch/x86/entry/entry_64.S:255) [ 12.715029][ T1] [ 12.715548][ T1] irq event stamp: 131753 [ 12.716243][ T1] hardirqs last enabled at (131763): __up_console_sem (arch/x86/include/asm/irqflags.h:26 arch/x86/include/asm/irqflags.h:109 arch/x86/include/asm/irqflags.h:151 kernel/printk/printk.c:345) [ 12.717702][ T1] hardirqs last disabled at (131776): __up_console_sem (kernel/printk/printk.c:343 (discriminator 3)) [ 12.719185][ T1] softirqs last enabled at (131460): handle_softirqs (kernel/softirq.c:469 (discriminator 1) kernel/softirq.c:650 (discriminator 1)) [ 12.720632][ T1] softirqs last disabled at (131455): __irq_exit_rcu (kernel/softirq.c:496 kernel/softirq.c:723) [ 12.721755][ T1] ---[ end trace 0000000000000000 ]--- The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20251106/202511061443.64dd159-lkp at intel.com -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki From bhe at redhat.com Thu Nov 6 00:01:29 2025 From: bhe at redhat.com (Baoquan He) Date: Thu, 6 Nov 2025 16:01:29 +0800 Subject: [PATCHv2 2/2] kernel/kexec: Fix IMA when allocation happens in CMA area In-Reply-To: <20251106065904.10772-2-piliu@redhat.com> References: <20251106065904.10772-1-piliu@redhat.com> <20251106065904.10772-2-piliu@redhat.com> Message-ID: On 11/06/25 at 02:59pm, Pingfan Liu wrote: > When I tested kexec with the latest kernel, I ran into the following warning: > > [ 40.712410] ------------[ cut here ]------------ > [ 40.712576] WARNING: CPU: 2 PID: 1562 at kernel/kexec_core.c:1001 kimage_map_segment+0x144/0x198 > [...] > [ 40.816047] Call trace: > [ 40.818498] kimage_map_segment+0x144/0x198 (P) > [ 40.823221] ima_kexec_post_load+0x58/0xc0 > [ 40.827246] __do_sys_kexec_file_load+0x29c/0x368 > [...] 
> [ 40.855423] ---[ end trace 0000000000000000 ]--- > > This is caused by the fact that kexec allocates the destination directly > in the CMA area. In that case, the CMA kernel address should be exported > directly to the IMA component, instead of using the vmalloc'd address. Well, you didn't update the log accordingly. Do you know why cma area can't be mapped into vmalloc? > > Fixes: 07d24902977e ("kexec: enable CMA based contiguous allocation") > Signed-off-by: Pingfan Liu > Cc: Andrew Morton > Cc: Baoquan He > Cc: Alexander Graf > Cc: Steven Chen > Cc: linux-integrity at vger.kernel.org > Cc: > To: kexec at lists.infradead.org > --- > v1 -> v2: > return page_address(page) instead of *page > > kernel/kexec_core.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > index 9a1966207041..332204204e53 100644 > --- a/kernel/kexec_core.c > +++ b/kernel/kexec_core.c > @@ -967,6 +967,7 @@ void *kimage_map_segment(struct kimage *image, int idx) > kimage_entry_t *ptr, entry; > struct page **src_pages; > unsigned int npages; > + struct page *cma; > void *vaddr = NULL; > int i; > > @@ -974,6 +975,9 @@ void *kimage_map_segment(struct kimage *image, int idx) > size = image->segment[idx].memsz; > eaddr = addr + size; > > + cma = image->segment_cma[idx]; > + if (cma) > + return page_address(cma); > /* > * Collect the source pages and map them in a contiguous VA range. 
> */ > @@ -1014,7 +1018,8 @@ void *kimage_map_segment(struct kimage *image, int idx) > > void kimage_unmap_segment(void *segment_buffer) > { > - vunmap(segment_buffer); > + if (is_vmalloc_addr(segment_buffer)) > + vunmap(segment_buffer); > } > > struct kexec_load_limit { > -- > 2.49.0 > From rppt at kernel.org Thu Nov 6 00:24:24 2025 From: rppt at kernel.org (Mike Rapoport) Date: Thu, 6 Nov 2025 10:24:24 +0200 Subject: [PATCH v8 01/17] memblock: add MEMBLOCK_RSRV_KERN flag In-Reply-To: References: <20250509074635.3187114-1-changyuanl@google.com> <20250509074635.3187114-2-changyuanl@google.com> <2ege2jfbevtunhxsnutbzde7cqwgu5qbj4bbuw2umw7ke7ogcn@5wtskk4exzsi> Message-ID: Hello Breno, On Wed, Nov 05, 2025 at 02:18:11AM -0800, Breno Leitao wrote: > Hello Pratyush, > > On Tue, Oct 14, 2025 at 03:10:37PM +0200, Pratyush Yadav wrote: > > On Tue, Oct 14 2025, Breno Leitao wrote: > > > On Mon, Oct 13, 2025 at 06:40:09PM +0200, Pratyush Yadav wrote: > > >> On Mon, Oct 13 2025, Pratyush Yadav wrote: > > >> > > > >> > I suppose this would be useful. I think enabling memblock debug prints > > >> > would also be helpful (using the "memblock=debug" commandline parameter) > > >> > if it doesn't impact your production environment too much. > > >> > > >> Actually, I think "memblock=debug" is going to be the more useful thing > > >> since it would also show what function allocated the overlapping range > > >> and the flags it was allocated with. > > >> > > >> On my qemu VM with KVM, this results in around 70 prints from memblock. > > >> So it adds a bit of extra prints but nothing that should be too > > >> disrupting I think. Plus, only at boot so the worst thing you get is > > >> slightly slower boot times. > > > > > > Unfortunately this issue is happening on production systems, and I don't > > > have an easy way to reproduce it _yet_. > > > > > > At the same time, "memblock=debug" has two problems: > > > > > > 1) It slows the boot time as you suggested. 
Boot time at large > > > environments is SUPER critical and time sensitive. It is a bit > > > weird, but it is common for machines in production to kexec > > > _thousands_ of times, and kexecing is considered downtime. > > > > I don't know if it would make a real enough difference on boot times, > > only that it should theoretically affect it, mainly if you are using > > serial for dmesg logs. Anyway, that's your production environment so you > > know best. > > > > > > > > This would be useful if I find some hosts getting this issue, and > > > then I can easily enable the extra information to collect what > > > I need, but, this didn't pan out because the hosts I got > > > `memblock=debug` didn't collaborate. > > > > > > 2) "memblock=debug" is verbose for all cases, which also not necessary > > > the desired behaviour. I am more interested in only being verbose > > > when there is a known problem. > > I am still interested in this problem, and I finally found a host that > constantly reproduce the issue and I was able to get `memblock=debug` > cmdline. I am running 6.18-rc4 with some debug options enabled. 
> > DMA-API: exceeded 7 overlapping mappings of cacheline 0x0000000006d6e400 > WARNING: CPU: 58 PID: 828 at kernel/dma/debug.c:463 add_dma_entry+0x2e4/0x330 > pc : add_dma_entry+0x2e4/0x330 > lr : add_dma_entry+0x2e4/0x330 > sp : ffff8000b036f7f0 > x29: ffff8000b036f800 x28: 0000000000000001 x27: 0000000000000008 > x26: ffff8000835f7fb8 x25: ffff8000835f7000 x24: ffff8000835f7ee0 > x23: 0000000000000000 x22: 0000000006d6e400 x21: 0000000000000000 > x20: 0000000006d6e400 x19: ffff0003f70c1100 x18: 00000000ffffffff > x17: ffff80008019a2d8 x16: ffff80008019a08c x15: 0000000000000000 > x14: 0000000000000000 x13: 0000000000000820 x12: ffff00011faeaf00 > x11: 0000000000000000 x10: ffff8000834633d8 x9 : ffff8000801979d4 > x8 : 00000000fffeffff x7 : ffff8000834633d8 x6 : 0000000000000000 > x5 : 00000000000bfff4 x4 : 0000000000000000 x3 : ffff0001075eb7c0 > x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0001075eb7c0 > Call trace: > add_dma_entry+0x2e4/0x330 (P) > debug_dma_map_phys+0xc4/0xf0 > dma_map_phys (/home/leit/Devel/upstream/./include/linux/dma-direct.h:138 /home/leit/Devel/upstream/kernel/dma/direct.h:102 /home/leit/Devel/upstream/kernel/dma/mapping.c:169) > dma_map_page_attrs (/home/leit/Devel/upstream/kernel/dma/mapping.c:387) > blk_dma_map_direct.isra.0 (/home/leit/Devel/upstream/block/blk-mq-dma.c:102) > blk_dma_map_iter_start (/home/leit/Devel/upstream/block/blk-mq-dma.c:123 /home/leit/Devel/upstream/block/blk-mq-dma.c:196) > blk_rq_dma_map_iter_start (/home/leit/Devel/upstream/block/blk-mq-dma.c:228) > nvme_prep_rq+0xb8/0x9b8 > nvme_queue_rq+0x44/0x1b0 > blk_mq_dispatch_rq_list (/home/leit/Devel/upstream/block/blk-mq.c:2129) > __blk_mq_sched_dispatch_requests (/home/leit/Devel/upstream/block/blk-mq-sched.c:314) > blk_mq_sched_dispatch_requests (/home/leit/Devel/upstream/block/blk-mq-sched.c:329) > blk_mq_run_work_fn (/home/leit/Devel/upstream/block/blk-mq.c:219 /home/leit/Devel/upstream/block/blk-mq.c:231) > process_one_work 
(/home/leit/Devel/upstream/kernel/workqueue.c:991 /home/leit/Devel/upstream/kernel/workqueue.c:3213) > worker_thread (/home/leit/Devel/upstream/./include/linux/list.h:163 /home/leit/Devel/upstream/./include/linux/list.h:191 /home/leit/Devel/upstream/./include/linux/list.h:319 /home/leit/Devel/upstream/kernel/workqueue.c:1153 /home/leit/Devel/upstream/kernel/workqueue.c:1205 /home/leit/Devel/upstream/kernel/workqueue.c:3426) > kthread (/home/leit/Devel/upstream/kernel/kthread.c:386 /home/leit/Devel/upstream/kernel/kthread.c:457) > ret_from_fork (/home/leit/Devel/upstream/entry.S:861) > > > Looking at memblock debug logs, I haven't seen anything related to > 0x0000000006d6e400. It looks like the crash happens way after memblock passed all the memory to buddy. Why do you think this is related to memblock? > I got the output of `dmesg | grep memblock` in, in case you are curious: > > https://github.com/leitao/debug/blob/main/pastebin/memblock/dmesg_grep_memblock.txt > > Thanks > --breno > -- Sincerely yours, Mike. 
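A physical address such as the cacheline 0x0000000006d6e400 from the DMA-debug splat can be cross-checked against `memblock=debug` output mechanically rather than by eye. A rough sketch follows; the log-line shape and the sample entries below are assumptions for illustration, not lines taken from the actual dmesg:

```python
import re

# Hypothetical sample lines in roughly the shape memblock=debug prints;
# the exact format can differ between kernel versions.
LOG = """\
memblock_reserve: [0x0000000006d00000-0x0000000006dfffff] memblock_alloc_range_nid+0xdc/0x14c
memblock_reserve: [0x00000000af000000-0x00000000beffffff] reserve_crashkernel+0x9c/0x120
"""

LINE = re.compile(r"memblock_reserve: \[0x([0-9a-f]+)-0x([0-9a-f]+)\]\s+(\S+)")

def parse_reserves(log):
    """Return (start, end_inclusive, caller) tuples for each reserve line."""
    return [(int(m.group(1), 16), int(m.group(2), 16), m.group(3))
            for m in LINE.finditer(log)]

def covering(regions, paddr):
    """Regions whose [start, end] range contains the physical address."""
    return [r for r in regions if r[0] <= paddr <= r[1]]

regions = parse_reserves(LOG)
print(covering(regions, 0x06d6e400)[0][2])  # prints memblock_alloc_range_nid+0xdc/0x14c
```

If no reserve region covers the address, that supports Mike's point that the mapping in question was never a memblock reservation at all.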
From oliver.sang at intel.com Thu Nov 6 00:41:16 2025 From: oliver.sang at intel.com (kernel test robot) Date: Thu, 6 Nov 2025 16:41:16 +0800 Subject: [PATCH v9 2/9] kho: drop notifiers In-Reply-To: <20251101142325.1326536-3-pasha.tatashin@soleen.com> Message-ID: <202511061629.e242724-lkp@intel.com> Hello, kernel test robot noticed "WARNING:at_kernel/kexec_handover.c:#kho_add_subtree" on: commit: e44a700c561d1e892a8d0829d557e221604a7b93 ("[PATCH v9 2/9] kho: drop notifiers") url: https://github.com/intel-lab-lkp/linux/commits/Pasha-Tatashin/kho-make-debugfs-interface-optional/20251101-222610 patch link: https://lore.kernel.org/all/20251101142325.1326536-3-pasha.tatashin at soleen.com/ patch subject: [PATCH v9 2/9] kho: drop notifiers in testcase: boot config: x86_64-randconfig-001-20251015 compiler: gcc-14 test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G (please refer to attached dmesg/kmsg for entire log/backtrace) +--------------------------------------------------------+------------+------------+ | | 93e4b3b2e9 | e44a700c56 | +--------------------------------------------------------+------------+------------+ | WARNING:at_kernel/kexec_handover.c:#kho_add_subtree | 0 | 8 | | RIP:kho_add_subtree | 0 | 8 | +--------------------------------------------------------+------------+------------+ If you fix the issue in a separate patch/commit (i.e. 
not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot | Closes: https://lore.kernel.org/oe-lkp/202511061629.e242724-lkp at intel.com [ 13.620111][ T1] ------------[ cut here ]------------ [ 13.620739][ T1] WARNING: CPU: 1 PID: 1 at kernel/kexec_handover.c:704 kho_add_subtree (kernel/kexec_handover.c:704) [ 13.621665][ T1] Modules linked in: [ 13.622090][ T1] CPU: 1 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.18.0-rc3-00211-ge44a700c561d #1 VOLUNTARY [ 13.623073][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 13.624054][ T1] RIP: 0010:kho_add_subtree (kernel/kexec_handover.c:704) [ 13.624596][ T1] Code: c7 38 b4 ac 85 31 ed e8 01 1c 00 00 48 c7 c7 70 5a ca 86 85 c0 89 c3 40 0f 95 c5 31 c9 31 d2 89 ee e8 37 b5 0a 00 85 db 74 02 <0f> 0b b9 01 00 00 00 31 d2 89 ee 48 c7 c7 40 5a ca 86 e8 1c b5 0a All code ======== 0: c7 38 b4 ac 85 xbegin 0xffffffff85acb43d,(bad) 5: 31 ed xor %ebp,%ebp 7: e8 01 1c 00 00 call 0x1c0d c: 48 c7 c7 70 5a ca 86 mov $0xffffffff86ca5a70,%rdi 13: 85 c0 test %eax,%eax 15: 89 c3 mov %eax,%ebx 17: 40 0f 95 c5 setne %bpl 1b: 31 c9 xor %ecx,%ecx 1d: 31 d2 xor %edx,%edx 1f: 89 ee mov %ebp,%esi 21: e8 37 b5 0a 00 call 0xab55d 26: 85 db test %ebx,%ebx 28: 74 02 je 0x2c 2a:* 0f 0b ud2 <-- trapping instruction 2c: b9 01 00 00 00 mov $0x1,%ecx 31: 31 d2 xor %edx,%edx 33: 89 ee mov %ebp,%esi 35: 48 c7 c7 40 5a ca 86 mov $0xffffffff86ca5a40,%rdi 3c: e8 .byte 0xe8 3d: 1c b5 sbb $0xb5,%al 3f: 0a .byte 0xa Code starting with the faulting instruction =========================================== 0: 0f 0b ud2 2: b9 01 00 00 00 mov $0x1,%ecx 7: 31 d2 xor %edx,%edx 9: 89 ee mov %ebp,%esi b: 48 c7 c7 40 5a ca 86 mov $0xffffffff86ca5a40,%rdi 12: e8 .byte 0xe8 13: 1c b5 sbb $0xb5,%al 15: 0a .byte 0xa [ 13.626370][ T1] RSP: 0018:ffffc9000001fca0 EFLAGS: 00010286 [ 13.626951][ T1] RAX: dffffc0000000000 RBX: 00000000ffffffff RCX: 0000000000000000 [ 13.627737][ T1] 
RDX: 1ffffffff0d94b52 RSI: 0000000000000001 RDI: ffffffff86ca5a90 [ 13.628523][ T1] RBP: 0000000000000001 R08: 0000000000000008 R09: fffffbfff0dfac4c [ 13.629330][ T1] R10: 0000000000000000 R11: ffffffff86fd6267 R12: ffff888133ee2000 [ 13.630101][ T1] R13: ffffffff85acb340 R14: ffff888117a5f988 R15: dffffc0000000000 [ 13.630869][ T1] FS: 0000000000000000(0000) GS:ffff888426ea0000(0000) knlGS:0000000000000000 [ 13.631727][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 13.632370][ T1] CR2: 00007f586df260ac CR3: 00000000054ea000 CR4: 00000000000406f0 [ 13.633154][ T1] Call Trace: [ 13.633506][ T1] [ 13.633833][ T1] kho_test_prepare_fdt+0x145/0x180 [ 13.634446][ T1] ? kho_test_save_data+0x210/0x210 [ 13.635097][ T1] ? csum_partial (lib/checksum.c:123) [ 13.635546][ T1] kho_test_init (lib/test_kho.c:177 lib/test_kho.c:284) [ 13.636018][ T1] ? vmalloc_test_init (lib/test_kho.c:271) [ 13.636508][ T1] ? add_device_randomness (drivers/char/random.c:944) [ 13.637485][ T1] ? mix_pool_bytes (drivers/char/random.c:944) [ 13.637955][ T1] ? trace_initcall_start (include/trace/events/initcall.h:27 (discriminator 3)) [ 13.638498][ T1] ? vmalloc_test_init (lib/test_kho.c:271) [ 13.638989][ T1] do_one_initcall (init/main.c:1284) [ 13.639477][ T1] ? trace_initcall_start (init/main.c:1274) [ 13.639998][ T1] ? parse_one (kernel/params.c:143) [ 13.640455][ T1] ? kasan_save_track (mm/kasan/common.c:69 (discriminator 1) mm/kasan/common.c:78 (discriminator 1)) [ 13.640948][ T1] ? __kmalloc_noprof (mm/slub.c:5659) [ 13.641465][ T1] do_initcalls (init/main.c:1344 (discriminator 3) init/main.c:1361 (discriminator 3)) [ 13.641924][ T1] kernel_init_freeable (init/main.c:1595) [ 13.642441][ T1] ? rest_init (init/main.c:1475) [ 13.642891][ T1] kernel_init (init/main.c:1485) [ 13.643345][ T1] ? rest_init (init/main.c:1475) [ 13.643788][ T1] ret_from_fork (arch/x86/kernel/process.c:164) [ 13.644256][ T1] ? 
rest_init (init/main.c:1475) [ 13.644703][ T1] ret_from_fork_asm (arch/x86/entry/entry_64.S:255) [ 13.645213][ T1] [ 13.645540][ T1] irq event stamp: 132025 [ 13.645971][ T1] hardirqs last enabled at (132035): __up_console_sem (arch/x86/include/asm/irqflags.h:26 arch/x86/include/asm/irqflags.h:109 arch/x86/include/asm/irqflags.h:151 kernel/printk/printk.c:345) [ 13.646887][ T1] hardirqs last disabled at (132046): __up_console_sem (kernel/printk/printk.c:343 (discriminator 3)) [ 13.648253][ T1] softirqs last enabled at (131286): handle_softirqs (kernel/softirq.c:469 (discriminator 1) kernel/softirq.c:650 (discriminator 1)) [ 13.649690][ T1] softirqs last disabled at (131281): __irq_exit_rcu (kernel/softirq.c:496 kernel/softirq.c:723) [ 13.651128][ T1] ---[ end trace 0000000000000000 ]--- The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20251106/202511061629.e242724-lkp at intel.com -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki From piliu at redhat.com Thu Nov 6 02:01:41 2025 From: piliu at redhat.com (Pingfan Liu) Date: Thu, 6 Nov 2025 18:01:41 +0800 Subject: [PATCHv2 2/2] kernel/kexec: Fix IMA when allocation happens in CMA area In-Reply-To: References: <20251106065904.10772-1-piliu@redhat.com> <20251106065904.10772-2-piliu@redhat.com> Message-ID: On Thu, Nov 6, 2025 at 4:01?PM Baoquan He wrote: > > On 11/06/25 at 02:59pm, Pingfan Liu wrote: > > When I tested kexec with the latest kernel, I ran into the following warning: > > > > [ 40.712410] ------------[ cut here ]------------ > > [ 40.712576] WARNING: CPU: 2 PID: 1562 at kernel/kexec_core.c:1001 kimage_map_segment+0x144/0x198 > > [...] > > [ 40.816047] Call trace: > > [ 40.818498] kimage_map_segment+0x144/0x198 (P) > > [ 40.823221] ima_kexec_post_load+0x58/0xc0 > > [ 40.827246] __do_sys_kexec_file_load+0x29c/0x368 > > [...] 
> > [ 40.855423] ---[ end trace 0000000000000000 ]--- > > > > This is caused by the fact that kexec allocates the destination directly > > in the CMA area. In that case, the CMA kernel address should be exported > > directly to the IMA component, instead of using the vmalloc'd address. > > Well, you didn't update the log accordingly. > I am not sure what you mean. Do you mean the earlier content which I replied to you? > Do you know why cma area can't be mapped into vmalloc? > Should not the kernel direct mapping be used? Thanks, Pingfan > > > > Fixes: 07d24902977e ("kexec: enable CMA based contiguous allocation") > > Signed-off-by: Pingfan Liu > > Cc: Andrew Morton > > Cc: Baoquan He > > Cc: Alexander Graf > > Cc: Steven Chen > > Cc: linux-integrity at vger.kernel.org > > Cc: > > To: kexec at lists.infradead.org > > --- > > v1 -> v2: > > return page_address(page) instead of *page > > > > kernel/kexec_core.c | 7 ++++++- > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > > index 9a1966207041..332204204e53 100644 > > --- a/kernel/kexec_core.c > > +++ b/kernel/kexec_core.c > > @@ -967,6 +967,7 @@ void *kimage_map_segment(struct kimage *image, int idx) > > kimage_entry_t *ptr, entry; > > struct page **src_pages; > > unsigned int npages; > > + struct page *cma; > > void *vaddr = NULL; > > int i; > > > > @@ -974,6 +975,9 @@ void *kimage_map_segment(struct kimage *image, int idx) > > size = image->segment[idx].memsz; > > eaddr = addr + size; > > > > + cma = image->segment_cma[idx]; > > + if (cma) > > + return page_address(cma); > > /* > > * Collect the source pages and map them in a contiguous VA range. 
> > */ > > @@ -1014,7 +1018,8 @@ void *kimage_map_segment(struct kimage *image, int idx) > > > > void kimage_unmap_segment(void *segment_buffer) > > { > > - vunmap(segment_buffer); > > + if (is_vmalloc_addr(segment_buffer)) > > + vunmap(segment_buffer); > > } > > > > struct kexec_load_limit { > > -- > > 2.49.0 > > > From pnina.feder at mobileye.com Thu Nov 6 04:03:44 2025 From: pnina.feder at mobileye.com (Pnina Feder) Date: Thu, 6 Nov 2025 14:03:44 +0200 Subject: [PATCH] util_lib: Add direct map fallback in vaddr_to_offset() Message-ID: <20251106120344.2382695-1-pnina.feder@mobileye.com> The vmcore-dmesg tool could fail with the message: "No program header covering vaddr 0x%llx found kexec bug?" This occurred when a virtual address belonged to the kernel?s direct mapping region, which may not be covered by any PT_LOAD segment in the vmcore ELF headers. Add a direct-map fallback in vaddr_to_offset() that converts such virtual addresses using the known page and physical offsets. This allows resolving these addresses correctly. 
Tested on Linux 6.16 (RISC-V) Signed-off-by: Pnina Feder --- util_lib/elf_info.c | 58 +++++++++++++++++++++++++++++++++++++-------- 1 file changed, 48 insertions(+), 10 deletions(-) diff --git a/util_lib/elf_info.c b/util_lib/elf_info.c index b005245..589bc1a 100644 --- a/util_lib/elf_info.c +++ b/util_lib/elf_info.c @@ -72,6 +72,7 @@ static uint16_t log_offset_len = UINT16_MAX; static uint16_t log_offset_text_len = UINT16_MAX; static uint64_t phys_offset = UINT64_MAX; +static uint64_t page_offset = UINT64_MAX; #if __BYTE_ORDER == __LITTLE_ENDIAN #define ELFDATANATIVE ELFDATA2LSB @@ -115,7 +116,26 @@ static uint64_t vaddr_to_offset(uint64_t vaddr) continue; return (vaddr - phdr[i].p_vaddr) + phdr[i].p_offset; } - fprintf(stderr, "No program header covering vaddr 0x%llxfound kexec bug?\n", + + /* Direct map fallback */ + if (page_offset != UINT64_MAX && + phys_offset != UINT64_MAX && + vaddr >= page_offset) { + + uint64_t paddr = 0; + + paddr = vaddr - (page_offset - phys_offset); + + for (i = 0; i < ehdr.e_phnum; i++) { + if (phdr[i].p_paddr > paddr) + continue; + if ((phdr[i].p_paddr + phdr[i].p_memsz) <= paddr) + continue; + return phdr[i].p_offset + (paddr - phdr[i].p_paddr); + } + } + + fprintf(stderr, "No program header covering vaddr 0x%llx found kexec bug?\n", (unsigned long long)vaddr); exit(30); } @@ -309,6 +329,20 @@ int get_pt_load(int idx, return 1; } +static inline int parse_phys_offset(const char *str, char *pos) +{ + char *endp; + + phys_offset = strtoul(pos + strlen(str), &endp, 10); + if (strlen(endp) != 0) + phys_offset = strtoul(pos + strlen(str), &endp, 16); + if ((phys_offset == LONG_MAX) || strlen(endp) != 0) { + fprintf(stderr, "Invalid data %s\n", pos); + return -1; + } + return 0; +} + #define NOT_FOUND_LONG_VALUE (-1) void (*arch_scan_vmcoreinfo)(char *pos); @@ -319,7 +353,7 @@ void scan_vmcoreinfo(char *start, size_t size) char *pos, *eol; char temp_buf[1024]; bool last_line = false; - char *str, *endp; + char *str; #define SYMBOL(sym) { 
\ .str = "SYMBOL(" #sym ")=", \ @@ -543,17 +577,21 @@ void scan_vmcoreinfo(char *start, size_t size) /* Check for PHYS_OFFSET number */ str = "NUMBER(PHYS_OFFSET)="; if (memcmp(str, pos, strlen(str)) == 0) { - phys_offset = strtoul(pos + strlen(str), &endp, - 10); - if (strlen(endp) != 0) - phys_offset = strtoul(pos + strlen(str), &endp, 16); - if ((phys_offset == LONG_MAX) || strlen(endp) != 0) { - fprintf(stderr, "Invalid data %s\n", - pos); + if (parse_phys_offset(str, pos) != 0) break; - } } + /* Check for PHYS_OFFSET number on some arch it called phys_ram_base*/ + str = "NUMBER(phys_ram_base)="; + if (memcmp(str, pos, strlen(str)) == 0) { + if (parse_phys_offset(str, pos) != 0) + break; + } + + str = "NUMBER(PAGE_OFFSET)="; + if (memcmp(str, pos, strlen(str)) == 0) + page_offset = strtoull(pos + strlen(str), NULL, 16); + if (arch_scan_vmcoreinfo != NULL) (*arch_scan_vmcoreinfo)(pos); -- 2.43.0 From lkp at intel.com Thu Nov 6 06:38:11 2025 From: lkp at intel.com (kernel test robot) Date: Thu, 6 Nov 2025 22:38:11 +0800 Subject: [PATCH v6] powerpc/kdump: Add support for crashkernel CMA reservation In-Reply-To: <20251104132818.1724562-1-sourabhjain@linux.ibm.com> References: <20251104132818.1724562-1-sourabhjain@linux.ibm.com> Message-ID: <202511062213.dHidoorr-lkp@intel.com> Hi Sourabh, kernel test robot noticed the following build warnings: [auto build test WARNING on powerpc/next] [also build test WARNING on powerpc/fixes linus/master v6.18-rc4 next-20251106] [If your patch is applied to the wrong git tree, kindly drop us a note. 
And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Sourabh-Jain/powerpc-kdump-Add-support-for-crashkernel-CMA-reservation/20251104-213036 base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next patch link: https://lore.kernel.org/r/20251104132818.1724562-1-sourabhjain%40linux.ibm.com patch subject: [PATCH v6] powerpc/kdump: Add support for crashkernel CMA reservation config: powerpc64-randconfig-r113-20251106 (https://download.01.org/0day-ci/archive/20251106/202511062213.dHidoorr-lkp at intel.com/config) compiler: powerpc64-linux-gcc (GCC) 8.5.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251106/202511062213.dHidoorr-lkp at intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot | Closes: https://lore.kernel.org/oe-kbuild-all/202511062213.dHidoorr-lkp at intel.com/ sparse warnings: (new ones prefixed by >>) >> arch/powerpc/kexec/core.c:62:20: sparse: sparse: symbol 'crashk_cma_size' was not declared. Should it be static? 
arch/powerpc/kexec/core.c:188:29: sparse: sparse: incorrect type in assignment (different base types) @@ expected unsigned long long static [addressable] [toplevel] [usertype] crashk_base @@ got restricted __be64 [usertype] @@ arch/powerpc/kexec/core.c:188:29: sparse: expected unsigned long long static [addressable] [toplevel] [usertype] crashk_base arch/powerpc/kexec/core.c:188:29: sparse: got restricted __be64 [usertype] arch/powerpc/kexec/core.c:190:29: sparse: sparse: incorrect type in assignment (different base types) @@ expected unsigned long long static [addressable] [toplevel] [usertype] crashk_size @@ got restricted __be64 [usertype] @@ arch/powerpc/kexec/core.c:190:29: sparse: expected unsigned long long static [addressable] [toplevel] [usertype] crashk_size arch/powerpc/kexec/core.c:190:29: sparse: got restricted __be64 [usertype] arch/powerpc/kexec/core.c:198:19: sparse: sparse: incorrect type in assignment (different base types) @@ expected unsigned long long static [addressable] [toplevel] mem_limit @@ got restricted __be64 [usertype] @@ arch/powerpc/kexec/core.c:198:19: sparse: expected unsigned long long static [addressable] [toplevel] mem_limit arch/powerpc/kexec/core.c:198:19: sparse: got restricted __be64 [usertype] arch/powerpc/kexec/core.c:214:20: sparse: sparse: incorrect type in assignment (different base types) @@ expected unsigned long long static [addressable] [toplevel] [usertype] kernel_end @@ got restricted __be64 [usertype] @@ arch/powerpc/kexec/core.c:214:20: sparse: expected unsigned long long static [addressable] [toplevel] [usertype] kernel_end arch/powerpc/kexec/core.c:214:20: sparse: got restricted __be64 [usertype] vim +/crashk_cma_size +62 arch/powerpc/kexec/core.c 61 > 62 unsigned long long crashk_cma_size; 63 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki From pasha.tatashin at soleen.com Thu Nov 6 13:46:45 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Thu, 6 Nov 2025 16:46:45 -0500 
Subject: [PATCH v9 2/9] kho: drop notifiers In-Reply-To: <202511061629.e242724-lkp@intel.com> References: <20251101142325.1326536-3-pasha.tatashin@soleen.com> <202511061629.e242724-lkp@intel.com> Message-ID: The bug is in lib/test_kho.c: when KHO is not enabled, it should not run KHO commands; there is a function to test that, kho_is_enabled(). So KHO is disabled, yet kho_add_subtree(), which adds a debugfs entry, is still called, and the list is not initialized because KHO is disabled. The fix is: diff --git a/lib/test_kho.c b/lib/test_kho.c index 025ea251a186..85b60d87a50a 100644 --- a/lib/test_kho.c +++ b/lib/test_kho.c @@ -315,6 +315,9 @@ static int __init kho_test_init(void) phys_addr_t fdt_phys; int err; + if (!kho_is_enabled()) + return 0; + err = kho_retrieve_subtree(KHO_TEST_FDT, &fdt_phys); if (!err) return kho_test_restore(fdt_phys); On Thu, Nov 6, 2025 at 3:41 AM kernel test robot wrote: > [...] From pasha.tatashin at soleen.com Thu Nov 6 14:06:35 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Thu, 6 Nov 2025 17:06:35 -0500 Subject: [PATCH] lib/test_kho: Check if KHO is enabled Message-ID: <20251106220635.2608494-1-pasha.tatashin@soleen.com> We must check whether KHO is enabled prior to issuing KHO commands, otherwise KHO internal data structures are not initialized.
Fixes: b753522bed0b ("kho: add test for kexec handover") Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-lkp/202511061629.e242724-lkp at intel.com Signed-off-by: Pasha Tatashin --- lib/test_kho.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/lib/test_kho.c b/lib/test_kho.c index 025ea251a186..85b60d87a50a 100644 --- a/lib/test_kho.c +++ b/lib/test_kho.c @@ -315,6 +315,9 @@ static int __init kho_test_init(void) phys_addr_t fdt_phys; int err; + if (!kho_is_enabled()) + return 0; + err = kho_retrieve_subtree(KHO_TEST_FDT, &fdt_phys); if (!err) return kho_test_restore(fdt_phys); -- 2.51.2.1041.gc1ab5b90ca-goog From pasha.tatashin at soleen.com Thu Nov 6 14:14:28 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Thu, 6 Nov 2025 17:14:28 -0500 Subject: [PATCH v9 2/9] kho: drop notifiers In-Reply-To: References: <20251101142325.1326536-3-pasha.tatashin@soleen.com> <202511061629.e242724-lkp@intel.com> Message-ID: On Thu, Nov 6, 2025 at 4:46 PM Pasha Tatashin wrote: > > The bug is in lib/test_kho.c: when KHO is not enabled, it should not > run KHO commands; there is a function to test for that: kho_is_enabled(). > So, KHO is disabled, yet kho_add_subtree(), which adds a debugfs > entry, is still called, and the list is not initialized because KHO is disabled.
The > fix is: Sent it as a patch: https://lore.kernel.org/all/20251106220635.2608494-1-pasha.tatashin at soleen.com > > diff --git a/lib/test_kho.c b/lib/test_kho.c > index 025ea251a186..85b60d87a50a 100644 > --- a/lib/test_kho.c > +++ b/lib/test_kho.c > @@ -315,6 +315,9 @@ static int __init kho_test_init(void) > phys_addr_t fdt_phys; > int err; > > + if (!kho_is_enabled()) > + return 0; > + > err = kho_retrieve_subtree(KHO_TEST_FDT, &fdt_phys); > if (!err) > return kho_test_restore(fdt_phys); > > On Thu, Nov 6, 2025 at 3:41?AM kernel test robot wrote: > > > > > > > > Hello, > > > > kernel test robot noticed "WARNING:at_kernel/kexec_handover.c:#kho_add_subtree" on: > > > > commit: e44a700c561d1e892a8d0829d557e221604a7b93 ("[PATCH v9 2/9] kho: drop notifiers") > > url: https://github.com/intel-lab-lkp/linux/commits/Pasha-Tatashin/kho-make-debugfs-interface-optional/20251101-222610 > > patch link: https://lore.kernel.org/all/20251101142325.1326536-3-pasha.tatashin at soleen.com/ > > patch subject: [PATCH v9 2/9] kho: drop notifiers > > > > in testcase: boot > > > > config: x86_64-randconfig-001-20251015 > > compiler: gcc-14 > > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G > > > > (please refer to attached dmesg/kmsg for entire log/backtrace) > > > > > > +--------------------------------------------------------+------------+------------+ > > | | 93e4b3b2e9 | e44a700c56 | > > +--------------------------------------------------------+------------+------------+ > > | WARNING:at_kernel/kexec_handover.c:#kho_add_subtree | 0 | 8 | > > | RIP:kho_add_subtree | 0 | 8 | > > +--------------------------------------------------------+------------+------------+ > > > > > > If you fix the issue in a separate patch/commit (i.e. 
not just a new version of > > the same patch/commit), kindly add following tags > > | Reported-by: kernel test robot > > | Closes: https://lore.kernel.org/oe-lkp/202511061629.e242724-lkp at intel.com > > > > > > [ 13.620111][ T1] ------------[ cut here ]------------ > > [ 13.620739][ T1] WARNING: CPU: 1 PID: 1 at kernel/kexec_handover.c:704 kho_add_subtree (kernel/kexec_handover.c:704) > > [ 13.621665][ T1] Modules linked in: > > [ 13.622090][ T1] CPU: 1 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.18.0-rc3-00211-ge44a700c561d #1 VOLUNTARY > > [ 13.623073][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 > > [ 13.624054][ T1] RIP: 0010:kho_add_subtree (kernel/kexec_handover.c:704) > > [ 13.624596][ T1] Code: c7 38 b4 ac 85 31 ed e8 01 1c 00 00 48 c7 c7 70 5a ca 86 85 c0 89 c3 40 0f 95 c5 31 c9 31 d2 89 ee e8 37 b5 0a 00 85 db 74 02 <0f> 0b b9 01 00 00 00 31 d2 89 ee 48 c7 c7 40 5a ca 86 e8 1c b5 0a > > All code > > ======== > > 0: c7 38 b4 ac 85 xbegin 0xffffffff85acb43d,(bad) > > 5: 31 ed xor %ebp,%ebp > > 7: e8 01 1c 00 00 call 0x1c0d > > c: 48 c7 c7 70 5a ca 86 mov $0xffffffff86ca5a70,%rdi > > 13: 85 c0 test %eax,%eax > > 15: 89 c3 mov %eax,%ebx > > 17: 40 0f 95 c5 setne %bpl > > 1b: 31 c9 xor %ecx,%ecx > > 1d: 31 d2 xor %edx,%edx > > 1f: 89 ee mov %ebp,%esi > > 21: e8 37 b5 0a 00 call 0xab55d > > 26: 85 db test %ebx,%ebx > > 28: 74 02 je 0x2c > > 2a:* 0f 0b ud2 <-- trapping instruction > > 2c: b9 01 00 00 00 mov $0x1,%ecx > > 31: 31 d2 xor %edx,%edx > > 33: 89 ee mov %ebp,%esi > > 35: 48 c7 c7 40 5a ca 86 mov $0xffffffff86ca5a40,%rdi > > 3c: e8 .byte 0xe8 > > 3d: 1c b5 sbb $0xb5,%al > > 3f: 0a .byte 0xa > > > > Code starting with the faulting instruction > > =========================================== > > 0: 0f 0b ud2 > > 2: b9 01 00 00 00 mov $0x1,%ecx > > 7: 31 d2 xor %edx,%edx > > 9: 89 ee mov %ebp,%esi > > b: 48 c7 c7 40 5a ca 86 mov $0xffffffff86ca5a40,%rdi > > 12: e8 .byte 0xe8 > > 13: 1c b5 sbb $0xb5,%al > 
> 15: 0a .byte 0xa > > [ 13.626370][ T1] RSP: 0018:ffffc9000001fca0 EFLAGS: 00010286 > > [ 13.626951][ T1] RAX: dffffc0000000000 RBX: 00000000ffffffff RCX: 0000000000000000 > > [ 13.627737][ T1] RDX: 1ffffffff0d94b52 RSI: 0000000000000001 RDI: ffffffff86ca5a90 > > [ 13.628523][ T1] RBP: 0000000000000001 R08: 0000000000000008 R09: fffffbfff0dfac4c > > [ 13.629330][ T1] R10: 0000000000000000 R11: ffffffff86fd6267 R12: ffff888133ee2000 > > [ 13.630101][ T1] R13: ffffffff85acb340 R14: ffff888117a5f988 R15: dffffc0000000000 > > [ 13.630869][ T1] FS: 0000000000000000(0000) GS:ffff888426ea0000(0000) knlGS:0000000000000000 > > [ 13.631727][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 13.632370][ T1] CR2: 00007f586df260ac CR3: 00000000054ea000 CR4: 00000000000406f0 > > [ 13.633154][ T1] Call Trace: > > [ 13.633506][ T1] > > [ 13.633833][ T1] kho_test_prepare_fdt+0x145/0x180 > > [ 13.634446][ T1] ? kho_test_save_data+0x210/0x210 > > [ 13.635097][ T1] ? csum_partial (lib/checksum.c:123) > > [ 13.635546][ T1] kho_test_init (lib/test_kho.c:177 lib/test_kho.c:284) > > [ 13.636018][ T1] ? vmalloc_test_init (lib/test_kho.c:271) > > [ 13.636508][ T1] ? add_device_randomness (drivers/char/random.c:944) > > [ 13.637485][ T1] ? mix_pool_bytes (drivers/char/random.c:944) > > [ 13.637955][ T1] ? trace_initcall_start (include/trace/events/initcall.h:27 (discriminator 3)) > > [ 13.638498][ T1] ? vmalloc_test_init (lib/test_kho.c:271) > > [ 13.638989][ T1] do_one_initcall (init/main.c:1284) > > [ 13.639477][ T1] ? trace_initcall_start (init/main.c:1274) > > [ 13.639998][ T1] ? parse_one (kernel/params.c:143) > > [ 13.640455][ T1] ? kasan_save_track (mm/kasan/common.c:69 (discriminator 1) mm/kasan/common.c:78 (discriminator 1)) > > [ 13.640948][ T1] ? __kmalloc_noprof (mm/slub.c:5659) > > [ 13.641465][ T1] do_initcalls (init/main.c:1344 (discriminator 3) init/main.c:1361 (discriminator 3)) > > [ 13.641924][ T1] kernel_init_freeable (init/main.c:1595) > > [ 13.642441][ T1] ? 
rest_init (init/main.c:1475) > > [ 13.642891][ T1] kernel_init (init/main.c:1485) > > [ 13.643345][ T1] ? rest_init (init/main.c:1475) > > [ 13.643788][ T1] ret_from_fork (arch/x86/kernel/process.c:164) > > [ 13.644256][ T1] ? rest_init (init/main.c:1475) > > [ 13.644703][ T1] ret_from_fork_asm (arch/x86/entry/entry_64.S:255) > > [ 13.645213][ T1] > > [ 13.645540][ T1] irq event stamp: 132025 > > [ 13.645971][ T1] hardirqs last enabled at (132035): __up_console_sem (arch/x86/include/asm/irqflags.h:26 arch/x86/include/asm/irqflags.h:109 arch/x86/include/asm/irqflags.h:151 kernel/printk/printk.c:345) > > [ 13.646887][ T1] hardirqs last disabled at (132046): __up_console_sem (kernel/printk/printk.c:343 (discriminator 3)) > > [ 13.648253][ T1] softirqs last enabled at (131286): handle_softirqs (kernel/softirq.c:469 (discriminator 1) kernel/softirq.c:650 (discriminator 1)) > > [ 13.649690][ T1] softirqs last disabled at (131281): __irq_exit_rcu (kernel/softirq.c:496 kernel/softirq.c:723) > > [ 13.651128][ T1] ---[ end trace 0000000000000000 ]--- > > > > > > The kernel config and materials to reproduce are available at: > > https://download.01.org/0day-ci/archive/20251106/202511061629.e242724-lkp at intel.com > > > > > > > > -- > > 0-DAY CI Kernel Test Service > > https://github.com/intel/lkp-tests/wiki > > From akpm at linux-foundation.org Thu Nov 6 16:44:42 2025 From: akpm at linux-foundation.org (Andrew Morton) Date: Thu, 6 Nov 2025 16:44:42 -0800 Subject: [PATCH 2/2] kernel/kexec: Fix IMA when allocation happens in CMA area In-Reply-To: References: <20251105130922.13321-1-piliu@redhat.com> <20251105130922.13321-2-piliu@redhat.com> <20251105161432.98eb69f87f30627a9067e78e@linux-foundation.org> Message-ID: <20251106164442.f0158876667a18d0f31a127a@linux-foundation.org> On Thu, 6 Nov 2025 10:57:33 +0800 Pingfan Liu wrote: > > > This is caused by the fact that kexec allocates the destination directly > > > in the CMA area. 
In that case, the CMA kernel address should be exported > directly to the IMA component, instead of using the vmalloc'd address. > > This is something we should backport into earlier kernels. > > > Signed-off-by: Pingfan Liu > > Cc: Andrew Morton > > Cc: Baoquan He > > Cc: Alexander Graf > > Cc: Steven Chen > > Cc: linux-integrity at vger.kernel.org > > To: kexec at lists.infradead.org > > > > So I'm thinking we should add > > > > Fixes: 0091d9241ea2 ("kexec: define functions to map and unmap segments") > Should be: > Fixes: 07d24902977e ("kexec: enable CMA based contiguous allocation") > > Because 07d24902977e came after 0091d9241ea2 and introduced this issue. Thanks, I updated the mm.git copy of this patch. From bhe at redhat.com Thu Nov 6 17:51:04 2025 From: bhe at redhat.com (Baoquan He) Date: Fri, 7 Nov 2025 09:51:04 +0800 Subject: [PATCHv2 2/2] kernel/kexec: Fix IMA when allocation happens in CMA area In-Reply-To: References: <20251106065904.10772-1-piliu@redhat.com> <20251106065904.10772-2-piliu@redhat.com> Message-ID: On 11/06/25 at 06:01pm, Pingfan Liu wrote: > On Thu, Nov 6, 2025 at 4:01 PM Baoquan He wrote: > > > > On 11/06/25 at 02:59pm, Pingfan Liu wrote: > > > When I tested kexec with the latest kernel, I ran into the following warning: > > > > > > [ 40.712410] ------------[ cut here ]------------ > > > [ 40.712576] WARNING: CPU: 2 PID: 1562 at kernel/kexec_core.c:1001 kimage_map_segment+0x144/0x198 > > > [...] > > > [ 40.816047] Call trace: > > > [ 40.818498] kimage_map_segment+0x144/0x198 (P) > > > [ 40.823221] ima_kexec_post_load+0x58/0xc0 > > > [ 40.827246] __do_sys_kexec_file_load+0x29c/0x368 > > > [...] > > > [ 40.855423] ---[ end trace 0000000000000000 ]--- > > > > > > This is caused by the fact that kexec allocates the destination directly > > > in the CMA area. In that case, the CMA kernel address should be exported > > > directly to the IMA component, instead of using the vmalloc'd address.
> > > > Well, you didn't update the log accordingly. > > > > I am not sure what you mean. Do you mean the earlier content which I > replied to you? No. In v1, you return cma directly. But in v2, you return its direct mapping address, isn't it? > > > Do you know why cma area can't be mapped into vmalloc? > > > Should not the kernel direct mapping be used? When image->segment_cma[i] has value, image->ima_buffer_addr also contains the physical address of the cma area, so why can't the cma physical address be mapped into vmalloc, and why does that cause the failure and call trace? From piliu at redhat.com Thu Nov 6 21:13:08 2025 From: piliu at redhat.com (Pingfan Liu) Date: Fri, 7 Nov 2025 13:13:08 +0800 Subject: [PATCHv2 2/2] kernel/kexec: Fix IMA when allocation happens in CMA area In-Reply-To: References: <20251106065904.10772-1-piliu@redhat.com> <20251106065904.10772-2-piliu@redhat.com> Message-ID: On Fri, Nov 7, 2025 at 9:51 AM Baoquan He wrote: > > On 11/06/25 at 06:01pm, Pingfan Liu wrote: > > On Thu, Nov 6, 2025 at 4:01 PM Baoquan He wrote: > > > > > > On 11/06/25 at 02:59pm, Pingfan Liu wrote: > > > > When I tested kexec with the latest kernel, I ran into the following warning: > > > > > > > > [ 40.712410] ------------[ cut here ]------------ > > > > [ 40.712576] WARNING: CPU: 2 PID: 1562 at kernel/kexec_core.c:1001 kimage_map_segment+0x144/0x198 > > > > [...] > > > > [ 40.816047] Call trace: > > > > [ 40.818498] kimage_map_segment+0x144/0x198 (P) > > > > [ 40.823221] ima_kexec_post_load+0x58/0xc0 > > > > [ 40.827246] __do_sys_kexec_file_load+0x29c/0x368 > > > > [...] > > > > [ 40.855423] ---[ end trace 0000000000000000 ]--- > > > > > > > > This is caused by the fact that kexec allocates the destination directly > > > > in the CMA area. In that case, the CMA kernel address should be exported > > > > directly to the IMA component, instead of using the vmalloc'd address. > > > > > > Well, you didn't update the log accordingly. > > > > > > > I am not sure what you mean.
Do you mean the earlier content which I > > replied to you? > > No. In v1, you return cma directly. But in v2, you return its direct > > mapping address, isn't it? > > Yes. But I think it is a fault in the code, which does not match what the commit log describes. Do you think I should rephrase the words "the CMA kernel address" as "the CMA kernel direct mapping address"? > > > > > > Do you know why cma area can't be mapped into vmalloc? > > > > > Should not the kernel direct mapping be used? > > When image->segment_cma[i] has value, image->ima_buffer_addr also > contains the physical address of the cma area, why cma physical address > can't be mapped into vmalloc and cause the failure and call trace? > > It could be done using the vmalloc approach, but it's unnecessary. IIUC, kimage_map_segment() was introduced to provide a contiguous virtual address for IMA access, since the IND_SRC pages are scattered throughout the kernel. However, in the CMA case, there is already a contiguous virtual address in the kernel direct mapping range. Normally, when we have a physical address, we simply use phys_to_virt() to get its corresponding kernel virtual address.
Thanks, Pingfan From bhe at redhat.com Thu Nov 6 21:25:41 2025 From: bhe at redhat.com (Baoquan He) Date: Fri, 7 Nov 2025 13:25:41 +0800 Subject: [PATCHv2 2/2] kernel/kexec: Fix IMA when allocation happens in CMA area In-Reply-To: References: <20251106065904.10772-1-piliu@redhat.com> <20251106065904.10772-2-piliu@redhat.com> Message-ID: On 11/07/25 at 01:13pm, Pingfan Liu wrote: > On Fri, Nov 7, 2025 at 9:51?AM Baoquan He wrote: > > > > On 11/06/25 at 06:01pm, Pingfan Liu wrote: > > > On Thu, Nov 6, 2025 at 4:01?PM Baoquan He wrote: > > > > > > > > On 11/06/25 at 02:59pm, Pingfan Liu wrote: > > > > > When I tested kexec with the latest kernel, I ran into the following warning: > > > > > > > > > > [ 40.712410] ------------[ cut here ]------------ > > > > > [ 40.712576] WARNING: CPU: 2 PID: 1562 at kernel/kexec_core.c:1001 kimage_map_segment+0x144/0x198 > > > > > [...] > > > > > [ 40.816047] Call trace: > > > > > [ 40.818498] kimage_map_segment+0x144/0x198 (P) > > > > > [ 40.823221] ima_kexec_post_load+0x58/0xc0 > > > > > [ 40.827246] __do_sys_kexec_file_load+0x29c/0x368 > > > > > [...] > > > > > [ 40.855423] ---[ end trace 0000000000000000 ]--- > > > > > > > > > > This is caused by the fact that kexec allocates the destination directly > > > > > in the CMA area. In that case, the CMA kernel address should be exported > > > > > directly to the IMA component, instead of using the vmalloc'd address. > > > > > > > > Well, you didn't update the log accordingly. > > > > > > > > > > I am not sure what you mean. Do you mean the earlier content which I > > > replied to you? > > > > No. In v1, you return cma directly. But in v2, you return its direct > > mapping address, isnt' it? > > > > Yes. But I think it is a fault in the code, which does not convey the > expression in the commit log. Do you think I should rephrase the words > "the CMA kernel address" as "the CMA kernel direct mapping address"? That's fine to me. 
> > > > > > > > Do you know why cma area can't be mapped into vmalloc? > > > > > > > Should not the kernel direct mapping be used? > > > > When image->segment_cma[i] has value, image->ima_buffer_addr also > > contains the physical address of the cma area, why cma physical address > > can't be mapped into vmalloc and cause the failure and call trace? > > > > It could be done using the vmalloc approach, but it's unnecessary. > IIUC, kimage_map_segment() was introduced to provide a contiguous > virtual address for IMA access, since the IND_SRC pages are scattered > throughout the kernel. However, in the CMA case, there is already a > contiguous virtual address in the kernel direct mapping range. > Normally, when we have a physical address, we simply use > phys_to_virt() to get its corresponding kernel virtual address. OK, I understand cma area is contiguous, and no need to map into vmalloc. I am wondering why in the old code mapping cma address into vmalloc causes the warning which you said is an IMA problem. From sourabhjain at linux.ibm.com Thu Nov 6 23:15:55 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Fri, 7 Nov 2025 12:45:55 +0530 Subject: [PATCH v2 5/5] crash: export crashkernel CMA reservation to userspace In-Reply-To: <20251106045107.17813-6-sourabhjain@linux.ibm.com> References: <20251106045107.17813-1-sourabhjain@linux.ibm.com> <20251106045107.17813-6-sourabhjain@linux.ibm.com> Message-ID: On 06/11/25 10:21, Sourabh Jain wrote: > Add a sysfs entry /sys/kernel/kexec/crash_cma_ranges to expose all > CMA crashkernel ranges. > > This allows userspace tools configuring kdump to determine how much > memory is reserved for crashkernel. If CMA is used, tools can warn > users when attempting to capture user pages with CMA reservation.
> > The new sysfs file holds the CMA ranges in the format below: > > cat /sys/kernel/kexec/crash_cma_ranges > 100000000-10c7fffff > > Cc: Aditya Gupta > Cc: Andrew Morton > Cc: Baoquan he > Cc: Dave Young > Cc: Hari Bathini > Cc: Jiri Bohac > Cc: Madhavan Srinivasan > Cc: Mahesh J Salgaonkar > Cc: Pingfan Liu > Cc: Ritesh Harjani (IBM) > Cc: Shivang Upadhyay > Cc: Vivek Goyal > Cc: linuxppc-dev at lists.ozlabs.org > Cc: kexec at lists.infradead.org > Signed-off-by: Sourabh Jain > --- > Documentation/ABI/testing/sysfs-kernel-kexec-kdump | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump > index 00c00f380fea..f59051b5d96d 100644 > --- a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump > +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump > @@ -49,3 +49,13 @@ Description: read only > is used by the user space utility kexec to support updating the > in-kernel kdump image during hotplug operations. > User: Kexec tools > + > +What: /sys/kernel/kexec/crash_cma_ranges > +Date: Nov 2025 > +Contact: kexec at lists.infradead.org > +Description: read only > + Provides information about the memory ranges reserved from > + the Contiguous Memory Allocator (CMA) area that are allocated > + to the crash (kdump) kernel. It lists the start and end physical > + addresses of CMA regions assigned for crashkernel use. > +User: kdump service While rebasing the v1 patches, the hunk that adds the show function didn't get picked up. I will send v3 with a show function to export the crashkernel CMA reservation.
- Sourabh Jain From sourabhjain at linux.ibm.com Fri Nov 7 00:03:34 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Fri, 7 Nov 2025 13:33:34 +0530 Subject: [PATCH v7] powerpc/kdump: Add support for crashkernel CMA reservation Message-ID: <20251107080334.708028-1-sourabhjain@linux.ibm.com> Commit 35c18f2933c5 ("Add a new optional ",cma" suffix to the crashkernel= command line option") and commit ab475510e042 ("kdump: implement reserve_crashkernel_cma") added CMA support for kdump crashkernel reservation. Extend crashkernel CMA reservation support to powerpc. The following changes are made to enable CMA reservation on powerpc: - Parse and obtain the CMA reservation size along with other crashkernel parameters - Call reserve_crashkernel_cma() to allocate the CMA region for kdump - Include the CMA-reserved ranges in the usable memory ranges for the kdump kernel to use. - Exclude the CMA-reserved ranges from the crash kernel memory to prevent them from being exported through /proc/vmcore. With the introduction of the CMA crashkernel regions, crash_exclude_mem_range() needs to be called multiple times to exclude both crashk_res and crashk_cma_ranges from the crash memory ranges. To avoid repetitive logic for validating mem_ranges size and handling reallocation when required, this functionality is moved to a new wrapper function crash_exclude_mem_range_guarded(). To ensure proper CMA reservation, reserve_crashkernel_cma() is called after pageblock_order is initialized. Update kernel-parameters.txt to document CMA support for crashkernel on powerpc architecture. 
Cc: Baoquan he Cc: Jiri Bohac Cc: Hari Bathini Cc: Madhavan Srinivasan Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- Changelog: v6 -> v7 https://lore.kernel.org/all/20251104132818.1724562-1-sourabhjain at linux.ibm.com/ - declare crashk_cma_size static --- .../admin-guide/kernel-parameters.txt | 2 +- arch/powerpc/include/asm/kexec.h | 2 + arch/powerpc/kernel/setup-common.c | 4 +- arch/powerpc/kexec/core.c | 10 ++++- arch/powerpc/kexec/ranges.c | 43 ++++++++++++++----- 5 files changed, 47 insertions(+), 14 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 6c42061ca20e..1c10190d583d 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1013,7 +1013,7 @@ It will be ignored when crashkernel=X,high is not used or memory reserved is below 4G. crashkernel=size[KMG],cma - [KNL, X86] Reserve additional crash kernel memory from + [KNL, X86, ppc] Reserve additional crash kernel memory from CMA. This reservation is usable by the first system's userspace memory and kernel movable allocations (memory balloon, zswap). 
Pages allocated from this memory range diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index 4bbf9f699aaa..bd4a6c42a5f3 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -115,9 +115,11 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt, struct crash_mem #ifdef CONFIG_CRASH_RESERVE int __init overlaps_crashkernel(unsigned long start, unsigned long size); extern void arch_reserve_crashkernel(void); +extern void kdump_cma_reserve(void); #else static inline void arch_reserve_crashkernel(void) {} static inline int overlaps_crashkernel(unsigned long start, unsigned long size) { return 0; } +static inline void kdump_cma_reserve(void) { } #endif #if defined(CONFIG_CRASH_DUMP) diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c index 68d47c53876c..c8c42b419742 100644 --- a/arch/powerpc/kernel/setup-common.c +++ b/arch/powerpc/kernel/setup-common.c @@ -35,6 +35,7 @@ #include #include #include +#include #include #include #include @@ -995,11 +996,12 @@ void __init setup_arch(char **cmdline_p) initmem_init(); /* - * Reserve large chunks of memory for use by CMA for fadump, KVM and + * Reserve large chunks of memory for use by CMA for kdump, fadump, KVM and * hugetlb. These must be called after initmem_init(), so that * pageblock_order is initialised. 
*/ fadump_cma_init(); + kdump_cma_reserve(); kvm_cma_reserve(); gigantic_hugetlb_cma_reserve(); diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c index d1a2d755381c..e59bdfcc6463 100644 --- a/arch/powerpc/kexec/core.c +++ b/arch/powerpc/kexec/core.c @@ -59,6 +59,8 @@ void machine_kexec(struct kimage *image) #ifdef CONFIG_CRASH_RESERVE +static unsigned long long crashk_cma_size; + static unsigned long long __init get_crash_base(unsigned long long crash_base) { @@ -110,7 +112,7 @@ void __init arch_reserve_crashkernel(void) /* use common parsing */ ret = parse_crashkernel(boot_command_line, total_mem_sz, &crash_size, - &crash_base, NULL, NULL, NULL); + &crash_base, NULL, &crashk_cma_size, NULL); if (ret) return; @@ -130,6 +132,12 @@ void __init arch_reserve_crashkernel(void) reserve_crashkernel_generic(crash_size, crash_base, 0, false); } +void __init kdump_cma_reserve(void) +{ + if (crashk_cma_size) + reserve_crashkernel_cma(crashk_cma_size); +} + int __init overlaps_crashkernel(unsigned long start, unsigned long size) { return (start + size) > crashk_res.start && start <= crashk_res.end; diff --git a/arch/powerpc/kexec/ranges.c b/arch/powerpc/kexec/ranges.c index 3702b0bdab14..3bd27c38726b 100644 --- a/arch/powerpc/kexec/ranges.c +++ b/arch/powerpc/kexec/ranges.c @@ -515,7 +515,7 @@ int get_exclude_memory_ranges(struct crash_mem **mem_ranges) */ int get_usable_memory_ranges(struct crash_mem **mem_ranges) { - int ret; + int ret, i; /* * Early boot failure observed on guests when low memory (first memory @@ -528,6 +528,13 @@ int get_usable_memory_ranges(struct crash_mem **mem_ranges) if (ret) goto out; + for (i = 0; i < crashk_cma_cnt; i++) { + ret = add_mem_range(mem_ranges, crashk_cma_ranges[i].start, + crashk_cma_ranges[i].end - crashk_cma_ranges[i].start + 1); + if (ret) + goto out; + } + ret = add_rtas_mem_range(mem_ranges); if (ret) goto out; @@ -546,6 +553,22 @@ int get_usable_memory_ranges(struct crash_mem **mem_ranges) #endif /* 
CONFIG_KEXEC_FILE */ #ifdef CONFIG_CRASH_DUMP +static int crash_exclude_mem_range_guarded(struct crash_mem **mem_ranges, + unsigned long long mstart, + unsigned long long mend) +{ + struct crash_mem *tmem = *mem_ranges; + + /* Reallocate memory ranges if there is no space to split ranges */ + if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) { + tmem = realloc_mem_ranges(mem_ranges); + if (!tmem) + return -ENOMEM; + } + + return crash_exclude_mem_range(tmem, mstart, mend); +} + /** * get_crash_memory_ranges - Get crash memory ranges. This list includes * first/crashing kernel's memory regions that @@ -557,7 +580,6 @@ int get_usable_memory_ranges(struct crash_mem **mem_ranges) int get_crash_memory_ranges(struct crash_mem **mem_ranges) { phys_addr_t base, end; - struct crash_mem *tmem; u64 i; int ret; @@ -582,19 +604,18 @@ int get_crash_memory_ranges(struct crash_mem **mem_ranges) sort_memory_ranges(*mem_ranges, true); } - /* Reallocate memory ranges if there is no space to split ranges */ - tmem = *mem_ranges; - if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) { - tmem = realloc_mem_ranges(mem_ranges); - if (!tmem) - goto out; - } - /* Exclude crashkernel region */ - ret = crash_exclude_mem_range(tmem, crashk_res.start, crashk_res.end); + ret = crash_exclude_mem_range_guarded(mem_ranges, crashk_res.start, crashk_res.end); if (ret) goto out; + for (i = 0; i < crashk_cma_cnt; ++i) { + ret = crash_exclude_mem_range_guarded(mem_ranges, crashk_cma_ranges[i].start, + crashk_cma_ranges[i].end); + if (ret) + goto out; + } + /* * FIXME: For now, stay in parity with kexec-tools but if RTAS/OPAL * regions are exported to save their context at the time of -- 2.51.1 From piliu at redhat.com Fri Nov 7 01:00:09 2025 From: piliu at redhat.com (Pingfan Liu) Date: Fri, 7 Nov 2025 17:00:09 +0800 Subject: [PATCHv2 2/2] kernel/kexec: Fix IMA when allocation happens in CMA area In-Reply-To: References: <20251106065904.10772-1-piliu@redhat.com> 
<20251106065904.10772-2-piliu@redhat.com> Message-ID: On Fri, Nov 07, 2025 at 01:25:41PM +0800, Baoquan He wrote: > On 11/07/25 at 01:13pm, Pingfan Liu wrote: > > On Fri, Nov 7, 2025 at 9:51?AM Baoquan He wrote: > > > > > > On 11/06/25 at 06:01pm, Pingfan Liu wrote: > > > > On Thu, Nov 6, 2025 at 4:01?PM Baoquan He wrote: > > > > > > > > > > On 11/06/25 at 02:59pm, Pingfan Liu wrote: > > > > > > When I tested kexec with the latest kernel, I ran into the following warning: > > > > > > > > > > > > [ 40.712410] ------------[ cut here ]------------ > > > > > > [ 40.712576] WARNING: CPU: 2 PID: 1562 at kernel/kexec_core.c:1001 kimage_map_segment+0x144/0x198 > > > > > > [...] > > > > > > [ 40.816047] Call trace: > > > > > > [ 40.818498] kimage_map_segment+0x144/0x198 (P) > > > > > > [ 40.823221] ima_kexec_post_load+0x58/0xc0 > > > > > > [ 40.827246] __do_sys_kexec_file_load+0x29c/0x368 > > > > > > [...] > > > > > > [ 40.855423] ---[ end trace 0000000000000000 ]--- > > > > > > > > > > > > This is caused by the fact that kexec allocates the destination directly > > > > > > in the CMA area. In that case, the CMA kernel address should be exported > > > > > > directly to the IMA component, instead of using the vmalloc'd address. > > > > > > > > > > Well, you didn't update the log accordingly. > > > > > > > > > > > > > I am not sure what you mean. Do you mean the earlier content which I > > > > replied to you? > > > > > > No. In v1, you return cma directly. But in v2, you return its direct > > > mapping address, isnt' it? > > > > > > > Yes. But I think it is a fault in the code, which does not convey the > > expression in the commit log. Do you think I should rephrase the words > > "the CMA kernel address" as "the CMA kernel direct mapping address"? > > That's fine to me. > > > > > > > > > > > > Do you know why cma area can't be mapped into vmalloc? > > > > > > > > > Should not the kernel direct mapping be used? 
> > > > > > When image->segment_cma[i] has value, image->ima_buffer_addr also > > > contains the physical address of the cma area, why cma physical address > > > can't be mapped into vmalloc and cause the failure and call trace? > > > > > > > It could be done using the vmalloc approach, but it's unnecessary. > > IIUC, kimage_map_segment() was introduced to provide a contiguous > > virtual address for IMA access, since the IND_SRC pages are scattered > > throughout the kernel. However, in the CMA case, there is already a > > contiguous virtual address in the kernel direct mapping range. > > Normally, when we have a physical address, we simply use > > phys_to_virt() to get its corresponding kernel virtual address. > > OK, I understand cma area is contiguous, and no need to map into > vmalloc. I am wondering why in the old code mapping cma addrss into > vmalloc cause the warning which you said is a IMA problem. > It doesn't go that far. The old code doesn't map CMA into vmalloc'd area. void *kimage_map_segment(struct kimage *image, int idx) { ... for_each_kimage_entry(image, ptr, entry) { if (entry & IND_DESTINATION) { dest_page_addr = entry & PAGE_MASK; } else if (entry & IND_SOURCE) { if (dest_page_addr >= addr && dest_page_addr < eaddr) { src_page_addr = entry & PAGE_MASK; src_pages[i++] = virt_to_page(__va(src_page_addr)); if (i == npages) break; dest_page_addr += PAGE_SIZE; } } } /* Sanity check. */ WARN_ON(i < npages); //--> This is the warning thrown by kernel vaddr = vmap(src_pages, npages, VM_MAP, PAGE_KERNEL); kfree(src_pages); if (!vaddr) pr_err("Could not map ima buffer.\n"); return vaddr; } When CMA is used, there is no IND_SOURCE, so we have i=0 < npages. Now, I see how my words ("In that case, the CMA kernel address should be exported directly to the IMA component, instead of using the vmalloc'd address.") confused you. As for "instead of using the vmalloc'd address", I meant to mention "vmap()" approach. 
Best Regards, Pingfan From bhe at redhat.com Fri Nov 7 01:31:32 2025 From: bhe at redhat.com (Baoquan He) Date: Fri, 7 Nov 2025 17:31:32 +0800 Subject: [PATCHv2 2/2] kernel/kexec: Fix IMA when allocation happens in CMA area In-Reply-To: References: <20251106065904.10772-1-piliu@redhat.com> <20251106065904.10772-2-piliu@redhat.com> Message-ID: On 11/07/25 at 05:00pm, Pingfan Liu wrote: > On Fri, Nov 07, 2025 at 01:25:41PM +0800, Baoquan He wrote: > > On 11/07/25 at 01:13pm, Pingfan Liu wrote: > > > On Fri, Nov 7, 2025 at 9:51?AM Baoquan He wrote: > > > > > > > > On 11/06/25 at 06:01pm, Pingfan Liu wrote: > > > > > On Thu, Nov 6, 2025 at 4:01?PM Baoquan He wrote: > > > > > > > > > > > > On 11/06/25 at 02:59pm, Pingfan Liu wrote: > > > > > > > When I tested kexec with the latest kernel, I ran into the following warning: > > > > > > > > > > > > > > [ 40.712410] ------------[ cut here ]------------ > > > > > > > [ 40.712576] WARNING: CPU: 2 PID: 1562 at kernel/kexec_core.c:1001 kimage_map_segment+0x144/0x198 > > > > > > > [...] > > > > > > > [ 40.816047] Call trace: > > > > > > > [ 40.818498] kimage_map_segment+0x144/0x198 (P) > > > > > > > [ 40.823221] ima_kexec_post_load+0x58/0xc0 > > > > > > > [ 40.827246] __do_sys_kexec_file_load+0x29c/0x368 > > > > > > > [...] > > > > > > > [ 40.855423] ---[ end trace 0000000000000000 ]--- > > > > > > > > > > > > > > This is caused by the fact that kexec allocates the destination directly > > > > > > > in the CMA area. In that case, the CMA kernel address should be exported > > > > > > > directly to the IMA component, instead of using the vmalloc'd address. > > > > > > > > > > > > Well, you didn't update the log accordingly. > > > > > > > > > > > > > > > > I am not sure what you mean. Do you mean the earlier content which I > > > > > replied to you? > > > > > > > > No. In v1, you return cma directly. But in v2, you return its direct > > > > mapping address, isnt' it? > > > > > > > > > > Yes. 
But I think it is a fault in the code, which does not convey the > > > expression in the commit log. Do you think I should rephrase the words > > > "the CMA kernel address" as "the CMA kernel direct mapping address"? > > > > That's fine to me. > > > > > > > > > > > > > > > > Do you know why cma area can't be mapped into vmalloc? > > > > > > > > > > > Should not the kernel direct mapping be used? > > > > > > > > When image->segment_cma[i] has value, image->ima_buffer_addr also > > > > contains the physical address of the cma area, why cma physical address > > > > can't be mapped into vmalloc and cause the failure and call trace? > > > > > > > > > > It could be done using the vmalloc approach, but it's unnecessary. > > > IIUC, kimage_map_segment() was introduced to provide a contiguous > > > virtual address for IMA access, since the IND_SRC pages are scattered > > > throughout the kernel. However, in the CMA case, there is already a > > > contiguous virtual address in the kernel direct mapping range. > > > Normally, when we have a physical address, we simply use > > > phys_to_virt() to get its corresponding kernel virtual address. > > > > OK, I understand cma area is contiguous, and no need to map into > > vmalloc. I am wondering why in the old code mapping cma addrss into > > vmalloc cause the warning which you said is a IMA problem. > > > > It doesn't go that far. The old code doesn't map CMA into vmalloc'd > area. > > void *kimage_map_segment(struct kimage *image, int idx) > { > ... > for_each_kimage_entry(image, ptr, entry) { > if (entry & IND_DESTINATION) { > dest_page_addr = entry & PAGE_MASK; > } else if (entry & IND_SOURCE) { > if (dest_page_addr >= addr && dest_page_addr < eaddr) { > src_page_addr = entry & PAGE_MASK; > src_pages[i++] = > virt_to_page(__va(src_page_addr)); > if (i == npages) > break; > dest_page_addr += PAGE_SIZE; > } > } > } > > /* Sanity check. 
*/ > WARN_ON(i < npages); //--> This is the warning thrown by kernel > > vaddr = vmap(src_pages, npages, VM_MAP, PAGE_KERNEL); > kfree(src_pages); > > if (!vaddr) > pr_err("Could not map ima buffer.\n"); > > return vaddr; > } > > When CMA is used, there is no IND_SOURCE, so we have i=0 < npages. > Now, I see how my words ("In that case, the CMA kernel address should be > exported directly to the IMA component, instead of using the vmalloc'd > address.") confused you. As for "instead of using the vmalloc'd > address", I meant to mention "vmap()" approach. Ok, I got it. It's truly a bug because if image->segment_cma[idx] is valid, the current kimage_map_segment() can't collect the source pages at all since they are not marked with IND_DESTINATION|IND_SOURCE as normal segment does. In that situation, we can take the direct mapping address of image->segment_cma[idx] which is more efficient, instead of collecting source pages and vmap(). From bhe at redhat.com Fri Nov 7 01:34:15 2025 From: bhe at redhat.com (Baoquan He) Date: Fri, 7 Nov 2025 17:34:15 +0800 Subject: [PATCHv2 2/2] kernel/kexec: Fix IMA when allocation happens in CMA area In-Reply-To: <20251106065904.10772-2-piliu@redhat.com> References: <20251106065904.10772-1-piliu@redhat.com> <20251106065904.10772-2-piliu@redhat.com> Message-ID: On 11/06/25 at 02:59pm, Pingfan Liu wrote: > When I tested kexec with the latest kernel, I ran into the following warning: > > [ 40.712410] ------------[ cut here ]------------ > [ 40.712576] WARNING: CPU: 2 PID: 1562 at kernel/kexec_core.c:1001 kimage_map_segment+0x144/0x198 > [...] > [ 40.816047] Call trace: > [ 40.818498] kimage_map_segment+0x144/0x198 (P) > [ 40.823221] ima_kexec_post_load+0x58/0xc0 > [ 40.827246] __do_sys_kexec_file_load+0x29c/0x368 > [...] > [ 40.855423] ---[ end trace 0000000000000000 ]--- > > This is caused by the fact that kexec allocates the destination directly > in the CMA area. 
In that case, the CMA kernel address should be exported > directly to the IMA component, instead of using the vmalloc'd address. > > Fixes: 07d24902977e ("kexec: enable CMA based contiguous allocation") > Signed-off-by: Pingfan Liu > Cc: Andrew Morton > Cc: Baoquan He > Cc: Alexander Graf > Cc: Steven Chen > Cc: linux-integrity at vger.kernel.org > Cc: > To: kexec at lists.infradead.org > --- > v1 -> v2: > return page_address(page) instead of *page > > kernel/kexec_core.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > index 9a1966207041..332204204e53 100644 > --- a/kernel/kexec_core.c > +++ b/kernel/kexec_core.c > @@ -967,6 +967,7 @@ void *kimage_map_segment(struct kimage *image, int idx) > kimage_entry_t *ptr, entry; > struct page **src_pages; > unsigned int npages; > + struct page *cma; > void *vaddr = NULL; > int i; > > @@ -974,6 +975,9 @@ void *kimage_map_segment(struct kimage *image, int idx) > size = image->segment[idx].memsz; > eaddr = addr + size; > > + cma = image->segment_cma[idx]; > + if (cma) > + return page_address(cma); This judgement can be put above the addr/size/eaddr assignment lines? If you agree, maybe you can update the patch log by adding more details to explain the root cause so that people can understand it easier. > /* > * Collect the source pages and map them in a contiguous VA range. 
> */ > @@ -1014,7 +1018,8 @@ void *kimage_map_segment(struct kimage *image, int idx) > > void kimage_unmap_segment(void *segment_buffer) > { > - vunmap(segment_buffer); > + if (is_vmalloc_addr(segment_buffer)) > + vunmap(segment_buffer); > } > > struct kexec_load_limit { > -- > 2.49.0 > From pratyush at kernel.org Fri Nov 7 02:24:55 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Fri, 07 Nov 2025 11:24:55 +0100 Subject: [PATCH] lib/test_kho: Check if KHO is enabled In-Reply-To: <20251106220635.2608494-1-pasha.tatashin@soleen.com> (Pasha Tatashin's message of "Thu, 6 Nov 2025 17:06:35 -0500") References: <20251106220635.2608494-1-pasha.tatashin@soleen.com> Message-ID: On Thu, Nov 06 2025, Pasha Tatashin wrote: > We must check whether KHO is enabled prior to issuing KHO commands, > otherwise KHO internal data structures are not initialized. Should we have this check in the KHO APIs instead? This check is easy enough to miss. > > Fixes: b753522bed0b ("kho: add test for kexec handover") > Nit: these blank lines would probably mess up trailer parsing for tooling. 
> Reported-by: kernel test robot > Closes: https://lore.kernel.org/oe-lkp/202511061629.e242724-lkp at intel.com > > Signed-off-by: Pasha Tatashin > --- > lib/test_kho.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/lib/test_kho.c b/lib/test_kho.c > index 025ea251a186..85b60d87a50a 100644 > --- a/lib/test_kho.c > +++ b/lib/test_kho.c > @@ -315,6 +315,9 @@ static int __init kho_test_init(void) > phys_addr_t fdt_phys; > int err; > > + if (!kho_is_enabled()) > + return 0; > + > err = kho_retrieve_subtree(KHO_TEST_FDT, &fdt_phys); > if (!err) > return kho_test_restore(fdt_phys); -- Regards, Pratyush Yadav From pasha.tatashin at soleen.com Fri Nov 7 03:15:37 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 7 Nov 2025 06:15:37 -0500 Subject: [PATCH] lib/test_kho: Check if KHO is enabled In-Reply-To: References: <20251106220635.2608494-1-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 7, 2025 at 5:24?AM Pratyush Yadav wrote: > > On Thu, Nov 06 2025, Pasha Tatashin wrote: > > > We must check whether KHO is enabled prior to issuing KHO commands, > > otherwise KHO internal data structures are not initialized. > > Should we have this check in the KHO APIs instead? This check is easy > enough to miss. I considered adding a kho_is_enabled() check to every KHO API, but it seems unnecessary. In-kernel users of KHO, like reserve_mem and the upcoming LUO, are already expected to check if KHO is enabled before doing extra preservation work. I anticipate any future in-kernel users will follow the same pattern. We could add a WARN_ON(!kho_is_enabled()) to the internal API calls, but I don't think it's needed. We already catch this condition with other WARN_ONs, as shown by this report. > > > > > Fixes: b753522bed0b ("kho: add test for kexec handover") > > > > Nit: these blank lines would probably mess up trailer parsing for > tooling. Hm, if so, the blank line should be removed. 
Thank you, Pasha > > > Reported-by: kernel test robot > > Closes: https://lore.kernel.org/oe-lkp/202511061629.e242724-lkp at intel.com > > > > Signed-off-by: Pasha Tatashin > > --- > > lib/test_kho.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/lib/test_kho.c b/lib/test_kho.c > > index 025ea251a186..85b60d87a50a 100644 > > --- a/lib/test_kho.c > > +++ b/lib/test_kho.c > > @@ -315,6 +315,9 @@ static int __init kho_test_init(void) > > phys_addr_t fdt_phys; > > int err; > > > > + if (!kho_is_enabled()) > > + return 0; > > + > > err = kho_retrieve_subtree(KHO_TEST_FDT, &fdt_phys); > > if (!err) > > return kho_test_restore(fdt_phys); > > -- > Regards, > Pratyush Yadav From pratyush at kernel.org Fri Nov 7 08:07:18 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Fri, 07 Nov 2025 17:07:18 +0100 Subject: [PATCH] lib/test_kho: Check if KHO is enabled In-Reply-To: (Pasha Tatashin's message of "Fri, 7 Nov 2025 06:15:37 -0500") References: <20251106220635.2608494-1-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 07 2025, Pasha Tatashin wrote: > On Fri, Nov 7, 2025 at 5:24?AM Pratyush Yadav wrote: >> >> On Thu, Nov 06 2025, Pasha Tatashin wrote: >> >> > We must check whether KHO is enabled prior to issuing KHO commands, >> > otherwise KHO internal data structures are not initialized. >> >> Should we have this check in the KHO APIs instead? This check is easy >> enough to miss. > > I considered adding a kho_is_enabled() check to every KHO API, but it > seems unnecessary. > > In-kernel users of KHO, like reserve_mem and the upcoming LUO, are > already expected to check if KHO is enabled before doing extra > preservation work. I anticipate any future in-kernel users will follow > the same pattern. Hmm, fair enough. I suppose we can always change this later if it causes more pain. Reviewed-by: Pratyush Yadav [...] 
-- Regards, Pratyush Yadav From ritesh.list at gmail.com Fri Nov 7 18:44:51 2025 From: ritesh.list at gmail.com (Ritesh Harjani (IBM)) Date: Sat, 08 Nov 2025 08:14:51 +0530 Subject: [PATCH v7] powerpc/kdump: Add support for crashkernel CMA reservation In-Reply-To: <20251107080334.708028-1-sourabhjain@linux.ibm.com> References: <20251107080334.708028-1-sourabhjain@linux.ibm.com> Message-ID: <87a50x450c.ritesh.list@gmail.com> Sourabh Jain writes: > Commit 35c18f2933c5 ("Add a new optional ",cma" suffix to the > crashkernel= command line option") and commit ab475510e042 ("kdump: > implement reserve_crashkernel_cma") added CMA support for kdump > crashkernel reservation. > > Extend crashkernel CMA reservation support to powerpc. > Yup, would be nice to see this support landing in powerpc! > The following changes are made to enable CMA reservation on powerpc: > > - Parse and obtain the CMA reservation size along with other crashkernel > parameters > - Call reserve_crashkernel_cma() to allocate the CMA region for kdump > - Include the CMA-reserved ranges in the usable memory ranges for the > kdump kernel to use. > - Exclude the CMA-reserved ranges from the crash kernel memory to > prevent them from being exported through /proc/vmcore. > > With the introduction of the CMA crashkernel regions, > crash_exclude_mem_range() needs to be called multiple times to exclude > both crashk_res and crashk_cma_ranges from the crash memory ranges. To > avoid repetitive logic for validating mem_ranges size and handling > reallocation when required, this functionality is moved to a new wrapper > function crash_exclude_mem_range_guarded(). > > To ensure proper CMA reservation, reserve_crashkernel_cma() is called > after pageblock_order is initialized. > > Update kernel-parameters.txt to document CMA support for crashkernel on > powerpc architecture. 
> > Cc: Baoquan he > Cc: Jiri Bohac > Cc: Hari Bathini > Cc: Madhavan Srinivasan > Cc: Mahesh Salgaonkar > Cc: Michael Ellerman > Cc: Ritesh Harjani (IBM) > Cc: Shivang Upadhyay > Cc: kexec at lists.infradead.org > Signed-off-by: Sourabh Jain > --- > Changelog: > > v6 -> v7 > https://lore.kernel.org/all/20251104132818.1724562-1-sourabhjain at linux.ibm.com/ > - declare crashk_cma_size static > > --- > .../admin-guide/kernel-parameters.txt | 2 +- > arch/powerpc/include/asm/kexec.h | 2 + > arch/powerpc/kernel/setup-common.c | 4 +- > arch/powerpc/kexec/core.c | 10 ++++- > arch/powerpc/kexec/ranges.c | 43 ++++++++++++++----- > 5 files changed, 47 insertions(+), 14 deletions(-) Although my reviewed by may not count much here since I am not deeply familiar with arch/powerpc/kexec/** part.. But FWIW, the patch overall looks logical to me. Keeping cma reservation in setup_arch() is the right thing to do to avoid issues like these in past [1]. The error handling logic and the loop logic for handling CMA ranges also looks correct to me. So feel free to add: Reviewed-by: Ritesh Harjani (IBM) [1]: https://lore.kernel.org/linuxppc-dev/3ae208e48c0d9cefe53d2dc4f593388067405b7d.1729146153.git.ritesh.list at gmail.com/ From sourabhjain at linux.ibm.com Fri Nov 7 20:26:05 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Sat, 8 Nov 2025 09:56:05 +0530 Subject: [PATCH v7] powerpc/kdump: Add support for crashkernel CMA reservation In-Reply-To: <87a50x450c.ritesh.list@gmail.com> References: <20251107080334.708028-1-sourabhjain@linux.ibm.com> <87a50x450c.ritesh.list@gmail.com> Message-ID: <7ad5c02f-63b1-404b-97a1-d7237220f6f7@linux.ibm.com> On 08/11/25 08:14, Ritesh Harjani (IBM) wrote: > Sourabh Jain writes: > >> Commit 35c18f2933c5 ("Add a new optional ",cma" suffix to the >> crashkernel= command line option") and commit ab475510e042 ("kdump: >> implement reserve_crashkernel_cma") added CMA support for kdump >> crashkernel reservation. 
>> >> Extend crashkernel CMA reservation support to powerpc. >> > Yup, would be nice to see this support landing in powerpc! > >> The following changes are made to enable CMA reservation on powerpc: >> >> - Parse and obtain the CMA reservation size along with other crashkernel >> parameters >> - Call reserve_crashkernel_cma() to allocate the CMA region for kdump >> - Include the CMA-reserved ranges in the usable memory ranges for the >> kdump kernel to use. >> - Exclude the CMA-reserved ranges from the crash kernel memory to >> prevent them from being exported through /proc/vmcore. >> >> With the introduction of the CMA crashkernel regions, >> crash_exclude_mem_range() needs to be called multiple times to exclude >> both crashk_res and crashk_cma_ranges from the crash memory ranges. To >> avoid repetitive logic for validating mem_ranges size and handling >> reallocation when required, this functionality is moved to a new wrapper >> function crash_exclude_mem_range_guarded(). >> >> To ensure proper CMA reservation, reserve_crashkernel_cma() is called >> after pageblock_order is initialized. >> >> Update kernel-parameters.txt to document CMA support for crashkernel on >> powerpc architecture. 
>> >> Cc: Baoquan he >> Cc: Jiri Bohac >> Cc: Hari Bathini >> Cc: Madhavan Srinivasan >> Cc: Mahesh Salgaonkar >> Cc: Michael Ellerman >> Cc: Ritesh Harjani (IBM) >> Cc: Shivang Upadhyay >> Cc: kexec at lists.infradead.org >> Signed-off-by: Sourabh Jain >> --- >> Changelog: >> >> v6 -> v7 >> https://lore.kernel.org/all/20251104132818.1724562-1-sourabhjain at linux.ibm.com/ >> - declare crashk_cma_size static >> >> --- >> .../admin-guide/kernel-parameters.txt | 2 +- >> arch/powerpc/include/asm/kexec.h | 2 + >> arch/powerpc/kernel/setup-common.c | 4 +- >> arch/powerpc/kexec/core.c | 10 ++++- >> arch/powerpc/kexec/ranges.c | 43 ++++++++++++++----- >> 5 files changed, 47 insertions(+), 14 deletions(-) > Although my reviewed by may not count much here since I am not deeply > familiar with arch/powerpc/kexec/** part.. > > But FWIW, the patch overall looks logical to me. > Keeping cma reservation in setup_arch() is the right thing to do to > avoid issues like these in past [1]. The error handling logic and the > loop logic for handling CMA ranges also looks correct to me. > > So feel free to add: > Reviewed-by: Ritesh Harjani (IBM) > > [1]: https://lore.kernel.org/linuxppc-dev/3ae208e48c0d9cefe53d2dc4f593388067405b7d.1729146153.git.ritesh.list at gmail.com/ Thanks for the Review Ritesh. - Sourabh Jain From rientjes at google.com Sat Nov 8 15:48:32 2025 From: rientjes at google.com (David Rientjes) Date: Sat, 8 Nov 2025 15:48:32 -0800 (PST) Subject: [Hypervisor Live Update] Notes from November 3, 2025 Message-ID: <7742456c-b248-04cc-0e1a-9da7d0546f1a@google.com> Hi everybody, Here are the notes from the last Hypervisor Live Update call that happened on Monday, November 3. Thanks to everybody who was involved! These notes are intended to bring people up to speed who could not attend the call as well as keep the conversation going in between meetings. ----->o----- We chatted briefly about the status of the stateless KHO RFC patch series intended to simply LUO support. 
Pasha started us off by updating that Jason Miu will be updating his patch series and is expected to send those patches to the mailing list after some more internal code review. That would be expected to be posted either this week or next week.

Pasha also updated that his plan is to remove subsystems to simplify the state machine and the UAPI; this would be replaced by global data bound to the file lifecycle, created and destroyed automatically based on reserved file state. He was also simplifying the state machine to keep the minimum needed support in the initial version, but with extensibility for the future. No exact timeline although it is currently ~90% ready.

----->o-----
Pratyush discussed the global file based state and how it may circumvent the LUO state machine; it's an implicit preserve of the subsystem completely independent of global state. Pasha said the state is now bound to the state of the preserved files only based on sessions. We're getting rid of global state for now.

Pratyush suggested tying subsystems to the file handler but this would not be possible if subsystems are going away. Pasha said the new global state is flexible and can share multiple file handlers; one global state can be bound to multiple file handlers.

----->o-----
Jork Loeser asked if there is a design/API link for the memfd and whether this is something a driver could use to persistently hold data. He was asking if a driver could associate arbitrary pages with a preserved memfd.

Pratyush said the memfd preservation was part of the LUO patch series at the end. A driver can pass a memfd to LUO after creating an fd. Pratyush suggested using KHO to preserve data; the data may be moved at runtime but would need to be unmovable during preservation across kexec.

Jason Gunthorpe suggested using APIs to get a page from a memfd at a specific offset.

----->o-----
Vipin Sharma had posted a recent patch series for VFIO[1], David Matlack will be working on v2 of this while Vipin is on leave.
Feedback was received about not moving the PCI save state and making them public, so that's work in progress. More feedback said there were missing bits and we need more PCI core changes that would be updated in v2 to be more complete (but also include more PCI changes). No specific timeline yet on v2, but it will be based on LUO v5.

David said the VFIO patches are using an anonymous inode to recreate the file after live update and asked if we care about associating recreated fds for userspace after live update with a particular inode. Jason said that VFIO would care because it uses the inode to get an address space which it uses with an unmapped mapping range and this must work correctly.

----->o-----
Sami summarized the discussion on IOMMU persistence. He was working on updating the patch series to v2 based on the feedback from v1. He talked about restoration of the HWPTs on the restore side. Jason thought that we wouldn't have an IOAS for the restored domains and suggested it could be null instead. Sami thought this may be slightly invasive including where we are taking locks; Jason suggested against a dummy IOAS.

----->o-----
We briefly discussed deferred struct page initialization support with KHO. Pasha said KHO isn't compatible with deferred struct pages although when using KHO we definitely want fast reboot performance. We decided to discuss this more later after LPC where there will be some discussion about reboot performance.

----->o-----
Pratyush noted that he is working on the 1GB preservation but will take some more time to clean up and have it working properly. He said guest_memfd would use hugetlb for 1GB pages so he's working on hugetlb preservation. Pratyush was focusing on generic hugetlb support that could be ported for use with guest_memfd when it supports hugetlb. He's aiming for an RFC to be ready by the time of LPC.

Ackerley updated that the hugetlb support for guest_memfd is currently in RFC patches posted upstream.
----->o----- Next meeting will be on Monday, November 17 at 8am PST (UTC-8), everybody is welcome: https://meet.google.com/rjn-dmzu-hgq Topics for the next meeting: - update on the status of stateless KHO RFC patches that should simplify LUO support - update on the status of LUO v5 overall - follow up on the status of iommu persistence and its v2 patches based on v1 feedback - update on the v2 of the VFIO patch series based on LUO v5 and expected timelines - discuss status of hugetlb preservation, specifically 1GB support, with regular memfd, aiming for an RFC by the time of LPC - update on status of guest_memfd support for 1GB hugetlb pages - discuss any use cases for Confidential Computing where folios may need to be split after being marked as preserved during brown out - later: testing methodology to allow downstream consumers to qualify that live update works from one version to another - later: reducing blackout window during live update, including deferred struct page initialization Please let me know if you'd like to propose additional topics for discussion, thank you! [1] https://marc.info/?l=linux-kernel&m=176074589311102&w=2 From pasha.tatashin at soleen.com Sat Nov 8 17:53:58 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Sat, 8 Nov 2025 20:53:58 -0500 Subject: [Hypervisor Live Update] Notes from November 3, 2025 In-Reply-To: <7742456c-b248-04cc-0e1a-9da7d0546f1a@google.com> References: <7742456c-b248-04cc-0e1a-9da7d0546f1a@google.com> Message-ID: On Sat, Nov 8, 2025 at 6:48?PM David Rientjes wrote: > > Hi everybody, > > Here are the notes from the last Hypervisor Live Update call that happened > on Monday, November 3. Thanks to everybody who was involved! > > These notes are intended to bring people up to speed who could not attend > the call as well as keep the conversation going in between meetings. > > ----->o----- > We chatted briefly about the status of the stateless KHO RFC patch series > intended to simply LUO support. 
> > Pasha started us off by updating that Jason Miu will be updating his patch > series and is expected to send those patches to the mailing list after > some more internal code review. That would be expected to be posted > either this week or next week. > > Pasha also updated that is plan is to remove subsystems to simply the > state machine and the UAPI; this would be replaced by the file lifecycle > bound global data created and destroyed automatically based on reserved > file state. He was also simplifying the state machine to keep the minimum > needed support in the initial version, but with extensibility for the > future. No exact timeline although it is currently ~90% ready. Thank you David for running this meeting. The LUOv5 + memfd preservation from Pratyush was posted yesterday: https://lore.kernel.org/all/20251107210526.257742-1-pasha.tatashin at soleen.com Pasha > ----->o----- > Pratyush discussed the global file based state and how it may circumvent > the LUO state machine; it's an implicit preserve of the subsystem > completely independent of global state. Pasha said the state is now bound > to the state of the preserved files only based on sessions. We're getting > rid of global state for now. > > Pratyush suggested to tie subsystems with the file handler but this would > not be possible if subsystems are going away. Pasha said the new global > state is flexible and can share multiple file handlers; one global state > can be bound to multiple file handlers. > > ----->o----- > Jork Loeser as if there is a design/API link for the memfd and whether > this is something a driver could use to persistently holding data. He was > asking if a driver could associate arbitrary pages with a preserved memfd. > > Pratyush said the memfd preservation was part of the LUO patch series at > the end. A driver can pass a memfd to LUO after creating an fd. 
Pratyush > suggested using KHO to preserve data; the data may moved at runtime but > would need to be unmovable during preservation across kexec. > > Jason Gunthorpe suggested using APIs to get a page from a memfd at a > specific offset. > > ----->o----- > Vipin Sharma had posted a recent patch series for VFIO[1], David Matlack > will be working on v2 of this will Vipin is on leave. Feedback was > received about not moving the PCI save state and making them public, so > that's work in progress. More feedback said there was missing bits and we > need more PCI core changes that would be updated in v2 to be more complete > (but also include more PCI changes). No specific timeline yet on v2, but > it will be based on LUO v5. > > David said the VFIO patches are using an anonymous inode to recreate the > file after live update and asked if we care about associating recreated > fds for userspace after live update with a particular inode. Jason said > that VFIO would care because it uses the inode to get an address space > which it uses with an unmapped mapping range and this must work correctly. > > ----->o----- > Sami summarized the discussion on IOMMU persistence. He was working on > updating the patch series to v2 based on the feedback from v1. He talked > about restoration of the HWPTs on the restore side. Jason thought that we > wouldn't have an IOAS for the restored domains and suggested it could be > null instead. Sami thought this may be slightly invasive including where > we are taking locks; Jason suggested against a dummy IOAS. > > ----->o----- > We discussed briefly about deferred struct page initialization support > with KHO. Pasha said KHO isn't compatible with deferred struct pages > although when using KHO we definitely want fast reboot performance. We > decided to discuss this more later after LPC where there will be some > discussion about reboot performance. 
> > ----->o----- > Pratyush noted that he is working on the 1GB preservation but will take > some more time to clean up and have it working properly. He said > guest_memfd would use hugetlb for 1GB pages so he's working on hugetlb > preservation. Pratyush was focusing on generic hugetlb support that could > be ported for use with guest_memfd when it supports hugetlb. He's aiming > for an RFC to be ready by the time of LPC. > > Ackerley updated that the hugetlb support for guest_memfd is currently in > RFC patches posted upstream. > > ----->o----- > Next meeting will be on Monday, November 17 at 8am PST (UTC-8), everybody > is welcome: https://meet.google.com/rjn-dmzu-hgq > > Topics for the next meeting: > > - update on the status of stateless KHO RFC patches that should simplify > LUO support > - update on the status of LUO v5 overall > - follow up on the status of iommu persistence and its v2 patches based > on v1 feedback > - update on the v2 of the VFIO patch series based on LUO v5 and expected > timelines > - discuss status of hugetlb preservation, specifically 1GB support, with > regular memfd, aiming for an RFC by the time of LPC > - update on status of guest_memfd support for 1GB hugetlb pages > - discuss any use cases for Confidential Computing where folios may need > to be split after being marked as preserved during brown out > - later: testing methodology to allow downstream consumers to qualify > that live update works from one version to another > - later: reducing blackout window during live update, including deferred > struct page initialization > > Please let me know if you'd like to propose additional topics for > discussion, thank you! 
> > [1] https://marc.info/?l=linux-kernel&m=176074589311102&w=2 > From rppt at kernel.org Sat Nov 8 23:31:34 2025 From: rppt at kernel.org (Mike Rapoport) Date: Sun, 9 Nov 2025 09:31:34 +0200 Subject: [PATCH] lib/test_kho: Check if KHO is enabled In-Reply-To: <20251106220635.2608494-1-pasha.tatashin@soleen.com> References: <20251106220635.2608494-1-pasha.tatashin@soleen.com> Message-ID: On Thu, Nov 06, 2025 at 05:06:35PM -0500, Pasha Tatashin wrote: > We must check whether KHO is enabled prior to issuing KHO commands, > otherwise KHO internal data structures are not initialized. > > Fixes: b753522bed0b ("kho: add test for kexec handover") > > Reported-by: kernel test robot > Closes: https://lore.kernel.org/oe-lkp/202511061629.e242724-lkp at intel.com > > Signed-off-by: Pasha Tatashin Reviewed-by: Mike Rapoport (Microsoft) > --- > lib/test_kho.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/lib/test_kho.c b/lib/test_kho.c > index 025ea251a186..85b60d87a50a 100644 > --- a/lib/test_kho.c > +++ b/lib/test_kho.c > @@ -315,6 +315,9 @@ static int __init kho_test_init(void) > phys_addr_t fdt_phys; > int err; > > + if (!kho_is_enabled()) > + return 0; > + > err = kho_retrieve_subtree(KHO_TEST_FDT, &fdt_phys); > if (!err) > return kho_test_restore(fdt_phys); > -- > 2.51.2.1041.gc1ab5b90ca-goog > -- Sincerely yours, Mike. From lkp at intel.com Sun Nov 9 05:38:19 2025 From: lkp at intel.com (kernel test robot) Date: Sun, 9 Nov 2025 21:38:19 +0800 Subject: [PATCH v2 2/5] kexec: move sysfs entries to /sys/kernel/kexec In-Reply-To: <20251106045107.17813-3-sourabhjain@linux.ibm.com> References: <20251106045107.17813-3-sourabhjain@linux.ibm.com> Message-ID: <202511092102.Qi35GqrR-lkp@intel.com> Hi Sourabh, kernel test robot noticed the following build warnings: [auto build test WARNING on akpm-mm/mm-everything] [also build test WARNING on linus/master v6.18-rc4 next-20251107] [If your patch is applied to the wrong git tree, kindly drop us a note. 
And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Sourabh-Jain/Documentation-ABI-add-kexec-and-kdump-sysfs-interface/20251106-125243 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything patch link: https://lore.kernel.org/r/20251106045107.17813-3-sourabhjain%40linux.ibm.com patch subject: [PATCH v2 2/5] kexec: move sysfs entries to /sys/kernel/kexec config: s390-randconfig-r131-20251109 (https://download.01.org/0day-ci/archive/20251109/202511092102.Qi35GqrR-lkp at intel.com/config) compiler: clang version 16.0.6 (https://github.com/llvm/llvm-project 7cbf1a2591520c2491aa35339f227775f4d3adf6) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251109/202511092102.Qi35GqrR-lkp at intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot | Closes: https://lore.kernel.org/oe-kbuild-all/202511092102.Qi35GqrR-lkp at intel.com/ sparse warnings: (new ones prefixed by >>) >> kernel/kexec_core.c:1315:16: sparse: sparse: symbol 'kexec_kobj' was not declared. Should it be static? vim +/kexec_kobj +1315 kernel/kexec_core.c 1314 > 1315 struct kobject *kexec_kobj; 1316 ATTRIBUTE_GROUPS(kexec); 1317 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki From sourabhjain at linux.ibm.com Sun Nov 9 20:31:38 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Mon, 10 Nov 2025 10:01:38 +0530 Subject: [PATCH v3 0/5] kexec: reorganize sysfs interface and add new kexec sysfs Message-ID: <20251110043143.484408-1-sourabhjain@linux.ibm.com> All existing kexec and kdump sysfs entries are moved to a new location, /sys/kernel/kexec, to keep /sys/kernel/ clean and better organized. 
Symlinks are created at the old locations for backward compatibility and can be removed in the future [02/05]. While doing this cleanup, missing ABI documentation for the old sysfs interfaces is added, and those entries are marked as deprecated [01/05 and 03/05]. New ABI documentation is also added for the reorganized interfaces [04/05]. Along with this reorganization, a new sysfs file, /sys/kernel/kexec/crash_cma_ranges, is introduced to export crashkernel CMA reservation details to user space [05/05]. This helps tools determine the total crashkernel reserved memory and warn users that capturing user pages while CMA is reserved may cause incomplete or unreliable dumps. Changelog: --------- v2 -> v3: - Add the missing hunk to export crash_cma_ranges sysfs [05/05] - Declare kexec_kobj static [02/05] Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Sourabh Jain (5): Documentation/ABI: add kexec and kdump sysfs interface kexec: move sysfs entries to /sys/kernel/kexec Documentation/ABI: mark old kexec sysfs deprecated kexec: document new kexec and kdump sysfs ABIs crash: export crashkernel CMA reservation to userspace .../ABI/obsolete/sysfs-kernel-kexec-kdump | 59 ++++++++ .../ABI/testing/sysfs-kernel-kexec-kdump | 61 ++++++++ kernel/kexec_core.c | 135 ++++++++++++++++++ kernel/ksysfs.c | 68 +-------- 4 files changed, 256 insertions(+), 67 deletions(-) create mode 100644 Documentation/ABI/obsolete/sysfs-kernel-kexec-kdump create mode 100644 Documentation/ABI/testing/sysfs-kernel-kexec-kdump -- 2.51.1 From sourabhjain at linux.ibm.com Sun Nov 9 20:31:39 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Mon, 10 Nov 2025 10:01:39 +0530 Subject: [PATCH v3 1/5] Documentation/ABI: add kexec and kdump sysfs interface
In-Reply-To: <20251110043143.484408-1-sourabhjain@linux.ibm.com> References: <20251110043143.484408-1-sourabhjain@linux.ibm.com> Message-ID: <20251110043143.484408-2-sourabhjain@linux.ibm.com> Add an ABI document for the following kexec and kdump sysfs interfaces: - /sys/kernel/kexec_loaded - /sys/kernel/kexec_crash_loaded - /sys/kernel/kexec_crash_size - /sys/kernel/crash_elfcorehdr_size Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- .../ABI/testing/sysfs-kernel-kexec-kdump | 43 +++++++++++++++++++ 1 file changed, 43 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-kernel-kexec-kdump diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump new file mode 100644 index 000000000000..96b24565b68e --- /dev/null +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump @@ -0,0 +1,43 @@ +What: /sys/kernel/kexec_loaded +Date: Jun 2006 +Contact: kexec at lists.infradead.org +Description: read only + Indicates whether a new kernel image has been loaded + into memory using the kexec system call. It shows 1 if + a kexec image is present and ready to boot, or 0 if none + is loaded. +User: kexec tools, kdump service + +What: /sys/kernel/kexec_crash_loaded +Date: Jun 2006 +Contact: kexec at lists.infradead.org +Description: read only + Indicates whether a crash (kdump) kernel is currently + loaded into memory. It shows 1 if a crash kernel has been + successfully loaded for panic handling, or 0 if no crash + kernel is present.
+User: Kexec tools, Kdump service + +What: /sys/kernel/kexec_crash_size +Date: Dec 2009 +Contact: kexec at lists.infradead.org +Description: read/write + Shows the amount of memory reserved for loading the crash + (kdump) kernel. It reports the size, in bytes, of the + crash kernel area defined by the crashkernel= parameter. + This interface also allows reducing the crashkernel + reservation by writing a smaller value, and the reclaimed + space is added back to the system RAM. +User: Kdump service + +What: /sys/kernel/crash_elfcorehdr_size +Date: Aug 2023 +Contact: kexec at lists.infradead.org +Description: read only + Indicates the preferred size of the memory buffer for the + ELF core header used by the crash (kdump) kernel. It defines + how much space is needed to hold metadata about the crashed + system, including CPU and memory information. This information + is used by the user space utility kexec to support updating the + in-kernel kdump image during hotplug operations. +User: Kexec tools -- 2.51.1 From sourabhjain at linux.ibm.com Sun Nov 9 20:31:40 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Mon, 10 Nov 2025 10:01:40 +0530 Subject: [PATCH v3 2/5] kexec: move sysfs entries to /sys/kernel/kexec In-Reply-To: <20251110043143.484408-1-sourabhjain@linux.ibm.com> References: <20251110043143.484408-1-sourabhjain@linux.ibm.com> Message-ID: <20251110043143.484408-3-sourabhjain@linux.ibm.com> Several kexec and kdump sysfs entries are currently placed directly under /sys/kernel/, which clutters the directory and makes it harder to identify unrelated entries. To improve organization and readability, these entries are now moved under a dedicated directory, /sys/kernel/kexec. For backward compatibility, symlinks are created at the old locations so that existing tools and scripts continue to work. These symlinks can be removed in the future once users have switched to the new path. 
While creating symlinks, entries are added in /sys/kernel/ that point to their new locations under /sys/kernel/kexec/. If an error occurs while adding a symlink, it is logged but does not stop initialization of the remaining kexec sysfs symlinks. The /sys/kernel/ entry is now controlled by CONFIG_CRASH_DUMP instead of CONFIG_VMCORE_INFO, as CONFIG_CRASH_DUMP also enables CONFIG_VMCORE_INFO. Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- Changelog: v2 -> v3: - Declare kexec_kobj static --- kernel/kexec_core.c | 118 ++++++++++++++++++++++++++++++++++++++++++++ kernel/ksysfs.c | 68 +------------------------ 2 files changed, 119 insertions(+), 67 deletions(-) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index fa00b239c5d9..7476a46de5d6 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -41,6 +41,7 @@ #include #include #include +#include #include #include @@ -1229,3 +1230,120 @@ int kernel_kexec(void) kexec_unlock(); return error; } + +static ssize_t loaded_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%d\n", !!kexec_image); +} +static struct kobj_attribute loaded_attr = __ATTR_RO(loaded); + +#ifdef CONFIG_CRASH_DUMP +static ssize_t crash_loaded_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%d\n", kexec_crash_loaded()); +} +static struct kobj_attribute crash_loaded_attr = __ATTR_RO(crash_loaded); + +static ssize_t crash_size_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + ssize_t size = crash_get_memory_size(); + + if (size < 0) + return size; + + return sysfs_emit(buf, "%zd\n", size); +} +static ssize_t crash_size_store(struct kobject *kobj, + 
struct kobj_attribute *attr, + const char *buf, size_t count) +{ + unsigned long cnt; + int ret; + + if (kstrtoul(buf, 0, &cnt)) + return -EINVAL; + + ret = crash_shrink_memory(cnt); + return ret < 0 ? ret : count; +} +static struct kobj_attribute crash_size_attr = __ATTR_RW(crash_size); + +#ifdef CONFIG_CRASH_HOTPLUG +static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + unsigned int sz = crash_get_elfcorehdr_size(); + + return sysfs_emit(buf, "%u\n", sz); +} +static struct kobj_attribute crash_elfcorehdr_size_attr = __ATTR_RO(crash_elfcorehdr_size); + +#endif /* CONFIG_CRASH_HOTPLUG */ +#endif /* CONFIG_CRASH_DUMP */ + +static struct attribute *kexec_attrs[] = { + &loaded_attr.attr, +#ifdef CONFIG_CRASH_DUMP + &crash_loaded_attr.attr, + &crash_size_attr.attr, +#ifdef CONFIG_CRASH_HOTPLUG + &crash_elfcorehdr_size_attr.attr, +#endif +#endif + NULL +}; + +struct kexec_link_entry { + const char *target; + const char *name; +}; + +static struct kexec_link_entry kexec_links[] = { + { "loaded", "kexec_loaded" }, +#ifdef CONFIG_CRASH_DUMP + { "crash_loaded", "kexec_crash_loaded" }, + { "crash_size", "kexec_crash_size" }, +#ifdef CONFIG_CRASH_HOTPLUG + { "crash_elfcorehdr_size", "crash_elfcorehdr_size" }, +#endif +#endif + +}; + +static struct kobject *kexec_kobj; +ATTRIBUTE_GROUPS(kexec); + +static int __init init_kexec_sysctl(void) +{ + int error; + int i; + + kexec_kobj = kobject_create_and_add("kexec", kernel_kobj); + if (!kexec_kobj) { + pr_err("failed to create kexec kobject\n"); + return -ENOMEM; + } + + error = sysfs_create_groups(kexec_kobj, kexec_groups); + if (error) + goto kset_exit; + + for (i = 0; i < ARRAY_SIZE(kexec_links); i++) { + error = compat_only_sysfs_link_entry_to_kobj(kernel_kobj, kexec_kobj, + kexec_links[i].target, + kexec_links[i].name); + if (error) + pr_err("Unable to create %s symlink (%d)", kexec_links[i].name, error); + } + + return 0; + +kset_exit: + kobject_put(kexec_kobj); + 
return error; +} + +subsys_initcall(init_kexec_sysctl); diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c index eefb67d9883c..a9e6354d9e25 100644 --- a/kernel/ksysfs.c +++ b/kernel/ksysfs.c @@ -12,7 +12,7 @@ #include #include #include -#include +#include #include #include #include @@ -119,50 +119,6 @@ static ssize_t profiling_store(struct kobject *kobj, KERNEL_ATTR_RW(profiling); #endif -#ifdef CONFIG_KEXEC_CORE -static ssize_t kexec_loaded_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) -{ - return sysfs_emit(buf, "%d\n", !!kexec_image); -} -KERNEL_ATTR_RO(kexec_loaded); - -#ifdef CONFIG_CRASH_DUMP -static ssize_t kexec_crash_loaded_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) -{ - return sysfs_emit(buf, "%d\n", kexec_crash_loaded()); -} -KERNEL_ATTR_RO(kexec_crash_loaded); - -static ssize_t kexec_crash_size_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) -{ - ssize_t size = crash_get_memory_size(); - - if (size < 0) - return size; - - return sysfs_emit(buf, "%zd\n", size); -} -static ssize_t kexec_crash_size_store(struct kobject *kobj, - struct kobj_attribute *attr, - const char *buf, size_t count) -{ - unsigned long cnt; - int ret; - - if (kstrtoul(buf, 0, &cnt)) - return -EINVAL; - - ret = crash_shrink_memory(cnt); - return ret < 0 ? 
ret : count; -} -KERNEL_ATTR_RW(kexec_crash_size); - -#endif /* CONFIG_CRASH_DUMP*/ -#endif /* CONFIG_KEXEC_CORE */ - #ifdef CONFIG_VMCORE_INFO static ssize_t vmcoreinfo_show(struct kobject *kobj, @@ -174,18 +130,6 @@ static ssize_t vmcoreinfo_show(struct kobject *kobj, } KERNEL_ATTR_RO(vmcoreinfo); -#ifdef CONFIG_CRASH_HOTPLUG -static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) -{ - unsigned int sz = crash_get_elfcorehdr_size(); - - return sysfs_emit(buf, "%u\n", sz); -} -KERNEL_ATTR_RO(crash_elfcorehdr_size); - -#endif - #endif /* CONFIG_VMCORE_INFO */ /* whether file capabilities are enabled */ @@ -255,18 +199,8 @@ static struct attribute * kernel_attrs[] = { #ifdef CONFIG_PROFILING &profiling_attr.attr, #endif -#ifdef CONFIG_KEXEC_CORE - &kexec_loaded_attr.attr, -#ifdef CONFIG_CRASH_DUMP - &kexec_crash_loaded_attr.attr, - &kexec_crash_size_attr.attr, -#endif -#endif #ifdef CONFIG_VMCORE_INFO &vmcoreinfo_attr.attr, -#ifdef CONFIG_CRASH_HOTPLUG - &crash_elfcorehdr_size_attr.attr, -#endif #endif #ifndef CONFIG_TINY_RCU &rcu_expedited_attr.attr, -- 2.51.1 From sourabhjain at linux.ibm.com Sun Nov 9 20:31:41 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Mon, 10 Nov 2025 10:01:41 +0530 Subject: [PATCH v3 3/5] Documentation/ABI: mark old kexec sysfs deprecated In-Reply-To: <20251110043143.484408-1-sourabhjain@linux.ibm.com> References: <20251110043143.484408-1-sourabhjain@linux.ibm.com> Message-ID: <20251110043143.484408-4-sourabhjain@linux.ibm.com> The previous commit ("kexec: move sysfs entries to /sys/kernel/kexec") moved all existing kexec sysfs entries to a new location. The ABI document is updated to include a note about the deprecation of the old kexec sysfs entries. 
The following kexec sysfs entries are deprecated: - /sys/kernel/kexec_loaded - /sys/kernel/kexec_crash_loaded - /sys/kernel/kexec_crash_size - /sys/kernel/crash_elfcorehdr_size Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- .../sysfs-kernel-kexec-kdump | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) rename Documentation/ABI/{testing => obsolete}/sysfs-kernel-kexec-kdump (61%) diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/obsolete/sysfs-kernel-kexec-kdump similarity index 61% rename from Documentation/ABI/testing/sysfs-kernel-kexec-kdump rename to Documentation/ABI/obsolete/sysfs-kernel-kexec-kdump index 96b24565b68e..96b4d41721cc 100644 --- a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump +++ b/Documentation/ABI/obsolete/sysfs-kernel-kexec-kdump @@ -1,3 +1,19 @@ +NOTE: all the ABIs listed in this file are deprecated and will be removed after 2028. 
+ +Here are the alternative ABIs: ++------------------------------------+-----------------------------------------+ +| Deprecated | Alternative | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/kexec_loaded | /sys/kernel/kexec/loaded | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/kexec_crash_loaded | /sys/kernel/kexec/crash_loaded | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/kexec_crash_size | /sys/kernel/kexec/crash_size | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/crash_elfcorehdr_size | /sys/kernel/kexec/crash_elfcorehdr_size | ++------------------------------------+-----------------------------------------+ + + What: /sys/kernel/kexec_loaded Date: Jun 2006 Contact: kexec at lists.infradead.org -- 2.51.1 From sourabhjain at linux.ibm.com Sun Nov 9 20:31:42 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Mon, 10 Nov 2025 10:01:42 +0530 Subject: [PATCH v3 4/5] kexec: document new kexec and kdump sysfs ABIs In-Reply-To: <20251110043143.484408-1-sourabhjain@linux.ibm.com> References: <20251110043143.484408-1-sourabhjain@linux.ibm.com> Message-ID: <20251110043143.484408-5-sourabhjain@linux.ibm.com> Add an ABI document for the following kexec and kdump sysfs interfaces: - /sys/kernel/kexec/loaded - /sys/kernel/kexec/crash_loaded - /sys/kernel/kexec/crash_size - /sys/kernel/kexec/crash_elfcorehdr_size Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- .../ABI/testing/sysfs-kernel-kexec-kdump | 51 +++++++++++++++++++ 1 file changed, 51 insertions(+) create mode 100644
Documentation/ABI/testing/sysfs-kernel-kexec-kdump diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump new file mode 100644 index 000000000000..00c00f380fea --- /dev/null +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump @@ -0,0 +1,51 @@ +What: /sys/kernel/kexec/* +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: + The /sys/kernel/kexec/* directory contains sysfs files + that provide information about the configuration status + of kexec and kdump. + +What: /sys/kernel/kexec/loaded +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read only + Indicates whether a new kernel image has been loaded + into memory using the kexec system call. It shows 1 if + a kexec image is present and ready to boot, or 0 if none + is loaded. +User: kexec tools, kdump service + +What: /sys/kernel/kexec/crash_loaded +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read only + Indicates whether a crash (kdump) kernel is currently + loaded into memory. It shows 1 if a crash kernel has been + successfully loaded for panic handling, or 0 if no crash + kernel is present. +User: Kexec tools, Kdump service + +What: /sys/kernel/kexec/crash_size +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read/write + Shows the amount of memory reserved for loading the crash + (kdump) kernel. It reports the size, in bytes, of the + crash kernel area defined by the crashkernel= parameter. + This interface also allows reducing the crashkernel + reservation by writing a smaller value, and the reclaimed + space is added back to the system RAM. +User: Kdump service + +What: /sys/kernel/kexec/crash_elfcorehdr_size +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read only + Indicates the preferred size of the memory buffer for the + ELF core header used by the crash (kdump) kernel. 
It defines + how much space is needed to hold metadata about the crashed + system, including CPU and memory information. This information + is used by the user space utility kexec to support updating the + in-kernel kdump image during hotplug operations. +User: Kexec tools -- 2.51.1 From sourabhjain at linux.ibm.com Sun Nov 9 20:31:43 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Mon, 10 Nov 2025 10:01:43 +0530 Subject: [PATCH v3 5/5] crash: export crashkernel CMA reservation to userspace In-Reply-To: <20251110043143.484408-1-sourabhjain@linux.ibm.com> References: <20251110043143.484408-1-sourabhjain@linux.ibm.com> Message-ID: <20251110043143.484408-6-sourabhjain@linux.ibm.com> Add a sysfs entry /sys/kernel/kexec/crash_cma_ranges to expose all CMA crashkernel ranges. This allows userspace tools configuring kdump to determine how much memory is reserved for crashkernel. If CMA is used, tools can warn users when attempting to capture user pages with CMA reservation. The new sysfs file holds the CMA ranges in the below format: cat /sys/kernel/kexec/crash_cma_ranges 100000000-10c7fffff Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- Changelog: - Add the missing hunk to export crash_cma_ranges sysfs --- .../ABI/testing/sysfs-kernel-kexec-kdump | 10 ++++++++++ kernel/kexec_core.c | 17 +++++++++++++++++ 2 files changed, 27 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump index 00c00f380fea..f59051b5d96d 100644 --- a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump @@ -49,3 +49,13 @@ Description: read only is used by the user space utility kexec to support
updating the in-kernel kdump image during hotplug operations. User: Kexec tools + +What: /sys/kernel/kexec/crash_cma_ranges +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read only + Provides information about the memory ranges reserved from + the Contiguous Memory Allocator (CMA) area that are allocated + to the crash (kdump) kernel. It lists the start and end physical + addresses of CMA regions assigned for crashkernel use. +User: kdump service diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 7476a46de5d6..da6ff72b4669 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -1271,6 +1271,22 @@ static ssize_t crash_size_store(struct kobject *kobj, } static struct kobj_attribute crash_size_attr = __ATTR_RW(crash_size); +static ssize_t crash_cma_ranges_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + + ssize_t len = 0; + int i; + + for (i = 0; i < crashk_cma_cnt; ++i) { + len += sysfs_emit_at(buf, len, "%08llx-%08llx\n", + crashk_cma_ranges[i].start, + crashk_cma_ranges[i].end); + } + return len; +} +static struct kobj_attribute crash_cma_ranges_attr = __ATTR_RO(crash_cma_ranges); + #ifdef CONFIG_CRASH_HOTPLUG static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) @@ -1289,6 +1305,7 @@ static struct attribute *kexec_attrs[] = { #ifdef CONFIG_CRASH_DUMP &crash_loaded_attr.attr, &crash_size_attr.attr, + &crash_cma_ranges_attr.attr, #ifdef CONFIG_CRASH_HOTPLUG &crash_elfcorehdr_size_attr.attr, #endif -- 2.51.1 From bhe at redhat.com Sun Nov 9 23:08:41 2025 From: bhe at redhat.com (Baoquan he) Date: Mon, 10 Nov 2025 15:08:41 +0800 Subject: [PATCH v3 5/5] crash: export crashkernel CMA reservation to userspace In-Reply-To: <20251110043143.484408-6-sourabhjain@linux.ibm.com> References: <20251110043143.484408-1-sourabhjain@linux.ibm.com> <20251110043143.484408-6-sourabhjain@linux.ibm.com> Message-ID: On 11/10/25 at 10:01am, Sourabh Jain wrote: > Add a sysfs 
entry /sys/kernel/kexec/crash_cma_ranges to expose all > CMA crashkernel ranges. I am not against this way. While wondering if it's more appropriate to export them into iomem_resource just like crashk_res and crashk_low_res doing. > > This allows userspace tools configuring kdump to determine how much > memory is reserved for crashkernel. If CMA is used, tools can warn > users when attempting to capture user pages with CMA reservation. > > The new sysfs hold the CMA ranges in below format: > > cat /sys/kernel/kexec/crash_cma_ranges > 100000000-10c7fffff > > Cc: Aditya Gupta > Cc: Andrew Morton > Cc: Baoquan he > Cc: Dave Young > Cc: Hari Bathini > Cc: Jiri Bohac > Cc: Madhavan Srinivasan > Cc: Mahesh J Salgaonkar > Cc: Pingfan Liu > Cc: Ritesh Harjani (IBM) > Cc: Shivang Upadhyay > Cc: Vivek Goyal > Cc: linuxppc-dev at lists.ozlabs.org > Cc: kexec at lists.infradead.org > Signed-off-by: Sourabh Jain > --- > Changelog: > - Add the missing hunk to export crash_cma_ranges sysfs > > --- > .../ABI/testing/sysfs-kernel-kexec-kdump | 10 ++++++++++ > kernel/kexec_core.c | 17 +++++++++++++++++ > 2 files changed, 27 insertions(+) > > diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump > index 00c00f380fea..f59051b5d96d 100644 > --- a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump > +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump > @@ -49,3 +49,13 @@ Description: read only > is used by the user space utility kexec to support updating the > in-kernel kdump image during hotplug operations. > User: Kexec tools > + > +What: /sys/kernel/kexec/crash_cma_ranges > +Date: Nov 2025 > +Contact: kexec at lists.infradead.org > +Description: read only > + Provides information about the memory ranges reserved from > + the Contiguous Memory Allocator (CMA) area that are allocated > + to the crash (kdump) kernel. It lists the start and end physical > + addresses of CMA regions assigned for crashkernel use. 
> +User: kdump service > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > index 7476a46de5d6..da6ff72b4669 100644 > --- a/kernel/kexec_core.c > +++ b/kernel/kexec_core.c > @@ -1271,6 +1271,22 @@ static ssize_t crash_size_store(struct kobject *kobj, > } > static struct kobj_attribute crash_size_attr = __ATTR_RW(crash_size); > > +static ssize_t crash_cma_ranges_show(struct kobject *kobj, > + struct kobj_attribute *attr, char *buf) > +{ > + > + ssize_t len = 0; > + int i; > + > + for (i = 0; i < crashk_cma_cnt; ++i) { > + len += sysfs_emit_at(buf, len, "%08llx-%08llx\n", > + crashk_cma_ranges[i].start, > + crashk_cma_ranges[i].end); > + } > + return len; > +} > +static struct kobj_attribute crash_cma_ranges_attr = __ATTR_RO(crash_cma_ranges); > + > #ifdef CONFIG_CRASH_HOTPLUG > static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj, > struct kobj_attribute *attr, char *buf) > @@ -1289,6 +1305,7 @@ static struct attribute *kexec_attrs[] = { > #ifdef CONFIG_CRASH_DUMP > &crash_loaded_attr.attr, > &crash_size_attr.attr, > + &crash_cma_ranges_attr.attr, > #ifdef CONFIG_CRASH_HOTPLUG > &crash_elfcorehdr_size_attr.attr, > #endif > -- > 2.51.1 > From sourabhjain at linux.ibm.com Mon Nov 10 00:39:49 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Mon, 10 Nov 2025 14:09:49 +0530 Subject: [PATCH v3 5/5] crash: export crashkernel CMA reservation to userspace In-Reply-To: References: <20251110043143.484408-1-sourabhjain@linux.ibm.com> <20251110043143.484408-6-sourabhjain@linux.ibm.com> Message-ID: <09c4c181-eb4b-43ea-a439-04b83f4c20ba@linux.ibm.com> On 10/11/25 12:38, Baoquan he wrote: > On 11/10/25 at 10:01am, Sourabh Jain wrote: >> Add a sysfs entry /sys/kernel/kexec/crash_cma_ranges to expose all >> CMA crashkernel ranges. > I am not against this way. While wondering if it's more appropriate to > export them into iomem_resource just like crashk_res and crashk_low_res > doing. Handling conflict is challenging. 
Hence we don't export crashk_res and crashk_low_res to iomem on powerpc. Checkout [1] And I think conflicts can occur regardless of the order in which System RAM and Crash CMA ranges are added to iomem. [1] https://lore.kernel.org/all/20251016142831.144515-1-sourabhjain at linux.ibm.com/ - Sourabh Jain > >> This allows userspace tools configuring kdump to determine how much >> memory is reserved for crashkernel. If CMA is used, tools can warn >> users when attempting to capture user pages with CMA reservation. >> >> The new sysfs hold the CMA ranges in below format: >> >> cat /sys/kernel/kexec/crash_cma_ranges >> 100000000-10c7fffff >> >> Cc: Aditya Gupta >> Cc: Andrew Morton >> Cc: Baoquan he >> Cc: Dave Young >> Cc: Hari Bathini >> Cc: Jiri Bohac >> Cc: Madhavan Srinivasan >> Cc: Mahesh J Salgaonkar >> Cc: Pingfan Liu >> Cc: Ritesh Harjani (IBM) >> Cc: Shivang Upadhyay >> Cc: Vivek Goyal >> Cc: linuxppc-dev at lists.ozlabs.org >> Cc: kexec at lists.infradead.org >> Signed-off-by: Sourabh Jain >> --- >> Changelog: >> - Add the missing hunk to export crash_cma_ranges sysfs >> >> --- >> .../ABI/testing/sysfs-kernel-kexec-kdump | 10 ++++++++++ >> kernel/kexec_core.c | 17 +++++++++++++++++ >> 2 files changed, 27 insertions(+) >> >> diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump >> index 00c00f380fea..f59051b5d96d 100644 >> --- a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump >> +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump >> @@ -49,3 +49,13 @@ Description: read only >> is used by the user space utility kexec to support updating the >> in-kernel kdump image during hotplug operations. 
>> User: Kexec tools >> + >> +What: /sys/kernel/kexec/crash_cma_ranges >> +Date: Nov 2025 >> +Contact: kexec at lists.infradead.org >> +Description: read only >> + Provides information about the memory ranges reserved from >> + the Contiguous Memory Allocator (CMA) area that are allocated >> + to the crash (kdump) kernel. It lists the start and end physical >> + addresses of CMA regions assigned for crashkernel use. >> +User: kdump service >> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c >> index 7476a46de5d6..da6ff72b4669 100644 >> --- a/kernel/kexec_core.c >> +++ b/kernel/kexec_core.c >> @@ -1271,6 +1271,22 @@ static ssize_t crash_size_store(struct kobject *kobj, >> } >> static struct kobj_attribute crash_size_attr = __ATTR_RW(crash_size); >> >> +static ssize_t crash_cma_ranges_show(struct kobject *kobj, >> + struct kobj_attribute *attr, char *buf) >> +{ >> + >> + ssize_t len = 0; >> + int i; >> + >> + for (i = 0; i < crashk_cma_cnt; ++i) { >> + len += sysfs_emit_at(buf, len, "%08llx-%08llx\n", >> + crashk_cma_ranges[i].start, >> + crashk_cma_ranges[i].end); >> + } >> + return len; >> +} >> +static struct kobj_attribute crash_cma_ranges_attr = __ATTR_RO(crash_cma_ranges); >> + >> #ifdef CONFIG_CRASH_HOTPLUG >> static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj, >> struct kobj_attribute *attr, char *buf) >> @@ -1289,6 +1305,7 @@ static struct attribute *kexec_attrs[] = { >> #ifdef CONFIG_CRASH_DUMP >> &crash_loaded_attr.attr, >> &crash_size_attr.attr, >> + &crash_cma_ranges_attr.attr, >> #ifdef CONFIG_CRASH_HOTPLUG >> &crash_elfcorehdr_size_attr.attr, >> #endif >> -- >> 2.51.1 >> From chenhuacai at kernel.org Mon Nov 10 01:24:04 2025 From: chenhuacai at kernel.org (Huacai Chen) Date: Mon, 10 Nov 2025 17:24:04 +0800 Subject: [PATCH] LoongArch: kexec: Initialize kexec_buf struct In-Reply-To: <20251024063653.35492-1-youling.tang@linux.dev> References: <20251024063653.35492-1-youling.tang@linux.dev> Message-ID: Applied, thanks. 
Huacai On Fri, Oct 24, 2025 at 2:38?PM Youling Tang wrote: > > From: Youling Tang > > The kexec_buf structure was previously declared without initialization. > commit bf454ec31add ("kexec_file: allow to place kexec_buf randomly") > added a field that is always read but not consistently populated by all > architectures. This un-initialized field will contain garbage. > > This is also triggering a UBSAN warning when the uninitialized data was > accessed: > > ------------[ cut here ]------------ > UBSAN: invalid-load in ./include/linux/kexec.h:210:10 > load of value 252 is not a valid value for type '_Bool' > > Zero-initializing kexec_buf at declaration ensures all fields are > cleanly set, preventing future instances of uninitialized memory being > used. > > Fixes: bf454ec31add ("kexec_file: allow to place kexec_buf randomly") > Link: https://lore.kernel.org/r/20250827-kbuf_all-v1-2-1df9882bb01a at debian.org > Signed-off-by: Youling Tang > --- > arch/loongarch/kernel/kexec_efi.c | 2 +- > arch/loongarch/kernel/kexec_elf.c | 2 +- > arch/loongarch/kernel/machine_kexec_file.c | 2 +- > 3 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/arch/loongarch/kernel/kexec_efi.c b/arch/loongarch/kernel/kexec_efi.c > index 45121b914f8f..5ee78ebb1546 100644 > --- a/arch/loongarch/kernel/kexec_efi.c > +++ b/arch/loongarch/kernel/kexec_efi.c > @@ -42,7 +42,7 @@ static void *efi_kexec_load(struct kimage *image, > { > int ret; > unsigned long text_offset, kernel_segment_number; > - struct kexec_buf kbuf; > + struct kexec_buf kbuf = {}; > struct kexec_segment *kernel_segment; > struct loongarch_image_header *h; > > diff --git a/arch/loongarch/kernel/kexec_elf.c b/arch/loongarch/kernel/kexec_elf.c > index 97b2f049801a..1b6b64744c7f 100644 > --- a/arch/loongarch/kernel/kexec_elf.c > +++ b/arch/loongarch/kernel/kexec_elf.c > @@ -59,7 +59,7 @@ static void *elf_kexec_load(struct kimage *image, > int ret; > unsigned long text_offset, kernel_segment_number; > struct elfhdr ehdr; 
> - struct kexec_buf kbuf; > + struct kexec_buf kbuf = {}; > struct kexec_elf_info elf_info; > struct kexec_segment *kernel_segment; > > diff --git a/arch/loongarch/kernel/machine_kexec_file.c b/arch/loongarch/kernel/machine_kexec_file.c > index dda236b51a88..fb57026f5f25 100644 > --- a/arch/loongarch/kernel/machine_kexec_file.c > +++ b/arch/loongarch/kernel/machine_kexec_file.c > @@ -143,7 +143,7 @@ int load_other_segments(struct kimage *image, > unsigned long initrd_load_addr = 0; > unsigned long orig_segments = image->nr_segments; > char *modified_cmdline = NULL; > - struct kexec_buf kbuf; > + struct kexec_buf kbuf = {}; > > kbuf.image = image; > /* Don't allocate anything below the kernel */ > -- > 2.43.0 > From pasha.tatashin at soleen.com Mon Nov 10 10:07:15 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Mon, 10 Nov 2025 13:07:15 -0500 Subject: [PATCH] liveupdate: kho: Enable KHO by default Message-ID: <20251110180715.602807-1-pasha.tatashin@soleen.com> Upcoming LUO requires KHO for its operations, so the requirement to place both KHO=on and liveupdate=on becomes redundant. Set KHO to be enabled by default. 
Signed-off-by: Pasha Tatashin --- kernel/liveupdate/kexec_handover.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index b54ca665e005..568cd9fe9aca 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -51,7 +51,7 @@ union kho_page_info { static_assert(sizeof(union kho_page_info) == sizeof(((struct page *)0)->private)); -static bool kho_enable __ro_after_init; +static bool kho_enable __ro_after_init = true; bool kho_is_enabled(void) { base-commit: ab40c92c74c6b0c611c89516794502b3a3173966 -- 2.51.2.1041.gc1ab5b90ca-goog From rppt at kernel.org Mon Nov 10 10:34:48 2025 From: rppt at kernel.org (Mike Rapoport) Date: Mon, 10 Nov 2025 20:34:48 +0200 Subject: [PATCH] liveupdate: kho: Enable KHO by default In-Reply-To: <20251110180715.602807-1-pasha.tatashin@soleen.com> References: <20251110180715.602807-1-pasha.tatashin@soleen.com> Message-ID: On Mon, Nov 10, 2025 at 01:07:15PM -0500, Pasha Tatashin wrote: > > Subject: [PATCH] liveupdate: kho: Enable KHO by default No need to put a directory (liveupdate) prefix here. "kho: " is enough. With that fixed Reviewed-by: Mike Rapoport (Microsoft) > Upcoming LUO requires KHO for its operations, the requirement to place > both KHO=on and liveupdate=on becomes redundant. Set KHO to be enabled > by default. 
> > Signed-off-by: Pasha Tatashin > --- > kernel/liveupdate/kexec_handover.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c > index b54ca665e005..568cd9fe9aca 100644 > --- a/kernel/liveupdate/kexec_handover.c > +++ b/kernel/liveupdate/kexec_handover.c > @@ -51,7 +51,7 @@ union kho_page_info { > > static_assert(sizeof(union kho_page_info) == sizeof(((struct page *)0)->private)); > > -static bool kho_enable __ro_after_init; > +static bool kho_enable __ro_after_init = true; > > bool kho_is_enabled(void) > { > > base-commit: ab40c92c74c6b0c611c89516794502b3a3173966 > -- > 2.51.2.1041.gc1ab5b90ca-goog > -- Sincerely yours, Mike. From bhe at redhat.com Mon Nov 10 17:15:23 2025 From: bhe at redhat.com (Baoquan he) Date: Tue, 11 Nov 2025 09:15:23 +0800 Subject: [PATCH v3 5/5] crash: export crashkernel CMA reservation to userspace In-Reply-To: <09c4c181-eb4b-43ea-a439-04b83f4c20ba@linux.ibm.com> References: <20251110043143.484408-1-sourabhjain@linux.ibm.com> <20251110043143.484408-6-sourabhjain@linux.ibm.com> <09c4c181-eb4b-43ea-a439-04b83f4c20ba@linux.ibm.com> Message-ID: On 11/10/25 at 02:09pm, Sourabh Jain wrote: > > > On 10/11/25 12:38, Baoquan he wrote: > > On 11/10/25 at 10:01am, Sourabh Jain wrote: > > > Add a sysfs entry /sys/kernel/kexec/crash_cma_ranges to expose all > > > CMA crashkernel ranges. > > I am not against this way. While wondering if it's more appropriate to > > export them into iomem_resource just like crashk_res and crashk_low_res > > doing. > > Handling conflict is challenging. Hence we don't export crashk_res and > crashk_low_res to iomem on powerpc. Checkout [1] > > And I think conflicts can occur regardless of the order in which System RAM > and > Crash CMA ranges are added to iomem. 
> > [1] https://lore.kernel.org/all/20251016142831.144515-1-sourabhjain at linux.ibm.com/ Then I would suggest you add this reason and the link into patch log to keep a record. One day people may post patch to 'optimize' this. > > > > > > This allows userspace tools configuring kdump to determine how much > > > memory is reserved for crashkernel. If CMA is used, tools can warn > > > users when attempting to capture user pages with CMA reservation. > > > > > > The new sysfs hold the CMA ranges in below format: > > > > > > cat /sys/kernel/kexec/crash_cma_ranges > > > 100000000-10c7fffff > > > > > > Cc: Aditya Gupta > > > Cc: Andrew Morton > > > Cc: Baoquan he > > > Cc: Dave Young > > > Cc: Hari Bathini > > > Cc: Jiri Bohac > > > Cc: Madhavan Srinivasan > > > Cc: Mahesh J Salgaonkar > > > Cc: Pingfan Liu > > > Cc: Ritesh Harjani (IBM) > > > Cc: Shivang Upadhyay > > > Cc: Vivek Goyal > > > Cc: linuxppc-dev at lists.ozlabs.org > > > Cc: kexec at lists.infradead.org > > > Signed-off-by: Sourabh Jain > > > --- > > > Changelog: > > > - Add the missing hunk to export crash_cma_ranges sysfs > > > > > > --- > > > .../ABI/testing/sysfs-kernel-kexec-kdump | 10 ++++++++++ > > > kernel/kexec_core.c | 17 +++++++++++++++++ > > > 2 files changed, 27 insertions(+) > > > > > > diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump > > > index 00c00f380fea..f59051b5d96d 100644 > > > --- a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump > > > +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump > > > @@ -49,3 +49,13 @@ Description: read only > > > is used by the user space utility kexec to support updating the > > > in-kernel kdump image during hotplug operations. 
> > > User: Kexec tools > > > + > > > +What: /sys/kernel/kexec/crash_cma_ranges > > > +Date: Nov 2025 > > > +Contact: kexec at lists.infradead.org > > > +Description: read only > > > + Provides information about the memory ranges reserved from > > > + the Contiguous Memory Allocator (CMA) area that are allocated > > > + to the crash (kdump) kernel. It lists the start and end physical > > > + addresses of CMA regions assigned for crashkernel use. > > > +User: kdump service > > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > > > index 7476a46de5d6..da6ff72b4669 100644 > > > --- a/kernel/kexec_core.c > > > +++ b/kernel/kexec_core.c > > > @@ -1271,6 +1271,22 @@ static ssize_t crash_size_store(struct kobject *kobj, > > > } > > > static struct kobj_attribute crash_size_attr = __ATTR_RW(crash_size); > > > +static ssize_t crash_cma_ranges_show(struct kobject *kobj, > > > + struct kobj_attribute *attr, char *buf) > > > +{ > > > + > > > + ssize_t len = 0; > > > + int i; > > > + > > > + for (i = 0; i < crashk_cma_cnt; ++i) { > > > + len += sysfs_emit_at(buf, len, "%08llx-%08llx\n", > > > + crashk_cma_ranges[i].start, > > > + crashk_cma_ranges[i].end); > > > + } > > > + return len; > > > +} > > > +static struct kobj_attribute crash_cma_ranges_attr = __ATTR_RO(crash_cma_ranges); > > > + > > > #ifdef CONFIG_CRASH_HOTPLUG > > > static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj, > > > struct kobj_attribute *attr, char *buf) > > > @@ -1289,6 +1305,7 @@ static struct attribute *kexec_attrs[] = { > > > #ifdef CONFIG_CRASH_DUMP > > > &crash_loaded_attr.attr, > > > &crash_size_attr.attr, > > > + &crash_cma_ranges_attr.attr, > > > #ifdef CONFIG_CRASH_HOTPLUG > > > &crash_elfcorehdr_size_attr.attr, > > > #endif > > > -- > > > 2.51.1 > > > > From sourabhjain at linux.ibm.com Mon Nov 10 21:52:13 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Tue, 11 Nov 2025 11:22:13 +0530 Subject: [PATCH v3 5/5] crash: export crashkernel CMA reservation to userspace 
In-Reply-To: References: <20251110043143.484408-1-sourabhjain@linux.ibm.com> <20251110043143.484408-6-sourabhjain@linux.ibm.com> <09c4c181-eb4b-43ea-a439-04b83f4c20ba@linux.ibm.com> Message-ID: <56abcc3f-ddd4-49c3-a985-a16d616e4210@linux.ibm.com> On 11/11/25 06:45, Baoquan he wrote: > On 11/10/25 at 02:09pm, Sourabh Jain wrote: >> >> On 10/11/25 12:38, Baoquan he wrote: >>> On 11/10/25 at 10:01am, Sourabh Jain wrote: >>>> Add a sysfs entry /sys/kernel/kexec/crash_cma_ranges to expose all >>>> CMA crashkernel ranges. >>> I am not against this way. While wondering if it's more appropriate to >>> export them into iomem_resource just like crashk_res and crashk_low_res >>> doing. >> Handling conflict is challenging. Hence we don't export crashk_res and >> crashk_low_res to iomem on powerpc. Checkout [1] >> >> And I think conflicts can occur regardless of the order in which System RAM >> and >> Crash CMA ranges are added to iomem. >> >> [1] https://lore.kernel.org/all/20251016142831.144515-1-sourabhjain at linux.ibm.com/ > Then I would suggest you add this reason and the link into patch log > to keep a record. One day people may post patch to 'optimize' this. Yeah, I will include it in v3. Thanks for the review. - Sourabh Jain > >>>> This allows userspace tools configuring kdump to determine how much >>>> memory is reserved for crashkernel. If CMA is used, tools can warn >>>> users when attempting to capture user pages with CMA reservation. 
>>>> >>>> The new sysfs hold the CMA ranges in below format: >>>> >>>> cat /sys/kernel/kexec/crash_cma_ranges >>>> 100000000-10c7fffff >>>> >>>> Cc: Aditya Gupta >>>> Cc: Andrew Morton >>>> Cc: Baoquan he >>>> Cc: Dave Young >>>> Cc: Hari Bathini >>>> Cc: Jiri Bohac >>>> Cc: Madhavan Srinivasan >>>> Cc: Mahesh J Salgaonkar >>>> Cc: Pingfan Liu >>>> Cc: Ritesh Harjani (IBM) >>>> Cc: Shivang Upadhyay >>>> Cc: Vivek Goyal >>>> Cc: linuxppc-dev at lists.ozlabs.org >>>> Cc: kexec at lists.infradead.org >>>> Signed-off-by: Sourabh Jain >>>> --- >>>> Changelog: >>>> - Add the missing hunk to export crash_cma_ranges sysfs >>>> >>>> --- >>>> .../ABI/testing/sysfs-kernel-kexec-kdump | 10 ++++++++++ >>>> kernel/kexec_core.c | 17 +++++++++++++++++ >>>> 2 files changed, 27 insertions(+) >>>> >>>> diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump >>>> index 00c00f380fea..f59051b5d96d 100644 >>>> --- a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump >>>> +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump >>>> @@ -49,3 +49,13 @@ Description: read only >>>> is used by the user space utility kexec to support updating the >>>> in-kernel kdump image during hotplug operations. >>>> User: Kexec tools >>>> + >>>> +What: /sys/kernel/kexec/crash_cma_ranges >>>> +Date: Nov 2025 >>>> +Contact: kexec at lists.infradead.org >>>> +Description: read only >>>> + Provides information about the memory ranges reserved from >>>> + the Contiguous Memory Allocator (CMA) area that are allocated >>>> + to the crash (kdump) kernel. It lists the start and end physical >>>> + addresses of CMA regions assigned for crashkernel use. 
>>>> +User: kdump service >>>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c >>>> index 7476a46de5d6..da6ff72b4669 100644 >>>> --- a/kernel/kexec_core.c >>>> +++ b/kernel/kexec_core.c >>>> @@ -1271,6 +1271,22 @@ static ssize_t crash_size_store(struct kobject *kobj, >>>> } >>>> static struct kobj_attribute crash_size_attr = __ATTR_RW(crash_size); >>>> +static ssize_t crash_cma_ranges_show(struct kobject *kobj, >>>> + struct kobj_attribute *attr, char *buf) >>>> +{ >>>> + >>>> + ssize_t len = 0; >>>> + int i; >>>> + >>>> + for (i = 0; i < crashk_cma_cnt; ++i) { >>>> + len += sysfs_emit_at(buf, len, "%08llx-%08llx\n", >>>> + crashk_cma_ranges[i].start, >>>> + crashk_cma_ranges[i].end); >>>> + } >>>> + return len; >>>> +} >>>> +static struct kobj_attribute crash_cma_ranges_attr = __ATTR_RO(crash_cma_ranges); >>>> + >>>> #ifdef CONFIG_CRASH_HOTPLUG >>>> static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj, >>>> struct kobj_attribute *attr, char *buf) >>>> @@ -1289,6 +1305,7 @@ static struct attribute *kexec_attrs[] = { >>>> #ifdef CONFIG_CRASH_DUMP >>>> &crash_loaded_attr.attr, >>>> &crash_size_attr.attr, >>>> + &crash_cma_ranges_attr.attr, >>>> #ifdef CONFIG_CRASH_HOTPLUG >>>> &crash_elfcorehdr_size_attr.attr, >>>> #endif >>>> -- >>>> 2.51.1 >>>> From pratyush at kernel.org Tue Nov 11 05:03:42 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Tue, 11 Nov 2025 14:03:42 +0100 Subject: [PATCH] liveupdate: kho: Enable KHO by default In-Reply-To: (Mike Rapoport's message of "Mon, 10 Nov 2025 20:34:48 +0200") References: <20251110180715.602807-1-pasha.tatashin@soleen.com> Message-ID: On Mon, Nov 10 2025, Mike Rapoport wrote: > On Mon, Nov 10, 2025 at 01:07:15PM -0500, Pasha Tatashin wrote: >> >> Subject: [PATCH] liveupdate: kho: Enable KHO by default > > No need to put a directory (liveupdate) prefix here. "kho: " is enough. 
+1 > > With that fixed > > Reviewed-by: Mike Rapoport (Microsoft) Reviewed-by: Pratyush Yadav > >> Upcoming LUO requires KHO for its operations, the requirement to place >> both KHO=on and liveupdate=on becomes redundant. Set KHO to be enabled >> by default. >> >> Signed-off-by: Pasha Tatashin >> --- >> kernel/liveupdate/kexec_handover.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c >> index b54ca665e005..568cd9fe9aca 100644 >> --- a/kernel/liveupdate/kexec_handover.c >> +++ b/kernel/liveupdate/kexec_handover.c >> @@ -51,7 +51,7 @@ union kho_page_info { >> >> static_assert(sizeof(union kho_page_info) == sizeof(((struct page *)0)->private)); >> >> -static bool kho_enable __ro_after_init; >> +static bool kho_enable __ro_after_init = true; >> >> bool kho_is_enabled(void) >> { >> >> base-commit: ab40c92c74c6b0c611c89516794502b3a3173966 >> -- >> 2.51.2.1041.gc1ab5b90ca-goog >> -- Regards, Pratyush Yadav From horms at kernel.org Tue Nov 11 05:31:28 2025 From: horms at kernel.org (Simon Horman) Date: Tue, 11 Nov 2025 13:31:28 +0000 Subject: [PATCH 1/2] kexec-tools: powerpc: Fix function signature of comparefunc() In-Reply-To: <20251022114413.4440-1-glaubitz@physik.fu-berlin.de> References: <20251022114413.4440-1-glaubitz@physik.fu-berlin.de> Message-ID: On Wed, Oct 22, 2025 at 01:44:12PM +0200, John Paul Adrian Glaubitz wrote: > Fixes the following build error on 32-bit PowerPC: > > kexec/arch/ppc/fs2dt.c: In function 'putnode': > kexec/arch/ppc/fs2dt.c:338:51: error: passing argument 4 of 'scandir' from incompatible pointer type [-Wincompatible-pointer-types] > 338 | numlist = scandir(pathname, &namelist, 0, comparefunc); > | ^~~~~~~~~~~ > | | > | int (*)(const void *, const void *) > > Signed-off-by: John Paul Adrian Glaubitz Thanks, I was able to reproduce this using gcc-powerpc-linux-gnu 4:14.2.0-1 on Debian Trixie. Likewise for patch 2/2. 
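[Editor's note: for context, the build error above is a mismatch against the comparator type that POSIX declares for scandir(). A minimal standalone sketch of the corrected signature follows; the names are illustrative, not the fs2dt.c source.]

```c
#include <dirent.h>
#include <string.h>

/* The old declaration, int comparefunc(const void *a, const void *b),
 * no longer matches scandir()'s prototype, which expects
 *     int (*compar)(const struct dirent **, const struct dirent **);
 * GCC 14 turns that mismatch into a hard error. */
int comparefunc(const struct dirent **a, const struct dirent **b)
{
	return strcmp((*a)->d_name, (*b)->d_name);
}

/* Illustrative call site mirroring the scandir() use in putnode():
 * once the signature matches, no cast is needed. */
int list_sorted(const char *pathname, struct dirent ***namelist)
{
	return scandir(pathname, namelist, NULL, comparefunc);
}
```

For plain name ordering, the libc-provided alphasort() already has the required type and can be passed directly.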
There is a CI workflow that exercises 32-bit PowerPC builds [1]. However, it does not exhibit the problems reported. I guess that is because it is using an older GCC, gcc-powerpc-linux-gnu 4:13.2.0-7ubuntu1 on Ubuntu 24.04. [1] https://github.com/horms/kexec-tools/actions/runs/18554906205/job/52889935741 It would be nice to update the job, but perhaps that is something that comes with Ubuntu 26.04. In any case I have applied this series: - kexec-tools: powerpc: Fix pointer declarations in read_memory_region_limits() https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/commit/?id=6c878e9b8a50 - kexec-tools: powerpc: Fix function signature of comparefunc() https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/commit/?id=2786f8eb3e5e From horms at kernel.org Tue Nov 11 05:38:33 2025 From: horms at kernel.org (Simon Horman) Date: Tue, 11 Nov 2025 13:38:33 +0000 Subject: [PATCH kexec-tools 0/2] kexec/ifdown.c: minimise errors printed In-Reply-To: <20251022020703.14200-2-mrocha@turretllc.us> References: <20251022020703.14200-2-mrocha@turretllc.us> Message-ID: On Tue, Oct 21, 2025 at 09:07:02PM -0500, Mason Rocha wrote: > On some embedded configurations, kexec generates messages when rebooting > to the new kernel. This patch helps eliminate these messages in > the event certain kernel options are set. I'm not too worried about the > second patch, but it is along the same line as the first patch and I > thought that it should be included. Thanks! Thanks, applied. 
- kexec/ifdown.c: Hide error if sockets are disabled https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/commit/?id=86c3d1f7b646 - kexec/ifdown.c: Use AF_NETLINK instead of AF_INET https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/commit/?id=eb8609a29363 From horms at kernel.org Tue Nov 11 05:39:03 2025 From: horms at kernel.org (Simon Horman) Date: Tue, 11 Nov 2025 13:39:03 +0000 Subject: [PATCH kexec-tools 2/2] kexec/ifdown.c: Hide error if sockets are disabled In-Reply-To: <20251022020703.14200-4-mrocha@turretllc.us> References: <20251022020703.14200-2-mrocha@turretllc.us> <20251022020703.14200-4-mrocha@turretllc.us> Message-ID: On Tue, Oct 21, 2025 at 09:07:04PM -0500, Mason Rocha wrote: > Prevents the message "Function not implemented" from being logged when > a system with networking support disabled, as there couldn't possibly be > any interfaces to bring down to the point where we need to make sure the > user knows that the interfaces were not brought down. > > Signed-off-by: Mason Rocha > --- > kexec/ifdown.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/kexec/ifdown.c b/kexec/ifdown.c > index 6a60bcb..e6ea0ae 100644 > --- a/kexec/ifdown.c > +++ b/kexec/ifdown.c > @@ -32,8 +32,10 @@ int ifdown(void) > int fd, shaper; > > if ((fd = socket(AF_NETLINK, SOCK_DGRAM, 0)) < 0) { > - fprintf(stderr, "ifdown: "); > - perror("socket"); > + if(errno != ENOSYS) { nit: I'd prefer a space between 'if' and '(' I added one when applying this series. 
> + fprintf(stderr, "ifdown: "); > + perror("socket"); > + } > goto error; > } > > -- > 2.51.0 > From horms at kernel.org Tue Nov 11 05:47:01 2025 From: horms at kernel.org (Simon Horman) Date: Tue, 11 Nov 2025 13:47:01 +0000 Subject: [PATCH kexec-tools 0/4] ppc64: Support kexec with initrd and DTB together In-Reply-To: <20251022134611.8921-1-shivangu@linux.ibm.com> References: <20251022134611.8921-1-shivangu@linux.ibm.com> Message-ID: On Wed, Oct 22, 2025 at 07:16:05PM +0530, Shivang Upadhyay wrote: > Currently, on ppc64 systems, kexec cannot directly use a > user-provided device tree blob (dtb) when booting a new > kernel with an initrd. This limitation exists because the > dtb must be modified at runtime - for example, to include > the initrd's memory location and size, and to add > /memreserve/ entries based on the current system memory > layout. > > Previously, kexec handled this by generating a fresh dtb in > memory from the running system's /proc/device-tree directory. > However, this approach prevents users from making > intentional modifications to the dtb - such as changing boot > arguments, enabling or disabling devices, or testing kernel > changes that depend on specific device tree properties. > > Adding support for user-provided dtb (with appropriate > patching by kexec) allows more control for developers, > particularly when experimenting with custom kernels or > hardware configurations. > > This patch series lifts this restriction and ensures that > the necessary /memreserve/ sections are properly added to > the new DTB. On ppc64, it is mandatory for the rebooting > cpu to be present in the new kernel's dtb, so additional > logic has been added to identify and mark one of the available > cpus as the reboot cpu on the current system. > > A new architecture-specific function, arch_do_unload(), has > been introduced to perform the necessary cleanup during > kexec unload. In ppc64, the reboot CPU changes due to kexec, > and it gets reset back on kexec unload. 
> > Shivang Upadhyay (4): > ppc64: ensure /memreserve/ sections exist in user-provided FDT > ppc64: handle reboot CPU in case of user provided DTB > Add arch_do_unload hook for arch-specific cleanup > ppc64: life the dtb and initrd restriction Thanks, applied. - ppc64: life the dtb and initrd restriction https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/commit/?id=4dc039779675 - Add arch_do_unload hook for arch-specific cleanup https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/commit/?id=bf4aa2a1f365 - ppc64: handle reboot CPU in case of user provided DTB https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/commit/?id=39631c8fd64f - ppc64: ensure /memreserve/ sections exist in user-provided FDT https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/commit/?id=32f664bfa479 From horms at kernel.org Tue Nov 11 05:49:43 2025 From: horms at kernel.org (Simon Horman) Date: Tue, 11 Nov 2025 13:49:43 +0000 Subject: [PATCH] kexec: add kexec flag to support debug printing In-Reply-To: <20251104025959.1948450-1-maqianga@uniontech.com> References: <20251104025959.1948450-1-maqianga@uniontech.com> Message-ID: On Tue, Nov 04, 2025 at 10:59:59AM +0800, Qiang Ma wrote: > This add KEXEC_DEBUG to kexec_flags so that it can be passed > to kernel when '-d' is added with kexec_load interface. With that > flag enabled, kernel can enable the debugging message printing. > > This patch requires support from the kexec_load debugging message > of the Linux kernel[1]. > > [1]: https://lore.kernel.org/kexec/20251103063440.1681657-1-maqianga at uniontech.com/ > > Signed-off-by: Qiang Ma Thanks, applied. 
- kexec: add kexec flag to support debug printing https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/commit/?id=71d6fd99af7e From horms at kernel.org Tue Nov 11 05:52:58 2025 From: horms at kernel.org (Simon Horman) Date: Tue, 11 Nov 2025 13:52:58 +0000 Subject: [PATCH] util_lib: Add direct map fallback in vaddr_to_offset() In-Reply-To: <20251106120344.2382695-1-pnina.feder@mobileye.com> References: <20251106120344.2382695-1-pnina.feder@mobileye.com> Message-ID: On Thu, Nov 06, 2025 at 02:03:44PM +0200, Pnina Feder wrote: > The vmcore-dmesg tool could fail with the message: > "No program header covering vaddr 0x%llx found kexec bug?" > > This occurred when a virtual address belonged to the kernel's direct > mapping region, which may not be covered by any PT_LOAD segment in > the vmcore ELF headers. > > Add a direct-map fallback in vaddr_to_offset() that converts such > virtual addresses using the known page and physical offsets. This > allows resolving these addresses correctly. > > Tested on Linux 6.16 (RISC-V) > > Signed-off-by: Pnina Feder Thanks, applied. - util_lib: Add direct map fallback in vaddr_to_offset() https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/commit/?id=393c449aec3d From youling.tang at linux.dev Tue Nov 11 19:05:33 2025 From: youling.tang at linux.dev (Youling Tang) Date: Wed, 12 Nov 2025 11:05:33 +0800 Subject: [PATCH 2/2] LoongArch: Refactor command line processing In-Reply-To: References: <20250925063241.337897-1-youling.tang@linux.dev> <20250925063241.337897-2-youling.tang@linux.dev> <5ec31e96-7157-4300-af36-daec2cee5831@linux.dev> Message-ID: <50fa479e-5514-4b5a-a0b8-a264ba23a005@linux.dev> Hi, Simon Currently, it is passed through command-line parameters (fdt has not been used yet), but the readability of the existing command-line parameters is too poor. By rewriting it to be consistent with the kernel implementation and using hexadecimal, the readability will be better. 
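[Editor's note: the readability gain can be seen in a few lines. The helper below is an illustrative sketch of the sprintf-style hexadecimal formatting the refactor adopts; the name and the bounded snprintf are assumptions, not the kexec-tools code, which uses sprintf into a COMMAND_LINE_SIZE buffer.]

```c
#include <stdio.h>

#define CMDLINE_BUF_SIZE 512	/* illustrative; the real limit is COMMAND_LINE_SIZE */

/* Append "initrd=start,size" in hexadecimal: one formatted write
 * replaces the old ultoa()/strcat() sequence, and the 0x-prefixed
 * values are readable at a glance in /proc/cmdline. */
void cmdline_add_initrd_hex(char *cmdline, unsigned long *len,
			    unsigned long base, unsigned long size)
{
	*len += snprintf(cmdline + *len, CMDLINE_BUF_SIZE - *len,
			 "initrd=0x%lx,0x%lx ", base, size);
}
```

For example, base 0x90000000 and size 0x800000 emit `initrd=0x90000000,0x800000 `, where the old decimal form would have printed `initrd=2415919104,8388608`.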
Can Patch2 be applied alone (please ignore Patch1)? Thanks, Youling. On 9/25/25 18:23, Dave Young wrote: > On Thu, 25 Sept 2025 at 17:52, Youling Tang wrote: >> On 9/25/25 17:22, Dave Young wrote: >> >> On Thu, 25 Sept 2025 at 14:33, Youling Tang wrote: >> >> From: Youling Tang >> >> Refactor the cmdline_add_xxx code flow, and simultaneously display >> the content of parameters such as initrd in hexadecimal format to >> improve readability. >> >> Signed-off-by: Youling Tang >> --- >> kexec/arch/loongarch/kexec-loongarch.c | 138 ++++++++++--------------- >> 1 file changed, 55 insertions(+), 83 deletions(-) >> >> diff --git a/kexec/arch/loongarch/kexec-loongarch.c b/kexec/arch/loongarch/kexec-loongarch.c >> index 240202f..c2503de 100644 >> --- a/kexec/arch/loongarch/kexec-loongarch.c >> +++ b/kexec/arch/loongarch/kexec-loongarch.c >> @@ -35,83 +35,49 @@ >> #define _O_BINARY 0 >> #endif >> >> -#define CMDLINE_PREFIX "kexec " >> -static char cmdline[COMMAND_LINE_SIZE] = CMDLINE_PREFIX; >> +/* Add the "kexec" command line parameter to command line. */ >> +static void cmdline_add_loader(unsigned long *cmdline_tmplen, char *modified_cmdline) >> +{ >> + int loader_strlen; >> + >> + loader_strlen = sprintf(modified_cmdline + (*cmdline_tmplen), "kexec "); >> + *cmdline_tmplen += loader_strlen; >> +} >> >> Not sure why this is needed, I guess it is to distinguish the new >> kernel and original kernel? As replied in another reply I would >> suggest adding an extra cmdline in scripts instead of hard coded here, >> you need to remove the fake param each time otherwise it will make >> the cmdline longer and longer after many kexec reboot cycles. >> >> >> In arch_process_options(), the "kexec" parameter will be removed when >> reusing the current command line. >> >> -/* Adds "initrd=start,size" parameters to command line. 
*/ >> -static int cmdline_add_initrd(char *cmdline, unsigned long addr, >> - unsigned long size) >> +/* Add the "initrd=start,size" command line parameter to command line. */ >> +static void cmdline_add_initrd(unsigned long *cmdline_tmplen, char *modified_cmdline, >> + unsigned long initrd_base, unsigned long initrd_size) >> { >> - int cmdlen, len; >> - char str[50], *ptr; >> - >> - ptr = str; >> - strcpy(str, " initrd="); >> - ptr += strlen(str); >> - ultoa(addr, ptr); >> - strcat(str, ","); >> - ptr = str + strlen(str); >> - ultoa(size, ptr); >> - len = strlen(str); >> - cmdlen = strlen(cmdline) + len; >> - if (cmdlen > (COMMAND_LINE_SIZE - 1)) >> - die("Command line overflow\n"); >> - strcat(cmdline, str); >> + int initrd_strlen; >> >> - return 0; >> + initrd_strlen = sprintf(modified_cmdline + (*cmdline_tmplen), "initrd=0x%lx,0x%lx ", >> + initrd_base, initrd_size); >> + *cmdline_tmplen += initrd_strlen; >> } >> >> -/* Adds the appropriate "mem=size at start" options to command line, indicating the >> - * memory region the new kernel can use to boot into. */ >> -static int cmdline_add_mem(char *cmdline, unsigned long addr, >> - unsigned long size) >> +/* >> + * Add the "mem=size at start" command line parameter to command line, indicating the >> + * memory region the new kernel can use to boot into. 
>> + */ >> +static void cmdline_add_mem(unsigned long *cmdline_tmplen, char *modified_cmdline, >> + unsigned long mem_start, unsigned long mem_sz) >> { >> - int cmdlen, len; >> - char str[50], *ptr; >> - >> - addr = addr/1024; >> - size = size/1024; >> - ptr = str; >> - strcpy(str, " mem="); >> - ptr += strlen(str); >> - ultoa(size, ptr); >> - strcat(str, "K@"); >> - ptr = str + strlen(str); >> - ultoa(addr, ptr); >> - strcat(str, "K"); >> - len = strlen(str); >> - cmdlen = strlen(cmdline) + len; >> - if (cmdlen > (COMMAND_LINE_SIZE - 1)) >> - die("Command line overflow\n"); >> - strcat(cmdline, str); >> + int mem_strlen = 0; >> >> - return 0; >> + mem_strlen = sprintf(modified_cmdline + (*cmdline_tmplen), "mem=0x%lx at 0x%lx ", >> + mem_sz, mem_start); >> + *cmdline_tmplen += mem_strlen; >> } >> >> Ditto for the mem= param and other similar cases, can this be done out >> of the kexec-tools c code? it will be more flexible. >> >> >> Currently, we will maintain passing this parameter through the command line, not >> via FDT like ARM64. In the future, we may consider whether it can be passed through >> the FDT table in efisystab (but that approach may not be friendly to ELF kernels). >> > If the kexec boot depends on the customized mem layout, ideally it > should be passed with fdt or other method. > it is reasonable to keep it for the time being. But the "kexec" extra > cmdline should not be hard coded in my opinion. 
> > Thanks > Dave > From shivangu at linux.ibm.com Wed Nov 12 04:10:18 2025 From: shivangu at linux.ibm.com (Shivang Upadhyay) Date: Wed, 12 Nov 2025 17:40:18 +0530 Subject: [PATCH kexec-tools 0/4] ppc64: Support kexec with initrd and DTB together In-Reply-To: References: <20251022134611.8921-1-shivangu@linux.ibm.com> Message-ID: On Tue, Nov 11, 2025 at 01:47:01PM +0000, Simon Horman wrote: > On Wed, Oct 22, 2025 at 07:16:05PM +0530, Shivang Upadhyay wrote: > > Currently, on ppc64 systems, kexec cannot directly use a > > user-provided device tree blob (dtb) when booting a new > > kernel with an initrd. This limitation exists because the > > dtb must be modified at runtime - for example, to include > > the initrd's memory location and size, and to add > > /memreserve/ entries based on the current system memory > > layout. > > > > Previously, kexec handled this by generating a fresh dtb in > > memory from the running system's /proc/device-tree directory. > > However, this approach prevents users from making > > intentional modifications to the dtb - such as changing boot > > arguments, enabling or disabling devices, or testing kernel > > changes that depend on specific device tree properties. > > > > Adding support for user-provided dtb (with appropriate > > patching by kexec) allows more control for developers, > > particularly when experimenting with custom kernels or > > hardware configurations. > > > > This patch series lifts this restriction and ensures that > > the necessary /memreserve/ sections are properly added to > > the new DTB. On ppc64, it is mandatory for the rebooting > > cpu to be present in the new kernel's dtb, so additional > > logic has been added to identify and mark one of the available > > cpus as the reboot cpu on the current system. > > > > A new architecture-specific function, arch_do_unload(), has > > been introduced to perform the necessary cleanup during > > kexec unload. 
In ppc64, the reboot CPU changes due to kexec, > > and it gets reset back on kexec unload. > > > > Shivang Upadhyay (4): > > ppc64: ensure /memreserve/ sections exist in user-provided FDT > > ppc64: handle reboot CPU in case of user provided DTB > > Add arch_do_unload hook for arch-specific cleanup > > ppc64: life the dtb and initrd restriction > > Thanks, applied. Thanks Simon ~Shivang. From sourabhjain at linux.ibm.com Wed Nov 12 09:13:28 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Wed, 12 Nov 2025 22:43:28 +0530 Subject: [PATCH 2/2] Documentation/ABI: remove old fadump sysfs doc In-Reply-To: <20251112171328.298109-1-sourabhjain@linux.ibm.com> References: <20251112171328.298109-1-sourabhjain@linux.ibm.com> Message-ID: <20251112171328.298109-2-sourabhjain@linux.ibm.com> The patch titled "powerpc/fadump: remove old sysfs symlink" removed the deprecated fadump sysfs files, so remove the corresponding ABI documents. Additionally, remove the references to the old fadump sysfs files from the fadump document. 
The alternative sysfs is documented at: Documentation/ABI/testing/sysfs-kernel-fadump Cc: Hari Bathini Cc: Madhavan Srinivasan Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- .../ABI/obsolete/sysfs-kernel-fadump_enabled | 9 ----- .../obsolete/sysfs-kernel-fadump_registered | 10 ------ .../obsolete/sysfs-kernel-fadump_release_mem | 10 ------ .../arch/powerpc/firmware-assisted-dump.rst | 33 +++++++------------ 4 files changed, 11 insertions(+), 51 deletions(-) delete mode 100644 Documentation/ABI/obsolete/sysfs-kernel-fadump_enabled delete mode 100644 Documentation/ABI/obsolete/sysfs-kernel-fadump_registered delete mode 100644 Documentation/ABI/obsolete/sysfs-kernel-fadump_release_mem diff --git a/Documentation/ABI/obsolete/sysfs-kernel-fadump_enabled b/Documentation/ABI/obsolete/sysfs-kernel-fadump_enabled deleted file mode 100644 index e9c2de8b3688..000000000000 --- a/Documentation/ABI/obsolete/sysfs-kernel-fadump_enabled +++ /dev/null @@ -1,9 +0,0 @@ -This ABI is renamed and moved to a new location /sys/kernel/fadump/enabled. - -What: /sys/kernel/fadump_enabled -Date: Feb 2012 -Contact: linuxppc-dev at lists.ozlabs.org -Description: read only - Primarily used to identify whether the FADump is enabled in - the kernel or not. -User: Kdump service diff --git a/Documentation/ABI/obsolete/sysfs-kernel-fadump_registered b/Documentation/ABI/obsolete/sysfs-kernel-fadump_registered deleted file mode 100644 index dae880b1a5d5..000000000000 --- a/Documentation/ABI/obsolete/sysfs-kernel-fadump_registered +++ /dev/null @@ -1,10 +0,0 @@ -This ABI is renamed and moved to a new location /sys/kernel/fadump/registered. - -What: /sys/kernel/fadump_registered -Date: Feb 2012 -Contact: linuxppc-dev at lists.ozlabs.org -Description: read/write - Helps to control the dump collect feature from userspace. 
- Setting 1 to this file enables the system to collect the - dump and 0 to disable it. -User: Kdump service diff --git a/Documentation/ABI/obsolete/sysfs-kernel-fadump_release_mem b/Documentation/ABI/obsolete/sysfs-kernel-fadump_release_mem deleted file mode 100644 index ca2396edb5f1..000000000000 --- a/Documentation/ABI/obsolete/sysfs-kernel-fadump_release_mem +++ /dev/null @@ -1,10 +0,0 @@ -This ABI is renamed and moved to a new location /sys/kernel/fadump/release_mem. - -What: /sys/kernel/fadump_release_mem -Date: Feb 2012 -Contact: linuxppc-dev at lists.ozlabs.org -Description: write only - This is a special sysfs file and only available when - the system is booted to capture the vmcore using FADump. - It is used to release the memory reserved by FADump to - save the crash dump. diff --git a/Documentation/arch/powerpc/firmware-assisted-dump.rst b/Documentation/arch/powerpc/firmware-assisted-dump.rst index 7e266e749cd5..717e30e8b6cd 100644 --- a/Documentation/arch/powerpc/firmware-assisted-dump.rst +++ b/Documentation/arch/powerpc/firmware-assisted-dump.rst @@ -19,9 +19,9 @@ in production use. - Unlike phyp dump, userspace tool does not need to refer any sysfs interface while reading /proc/vmcore. - Unlike phyp dump, FADump allows user to release all the memory reserved - for dump, with a single operation of echo 1 > /sys/kernel/fadump_release_mem. + for dump, with a single operation of echo 1 > /sys/kernel/fadump/release_mem. - Once enabled through kernel boot parameter, FADump can be - started/stopped through /sys/kernel/fadump_registered interface (see + started/stopped through /sys/kernel/fadump/registered interface (see sysfs files section below) and can be easily integrated with kdump service start/stop init scripts. @@ -86,13 +86,13 @@ as follows: network, nas, san, iscsi, etc. as desired. 
- Once the userspace tool is done saving dump, it will echo - '1' to /sys/kernel/fadump_release_mem to release the reserved + '1' to /sys/kernel/fadump/release_mem to release the reserved memory back to general use, except the memory required for next firmware-assisted dump registration. e.g.:: - # echo 1 > /sys/kernel/fadump_release_mem + # echo 1 > /sys/kernel/fadump/release_mem Please note that the firmware-assisted dump feature is only available on POWER6 and above systems on pSeries @@ -152,7 +152,7 @@ then everything but boot memory size of RAM is reserved during early boot (See Fig. 2). This area is released once we finish collecting the dump from user land scripts (e.g. kdump scripts) that are run. If there is dump data, then the -/sys/kernel/fadump_release_mem file is created, and the reserved +/sys/kernel/fadump/release_mem file is created, and the reserved memory is held. If there is no waiting dump data, then only the memory required to @@ -281,7 +281,7 @@ the control files and debugfs file to display memory reserved region. Here is the list of files under kernel sysfs: - /sys/kernel/fadump_enabled + /sys/kernel/fadump/enabled This is used to display the FADump status. - 0 = FADump is disabled @@ -290,15 +290,15 @@ Here is the list of files under kernel sysfs: This interface can be used by kdump init scripts to identify if FADump is enabled in the kernel and act accordingly. - /sys/kernel/fadump_registered + /sys/kernel/fadump/registered This is used to display the FADump registration status as well as to control (start/stop) the FADump registration. - 0 = FADump is not registered. - 1 = FADump is registered and ready to handle system crash. - To register FADump echo 1 > /sys/kernel/fadump_registered and - echo 0 > /sys/kernel/fadump_registered for un-register and stop the + To register FADump echo 1 > /sys/kernel/fadump/registered and + echo 0 > /sys/kernel/fadump/registered for un-register and stop the FADump. 
Once the FADump is un-registered, the system crash will not be handled and vmcore will not be captured. This interface can be easily integrated with kdump service start/stop. @@ -308,13 +308,13 @@ Here is the list of files under kernel sysfs: This is used to display the memory reserved by FADump for saving the crash dump. - /sys/kernel/fadump_release_mem + /sys/kernel/fadump/release_mem This file is available only when FADump is active during second kernel. This is used to release the reserved memory region that are held for saving crash dump. To release the reserved memory echo 1 to it:: - echo 1 > /sys/kernel/fadump_release_mem + echo 1 > /sys/kernel/fadump/release_mem After echo 1, the content of the /sys/kernel/debug/powerpc/fadump_region file will change to reflect the new memory reservations. @@ -335,17 +335,6 @@ Note: /sys/kernel/fadump_release_opalcore sysfs has moved to echo 1 > /sys/firmware/opal/mpipl/release_core -Note: The following FADump sysfs files are deprecated. - -+----------------------------------+--------------------------------+ -| Deprecated | Alternative | -+----------------------------------+--------------------------------+ -| /sys/kernel/fadump_enabled | /sys/kernel/fadump/enabled | -+----------------------------------+--------------------------------+ -| /sys/kernel/fadump_registered | /sys/kernel/fadump/registered | -+----------------------------------+--------------------------------+ -| /sys/kernel/fadump_release_mem | /sys/kernel/fadump/release_mem | -+----------------------------------+--------------------------------+ Here is the list of files under powerpc debugfs: (Assuming debugfs is mounted on /sys/kernel/debug directory.) 
-- 2.51.1 From sourabhjain at linux.ibm.com Wed Nov 12 09:13:27 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Wed, 12 Nov 2025 22:43:27 +0530 Subject: [PATCH 1/2] powerpc/fadump: remove old sysfs symlink Message-ID: <20251112171328.298109-1-sourabhjain@linux.ibm.com> Commit d418b19f34ed ("powerpc/fadump: Reorganize /sys/kernel/fadump_* sysfs files") and commit 3f5f1f22ef10 ("Documentation/ABI: Mark /sys/kernel/fadump_* sysfs files deprecated") moved the /sys/kernel/fadump_* sysfs files to /sys/kernel/fadump/ and deprecated the old files in 2019. To maintain backward compatibility, symlinks were added at the old locations so existing tools could still work. References [1][2] now use the new sysfs interface, so we can safely remove the old symlinks. Link: https://github.com/rhkdump/kdump-utils/commit/fc7c65312a5bef115ce40818bf43ddd3b01b8958 [1] Link: https://github.com/openSUSE/kdump/commit/c274a22ff5f326c8afaa7bba60bd1b86abfc4fab [2] Cc: Hari Bathini Cc: Madhavan Srinivasan Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- arch/powerpc/kernel/fadump.c | 36 ------------------------------------ 1 file changed, 36 deletions(-) diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 4ebc333dd786..4348466260cf 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -1604,43 +1604,7 @@ static void __init fadump_init_files(void) pr_err("sysfs group creation failed (%d), unregistering FADump", rc); unregister_fadump(); - return; - } - - /* - * The FADump sysfs are moved from kernel_kobj to fadump_kobj need to - * create symlink at old location to maintain backward compatibility. 
- * - fadump_enabled -> fadump/enabled - * - fadump_registered -> fadump/registered - * - fadump_release_mem -> fadump/release_mem - */ - rc = compat_only_sysfs_link_entry_to_kobj(kernel_kobj, fadump_kobj, - "enabled", "fadump_enabled"); - if (rc) { - pr_err("unable to create fadump_enabled symlink (%d)", rc); - return; - } - - rc = compat_only_sysfs_link_entry_to_kobj(kernel_kobj, fadump_kobj, - "registered", - "fadump_registered"); - if (rc) { - pr_err("unable to create fadump_registered symlink (%d)", rc); - sysfs_remove_link(kernel_kobj, "fadump_enabled"); - return; } - - if (fw_dump.dump_active) { - rc = compat_only_sysfs_link_entry_to_kobj(kernel_kobj, - fadump_kobj, - "release_mem", - "fadump_release_mem"); - if (rc) - pr_err("unable to create fadump_release_mem symlink (%d)", - rc); - } - return; } static int __init fadump_setup_elfcorehdr_buf(void) -- 2.51.1 From zaneta.kedzierska at fontri.pl Thu Nov 13 01:01:21 2025 From: zaneta.kedzierska at fontri.pl (Żaneta Kędzierska) Date: Thu, 13 Nov 2025 09:01:21 GMT Subject: Parcel locker pickup (Odbiór w paczkomacie) Message-ID: <20251113084500-0.1.sc.5zo8i.0.pqjh2cikjm@fontri.pl> [Translated from Polish] Good morning, As a leader in courier services in Poland, we have prepared a flexible solution for businesses. We have created a subscription combining 24/7 Paczkomat (parcel locker) deliveries with courier service - one provider, one invoice, and predictable, fixed costs. May I present what we can offer for your needs? 
Best regards, Żaneta Kędzierska From sourabhjain at linux.ibm.com Thu Nov 13 21:15:00 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Fri, 14 Nov 2025 10:45:00 +0530 Subject: [PATCH v4 1/5] Documentation/ABI: add kexec and kdump sysfs interface In-Reply-To: <20251114051504.614937-1-sourabhjain@linux.ibm.com> References: <20251114051504.614937-1-sourabhjain@linux.ibm.com> Message-ID: <20251114051504.614937-2-sourabhjain@linux.ibm.com> Add an ABI document for the following kexec and kdump sysfs interfaces: - /sys/kernel/kexec_loaded - /sys/kernel/kexec_crash_loaded - /sys/kernel/kexec_crash_size - /sys/kernel/crash_elfcorehdr_size Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- .../ABI/testing/sysfs-kernel-kexec-kdump | 43 +++++++++++++++++++ 1 file changed, 43 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-kernel-kexec-kdump diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump new file mode 100644 index 000000000000..96b24565b68e --- /dev/null +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump @@ -0,0 +1,43 @@ +What: /sys/kernel/kexec_loaded +Date: Jun 2006 +Contact: kexec at lists.infradead.org +Description: read only + Indicates whether a new kernel image has been loaded + into memory using the kexec system call. It shows 1 if + a kexec image is present and ready to boot, or 0 if none + is loaded. +User: kexec tools, kdump service + +What: /sys/kernel/kexec_crash_loaded +Date: Jun 2006 +Contact: kexec at lists.infradead.org +Description: read only + Indicates whether a crash (kdump) kernel is currently + loaded into memory. 
It shows 1 if a crash kernel has been + successfully loaded for panic handling, or 0 if no crash + kernel is present. +User: Kexec tools, Kdump service + +What: /sys/kernel/kexec_crash_size +Date: Dec 2009 +Contact: kexec at lists.infradead.org +Description: read/write + Shows the amount of memory reserved for loading the crash + (kdump) kernel. It reports the size, in bytes, of the + crash kernel area defined by the crashkernel= parameter. + This interface also allows reducing the crashkernel + reservation by writing a smaller value, and the reclaimed + space is added back to the system RAM. +User: Kdump service + +What: /sys/kernel/crash_elfcorehdr_size +Date: Aug 2023 +Contact: kexec at lists.infradead.org +Description: read only + Indicates the preferred size of the memory buffer for the + ELF core header used by the crash (kdump) kernel. It defines + how much space is needed to hold metadata about the crashed + system, including CPU and memory information. This information + is used by the user space utility kexec to support updating the + in-kernel kdump image during hotplug operations. +User: Kexec tools -- 2.51.1 From sourabhjain at linux.ibm.com Thu Nov 13 21:15:01 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Fri, 14 Nov 2025 10:45:01 +0530 Subject: [PATCH v4 2/5] kexec: move sysfs entries to /sys/kernel/kexec In-Reply-To: <20251114051504.614937-1-sourabhjain@linux.ibm.com> References: <20251114051504.614937-1-sourabhjain@linux.ibm.com> Message-ID: <20251114051504.614937-3-sourabhjain@linux.ibm.com> Several kexec and kdump sysfs entries are currently placed directly under /sys/kernel/, which clutters the directory and makes it harder to identify unrelated entries. To improve organization and readability, these entries are now moved under a dedicated directory, /sys/kernel/kexec. For backward compatibility, symlinks are created at the old locations so that existing tools and scripts continue to work. 
These symlinks can be removed in the future once users have switched to the new path. While creating symlinks, entries are added in /sys/kernel/ that point to their new locations under /sys/kernel/kexec/. If an error occurs while adding a symlink, it is logged but does not stop initialization of the remaining kexec sysfs symlinks. The /sys/kernel/ entry is now controlled by CONFIG_CRASH_DUMP instead of CONFIG_VMCORE_INFO, as CONFIG_CRASH_DUMP also enables CONFIG_VMCORE_INFO. Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- kernel/kexec_core.c | 118 ++++++++++++++++++++++++++++++++++++++++++++ kernel/ksysfs.c | 68 +------------------------ 2 files changed, 119 insertions(+), 67 deletions(-) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index fa00b239c5d9..7476a46de5d6 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -41,6 +41,7 @@ #include #include #include +#include #include #include @@ -1229,3 +1230,120 @@ int kernel_kexec(void) kexec_unlock(); return error; } + +static ssize_t loaded_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%d\n", !!kexec_image); +} +static struct kobj_attribute loaded_attr = __ATTR_RO(loaded); + +#ifdef CONFIG_CRASH_DUMP +static ssize_t crash_loaded_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%d\n", kexec_crash_loaded()); +} +static struct kobj_attribute crash_loaded_attr = __ATTR_RO(crash_loaded); + +static ssize_t crash_size_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + ssize_t size = crash_get_memory_size(); + + if (size < 0) + return size; + + return sysfs_emit(buf, "%zd\n", size); +} +static ssize_t 
crash_size_store(struct kobject *kobj, + struct kobj_attribute *attr, + const char *buf, size_t count) +{ + unsigned long cnt; + int ret; + + if (kstrtoul(buf, 0, &cnt)) + return -EINVAL; + + ret = crash_shrink_memory(cnt); + return ret < 0 ? ret : count; +} +static struct kobj_attribute crash_size_attr = __ATTR_RW(crash_size); + +#ifdef CONFIG_CRASH_HOTPLUG +static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + unsigned int sz = crash_get_elfcorehdr_size(); + + return sysfs_emit(buf, "%u\n", sz); +} +static struct kobj_attribute crash_elfcorehdr_size_attr = __ATTR_RO(crash_elfcorehdr_size); + +#endif /* CONFIG_CRASH_HOTPLUG */ +#endif /* CONFIG_CRASH_DUMP */ + +static struct attribute *kexec_attrs[] = { + &loaded_attr.attr, +#ifdef CONFIG_CRASH_DUMP + &crash_loaded_attr.attr, + &crash_size_attr.attr, +#ifdef CONFIG_CRASH_HOTPLUG + &crash_elfcorehdr_size_attr.attr, +#endif +#endif + NULL +}; + +struct kexec_link_entry { + const char *target; + const char *name; +}; + +static struct kexec_link_entry kexec_links[] = { + { "loaded", "kexec_loaded" }, +#ifdef CONFIG_CRASH_DUMP + { "crash_loaded", "kexec_crash_loaded" }, + { "crash_size", "kexec_crash_size" }, +#ifdef CONFIG_CRASH_HOTPLUG + { "crash_elfcorehdr_size", "crash_elfcorehdr_size" }, +#endif +#endif + +}; + +static struct kobject *kexec_kobj; +ATTRIBUTE_GROUPS(kexec); + +static int __init init_kexec_sysctl(void) +{ + int error; + int i; + + kexec_kobj = kobject_create_and_add("kexec", kernel_kobj); + if (!kexec_kobj) { + pr_err("failed to create kexec kobject\n"); + return -ENOMEM; + } + + error = sysfs_create_groups(kexec_kobj, kexec_groups); + if (error) + goto kset_exit; + + for (i = 0; i < ARRAY_SIZE(kexec_links); i++) { + error = compat_only_sysfs_link_entry_to_kobj(kernel_kobj, kexec_kobj, + kexec_links[i].target, + kexec_links[i].name); + if (error) + pr_err("Unable to create %s symlink (%d)", kexec_links[i].name, error); + } + + return 0; + 
+kset_exit: + kobject_put(kexec_kobj); + return error; +} + +subsys_initcall(init_kexec_sysctl); diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c index eefb67d9883c..a9e6354d9e25 100644 --- a/kernel/ksysfs.c +++ b/kernel/ksysfs.c @@ -12,7 +12,7 @@ #include #include #include -#include +#include #include #include #include @@ -119,50 +119,6 @@ static ssize_t profiling_store(struct kobject *kobj, KERNEL_ATTR_RW(profiling); #endif -#ifdef CONFIG_KEXEC_CORE -static ssize_t kexec_loaded_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) -{ - return sysfs_emit(buf, "%d\n", !!kexec_image); -} -KERNEL_ATTR_RO(kexec_loaded); - -#ifdef CONFIG_CRASH_DUMP -static ssize_t kexec_crash_loaded_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) -{ - return sysfs_emit(buf, "%d\n", kexec_crash_loaded()); -} -KERNEL_ATTR_RO(kexec_crash_loaded); - -static ssize_t kexec_crash_size_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) -{ - ssize_t size = crash_get_memory_size(); - - if (size < 0) - return size; - - return sysfs_emit(buf, "%zd\n", size); -} -static ssize_t kexec_crash_size_store(struct kobject *kobj, - struct kobj_attribute *attr, - const char *buf, size_t count) -{ - unsigned long cnt; - int ret; - - if (kstrtoul(buf, 0, &cnt)) - return -EINVAL; - - ret = crash_shrink_memory(cnt); - return ret < 0 ? 
ret : count; -} -KERNEL_ATTR_RW(kexec_crash_size); - -#endif /* CONFIG_CRASH_DUMP*/ -#endif /* CONFIG_KEXEC_CORE */ - #ifdef CONFIG_VMCORE_INFO static ssize_t vmcoreinfo_show(struct kobject *kobj, @@ -174,18 +130,6 @@ static ssize_t vmcoreinfo_show(struct kobject *kobj, } KERNEL_ATTR_RO(vmcoreinfo); -#ifdef CONFIG_CRASH_HOTPLUG -static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) -{ - unsigned int sz = crash_get_elfcorehdr_size(); - - return sysfs_emit(buf, "%u\n", sz); -} -KERNEL_ATTR_RO(crash_elfcorehdr_size); - -#endif - #endif /* CONFIG_VMCORE_INFO */ /* whether file capabilities are enabled */ @@ -255,18 +199,8 @@ static struct attribute * kernel_attrs[] = { #ifdef CONFIG_PROFILING &profiling_attr.attr, #endif -#ifdef CONFIG_KEXEC_CORE - &kexec_loaded_attr.attr, -#ifdef CONFIG_CRASH_DUMP - &kexec_crash_loaded_attr.attr, - &kexec_crash_size_attr.attr, -#endif -#endif #ifdef CONFIG_VMCORE_INFO &vmcoreinfo_attr.attr, -#ifdef CONFIG_CRASH_HOTPLUG - &crash_elfcorehdr_size_attr.attr, -#endif #endif #ifndef CONFIG_TINY_RCU &rcu_expedited_attr.attr, -- 2.51.1 From sourabhjain at linux.ibm.com Thu Nov 13 21:15:02 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Fri, 14 Nov 2025 10:45:02 +0530 Subject: [PATCH v4 3/5] Documentation/ABI: mark old kexec sysfs deprecated In-Reply-To: <20251114051504.614937-1-sourabhjain@linux.ibm.com> References: <20251114051504.614937-1-sourabhjain@linux.ibm.com> Message-ID: <20251114051504.614937-4-sourabhjain@linux.ibm.com> The previous commit ("kexec: move sysfs entries to /sys/kernel/kexec") moved all existing kexec sysfs entries to a new location. The ABI document is updated to include a note about the deprecation of the old kexec sysfs entries. 
The following kexec sysfs entries are deprecated: - /sys/kernel/kexec_loaded - /sys/kernel/kexec_crash_loaded - /sys/kernel/kexec_crash_size - /sys/kernel/crash_elfcorehdr_size Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- .../sysfs-kernel-kexec-kdump | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) rename Documentation/ABI/{testing => obsolete}/sysfs-kernel-kexec-kdump (61%) diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/obsolete/sysfs-kernel-kexec-kdump similarity index 61% rename from Documentation/ABI/testing/sysfs-kernel-kexec-kdump rename to Documentation/ABI/obsolete/sysfs-kernel-kexec-kdump index 96b24565b68e..96b4d41721cc 100644 --- a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump +++ b/Documentation/ABI/obsolete/sysfs-kernel-kexec-kdump @@ -1,3 +1,19 @@ +NOTE: all the ABIs listed in this file are deprecated and will be removed after 2028. 
+ +Here are the alternative ABIs: ++------------------------------------+-----------------------------------------+ +| Deprecated | Alternative | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/kexec_loaded | /sys/kernel/kexec/loaded | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/kexec_crash_loaded | /sys/kernel/kexec/crash_loaded | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/kexec_crash_size | /sys/kernel/kexec/crash_size | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/crash_elfcorehdr_size | /sys/kernel/kexec/crash_elfcorehdr_size | ++------------------------------------+-----------------------------------------+ + + What: /sys/kernel/kexec_loaded Date: Jun 2006 Contact: kexec at lists.infradead.org -- 2.51.1 From sourabhjain at linux.ibm.com Thu Nov 13 21:15:03 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Fri, 14 Nov 2025 10:45:03 +0530 Subject: [PATCH v4 4/5] kexec: document new kexec and kdump sysfs ABIs In-Reply-To: <20251114051504.614937-1-sourabhjain@linux.ibm.com> References: <20251114051504.614937-1-sourabhjain@linux.ibm.com> Message-ID: <20251114051504.614937-5-sourabhjain@linux.ibm.com> Add an ABI document for following kexec and kdump sysfs interface: - /sys/kernel/kexec/loaded - /sys/kernel/kexec/crash_loaded - /sys/kernel/kexec/crash_size - /sys/kernel/kexec/crash_elfcorehdr_size Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- .../ABI/testing/sysfs-kernel-kexec-kdump | 51 +++++++++++++++++++ 1 file changed, 51 insertions(+) create mode 100644 
Documentation/ABI/testing/sysfs-kernel-kexec-kdump diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump new file mode 100644 index 000000000000..00c00f380fea --- /dev/null +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump @@ -0,0 +1,51 @@ +What: /sys/kernel/kexec/* +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: + The /sys/kernel/kexec/* directory contains sysfs files + that provide information about the configuration status + of kexec and kdump. + +What: /sys/kernel/kexec/loaded +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read only + Indicates whether a new kernel image has been loaded + into memory using the kexec system call. It shows 1 if + a kexec image is present and ready to boot, or 0 if none + is loaded. +User: kexec tools, kdump service + +What: /sys/kernel/kexec/crash_loaded +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read only + Indicates whether a crash (kdump) kernel is currently + loaded into memory. It shows 1 if a crash kernel has been + successfully loaded for panic handling, or 0 if no crash + kernel is present. +User: Kexec tools, Kdump service + +What: /sys/kernel/kexec/crash_size +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read/write + Shows the amount of memory reserved for loading the crash + (kdump) kernel. It reports the size, in bytes, of the + crash kernel area defined by the crashkernel= parameter. + This interface also allows reducing the crashkernel + reservation by writing a smaller value, and the reclaimed + space is added back to the system RAM. +User: Kdump service + +What: /sys/kernel/kexec/crash_elfcorehdr_size +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read only + Indicates the preferred size of the memory buffer for the + ELF core header used by the crash (kdump) kernel. 
It defines + how much space is needed to hold metadata about the crashed + system, including CPU and memory information. This information + is used by the user space utility kexec to support updating the + in-kernel kdump image during hotplug operations. +User: Kexec tools -- 2.51.1 From sourabhjain at linux.ibm.com Thu Nov 13 21:15:04 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Fri, 14 Nov 2025 10:45:04 +0530 Subject: [PATCH v4 5/5] crash: export crashkernel CMA reservation to userspace In-Reply-To: <20251114051504.614937-1-sourabhjain@linux.ibm.com> References: <20251114051504.614937-1-sourabhjain@linux.ibm.com> Message-ID: <20251114051504.614937-6-sourabhjain@linux.ibm.com> Add a sysfs entry /sys/kernel/kexec/crash_cma_ranges to expose all CMA crashkernel ranges. This allows userspace tools configuring kdump to determine how much memory is reserved for the crashkernel. If CMA is used, tools can warn users when attempting to capture user pages with a CMA reservation. The new sysfs file holds the CMA ranges in the below format: cat /sys/kernel/kexec/crash_cma_ranges 100000000-10c7fffff The reason for not including the crash CMA ranges in /proc/iomem is to avoid conflicts. It has been observed that contiguous memory ranges are sometimes shown as two separate System RAM entries in /proc/iomem. If a CMA range overlaps two System RAM ranges, adding crashk_res to /proc/iomem can create a conflict. Reference [1] describes one such instance on the PowerPC architecture. 
Link: https://lore.kernel.org/all/20251016142831.144515-1-sourabhjain at linux.ibm.com/ [1] Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- .../ABI/testing/sysfs-kernel-kexec-kdump | 10 ++++++++++ kernel/kexec_core.c | 17 +++++++++++++++++ 2 files changed, 27 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump index 00c00f380fea..f59051b5d96d 100644 --- a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump @@ -49,3 +49,13 @@ Description: read only is used by the user space utility kexec to support updating the in-kernel kdump image during hotplug operations. User: Kexec tools + +What: /sys/kernel/kexec/crash_cma_ranges +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read only + Provides information about the memory ranges reserved from + the Contiguous Memory Allocator (CMA) area that are allocated + to the crash (kdump) kernel. It lists the start and end physical + addresses of CMA regions assigned for crashkernel use. 
+User: kdump service diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 7476a46de5d6..da6ff72b4669 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -1271,6 +1271,22 @@ static ssize_t crash_size_store(struct kobject *kobj, } static struct kobj_attribute crash_size_attr = __ATTR_RW(crash_size); +static ssize_t crash_cma_ranges_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + + ssize_t len = 0; + int i; + + for (i = 0; i < crashk_cma_cnt; ++i) { + len += sysfs_emit_at(buf, len, "%08llx-%08llx\n", + crashk_cma_ranges[i].start, + crashk_cma_ranges[i].end); + } + return len; +} +static struct kobj_attribute crash_cma_ranges_attr = __ATTR_RO(crash_cma_ranges); + #ifdef CONFIG_CRASH_HOTPLUG static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) @@ -1289,6 +1305,7 @@ static struct attribute *kexec_attrs[] = { #ifdef CONFIG_CRASH_DUMP &crash_loaded_attr.attr, &crash_size_attr.attr, + &crash_cma_ranges_attr.attr, #ifdef CONFIG_CRASH_HOTPLUG &crash_elfcorehdr_size_attr.attr, #endif -- 2.51.1 From rppt at kernel.org Thu Nov 13 23:30:11 2025 From: rppt at kernel.org (Mike Rapoport) Date: Fri, 14 Nov 2025 09:30:11 +0200 Subject: [PATCH] liveupdate: kho: Enable KHO by default In-Reply-To: <20251110180715.602807-1-pasha.tatashin@soleen.com> References: <20251110180715.602807-1-pasha.tatashin@soleen.com> Message-ID: On Mon, Nov 10, 2025 at 01:07:15PM -0500, Pasha Tatashin wrote: > Upcoming LUO requires KHO for its operations, the requirement to place > both KHO=on and liveupdate=on becomes redundant. Set KHO to be enabled > by default. I thought more about this and it seems too much of a change. kho=1 enables scratch areas and that significantly changes how free pages are distributed in the free lists. 
Let's go with a Kconfig option we discussed off-list: (this is on top of the current mmotm/mm-nonmm-unstable) >From 823299d80aa4f7c16ef6cfd798a19e1dfe1a91ab Mon Sep 17 00:00:00 2001 From: Pasha Tatashin Date: Fri, 14 Nov 2025 09:27:47 +0200 Subject: [PATCH] kho: Allow KHO to be enabled by default Upcoming LUO requires KHO for its operations, the requirement to place both KHO=on and liveupdate=on becomes redundant. Let's allow KHO to be enabled by default, and CONFIG_LIVEUPDATE can select this CONFIG. Signed-off-by: Pasha Tatashin Signed-off-by: Mike Rapoport (Microsoft) --- kernel/liveupdate/Kconfig | 8 ++++++++ kernel/liveupdate/kexec_handover.c | 2 +- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/kernel/liveupdate/Kconfig b/kernel/liveupdate/Kconfig index d7344d347f69..25c9a4d7781f 100644 --- a/kernel/liveupdate/Kconfig +++ b/kernel/liveupdate/Kconfig @@ -63,4 +63,12 @@ config KEXEC_HANDOVER_DEBUGFS Also, enables inspecting the KHO fdt trees with the debugfs binary blobs. +config KEXEC_HANDOVER_ENABLE_DEFAULT + bool "Enable kexec handover by default" + depends on KEXEC_HANDOVER + help + Enable the kexec handover by default. It is equivalent of passing + kho=on via kernel parameter, and can be overwritten to off via + kho=off. + endmenu diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 568cd9fe9aca..23a3df297bb3 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -51,7 +51,7 @@ union kho_page_info { static_assert(sizeof(union kho_page_info) == sizeof(((struct page *)0)->private)); -static bool kho_enable __ro_after_init = true; +static bool kho_enable __ro_after_init = IS_ENABLED(CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT); bool kho_is_enabled(void) { -- 2.50.1 -- Sincerely yours, Mike. 
From graf at amazon.com Fri Nov 14 01:01:20 2025 From: graf at amazon.com (Alexander Graf) Date: Fri, 14 Nov 2025 10:01:20 +0100 Subject: [PATCH] liveupdate: kho: Enable KHO by default In-Reply-To: References: <20251110180715.602807-1-pasha.tatashin@soleen.com> Message-ID: On 14.11.25 08:30, Mike Rapoport wrote: > On Mon, Nov 10, 2025 at 01:07:15PM -0500, Pasha Tatashin wrote: >> Upcoming LUO requires KHO for its operations, the requirement to place >> both KHO=on and liveupdate=on becomes redundant. Set KHO to be enabled >> by default. > I though more about this and it seems too much of a change. kho=1 enables > scratch areas and that significantly changes how free pages are distributed > in the free lists. > > Let's go with a Kconfig option we discussed of-list: > (this is on top of the current mmotm/mm-nonmm-unstable) > > From 823299d80aa4f7c16ef6cfd798a19e1dfe1a91ab Mon Sep 17 00:00:00 2001 > From: Pasha Tatashin > Date: Fri, 14 Nov 2025 09:27:47 +0200 > Subject: [PATCH] kho: Allow KHO to be enabled by default > > Upcoming LUO requires KHO for its operations, the requirement to place > both KHO=on and liveupdate=on becomes reduntant. Let's allow KHO to be > enabled by default, and CONFIG_LIVEUPDATE can select this CONFIG. Looks much better, yes :). You can also imply this option automatically when LUO=y. Reviewed-by: Alexander Graf Alex > > Signed-off-by: Pasha Tatashin > Signed-off-by: Mike Rapoport (Microsoft) > --- > kernel/liveupdate/Kconfig | 8 ++++++++ > kernel/liveupdate/kexec_handover.c | 2 +- > 2 files changed, 9 insertions(+), 1 deletion(-) > > diff --git a/kernel/liveupdate/Kconfig b/kernel/liveupdate/Kconfig > index d7344d347f69..25c9a4d7781f 100644 > --- a/kernel/liveupdate/Kconfig > +++ b/kernel/liveupdate/Kconfig > @@ -63,4 +63,12 @@ config KEXEC_HANDOVER_DEBUGFS > Also, enables inspecting the KHO fdt trees with the debugfs binary > blobs. 
>
> +config KEXEC_HANDOVER_ENABLE_DEFAULT
> +	bool "Enable kexec handover by default"
> +	depends on KEXEC_HANDOVER
> +	help
> +	  Enable kexec handover by default. It is equivalent to passing
> +	  kho=on on the kernel command line, and can be overridden with
> +	  kho=off.
> +
>  endmenu
> diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
> index 568cd9fe9aca..23a3df297bb3 100644
> --- a/kernel/liveupdate/kexec_handover.c
> +++ b/kernel/liveupdate/kexec_handover.c
> @@ -51,7 +51,7 @@ union kho_page_info {
>
>  static_assert(sizeof(union kho_page_info) == sizeof(((struct page *)0)->private));
>
> -static bool kho_enable __ro_after_init = true;
> +static bool kho_enable __ro_after_init = IS_ENABLED(CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT);
>
>  bool kho_is_enabled(void)
>  {
> --
> 2.50.1
>
>
>
> --
> Sincerely yours,
> Mike.


Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Christof Hellmis
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597

From pasha.tatashin at soleen.com  Fri Nov 14 06:13:01 2025
From: pasha.tatashin at soleen.com (Pasha Tatashin)
Date: Fri, 14 Nov 2025 09:13:01 -0500
Subject: [PATCH] liveupdate: kho: Enable KHO by default
In-Reply-To:
References: <20251110180715.602807-1-pasha.tatashin@soleen.com>
Message-ID:

On Fri, Nov 14, 2025 at 2:30 AM Mike Rapoport wrote:
>
> On Mon, Nov 10, 2025 at 01:07:15PM -0500, Pasha Tatashin wrote:
> > Upcoming LUO requires KHO for its operations, the requirement to place
> > both KHO=on and liveupdate=on becomes redundant. Set KHO to be enabled
> > by default.
>
> I thought more about this and it seems too much of a change. kho=1 enables
> scratch areas and that significantly changes how free pages are distributed
> in the free lists.
>
> Let's go with a Kconfig option we discussed off-list:
> (this is on top of the current mmotm/mm-nonmm-unstable)

I will include this in the KHO simplification series

>
> From 823299d80aa4f7c16ef6cfd798a19e1dfe1a91ab Mon Sep 17 00:00:00 2001
> From: Pasha Tatashin
> Date: Fri, 14 Nov 2025 09:27:47 +0200
> Subject: [PATCH] kho: Allow KHO to be enabled by default
>
> Upcoming LUO requires KHO for its operations, so the requirement to place
> both kho=on and liveupdate=on becomes redundant. Let's allow KHO to be
> enabled by default, and let CONFIG_LIVEUPDATE select this option.
>
> Signed-off-by: Pasha Tatashin
> Signed-off-by: Mike Rapoport (Microsoft)
> ---
>  kernel/liveupdate/Kconfig          | 8 ++++++++
>  kernel/liveupdate/kexec_handover.c | 2 +-
>  2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/liveupdate/Kconfig b/kernel/liveupdate/Kconfig
> index d7344d347f69..25c9a4d7781f 100644
> --- a/kernel/liveupdate/Kconfig
> +++ b/kernel/liveupdate/Kconfig
> @@ -63,4 +63,12 @@ config KEXEC_HANDOVER_DEBUGFS
>  	  Also, enables inspecting the KHO fdt trees with the debugfs binary
>  	  blobs.
>
> +config KEXEC_HANDOVER_ENABLE_DEFAULT
> +	bool "Enable kexec handover by default"
> +	depends on KEXEC_HANDOVER
> +	help
> +	  Enable kexec handover by default. It is equivalent to passing
> +	  kho=on on the kernel command line, and can be overridden with
> +	  kho=off.
> +
>  endmenu
> diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
> index 568cd9fe9aca..23a3df297bb3 100644
> --- a/kernel/liveupdate/kexec_handover.c
> +++ b/kernel/liveupdate/kexec_handover.c
> @@ -51,7 +51,7 @@ union kho_page_info {
>
>  static_assert(sizeof(union kho_page_info) == sizeof(((struct page *)0)->private));
>
> -static bool kho_enable __ro_after_init = true;
> +static bool kho_enable __ro_after_init = IS_ENABLED(CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT);
>
>  bool kho_is_enabled(void)
>  {
> --
> 2.50.1
>
>
>
> --
> Sincerely yours,
> Mike.
From pasha.tatashin at soleen.com Fri Nov 14 07:53:45 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 10:53:45 -0500 Subject: [PATCH v1 00/13] kho: simplify state machine and enable dynamic updates Message-ID: <20251114155358.2884014-1-pasha.tatashin@soleen.com> Andrew: This series applies against mm-nonmm-unstable, but should go right before LUOv5, i.e. on top of: "liveupdate: kho: use %pe format specifier for error pointer printing" It also replaces the following patches, that once applied should be dropped from mm-nonmm-unstable: "liveupdate: kho: when live update add KHO image during kexec load" "liveupdate: Kconfig: make debugfs optional" "kho: enable KHO by default" This patch series refactors the Kexec Handover subsystem to transition from a rigid, state-locked model to a dynamic, re-entrant architecture. It also introduces usability improvements. Motivation Currently, KHO relies on a strict state machine where memory preservation is locked upon finalization. If a change is required, the user must explicitly "abort" to reset the state. Additionally, the kexec image cannot be loaded until KHO is finalized, and the FDT is rebuilt from scratch on every finalization. This series simplifies this workflow to support "load early, finalize late" scenarios. Key Changes State Machine Simplification: - Removed kho_abort(). kho_finalize() is now re-entrant; calling it a second time automatically flushes the previous serialized state and generates a fresh one. - Removed kho_out.finalized checks from preservation APIs, allowing drivers to add/remove pages even after an initial finalization. - Decoupled kexec_file_load from KHO finalization. The KHO FDT physical address is now stable from boot, allowing the kexec image to be loaded before the handover metadata is finalized. FDT Management: - The FDT is now updated in-place dynamically when subtrees are added or removed, removing the need for complex reconstruction logic. 
- The output FDT is always exposed in debugfs (initialized and zeroed at boot), improving visibility and debugging capabilities throughout the system lifecycle. - Removed the redundant global preserved_mem_map pointer, establishing the FDT property as the single source of truth. New Features & API Enhancements: - High-Level Allocators: Introduced kho_alloc_preserve() and friends to reduce boilerplate for drivers that need to allocate, preserve, and eventually restore simple memory buffers. - Configuration: Added CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT to allow KHO to be active by default without requiring the kho=on command line parameter. Fixes: - Fixed potential alignment faults when accessing 64-bit FDT properties. - Fixed the lifecycle of the FDT folio preservation (now preserved once at init). Pasha Tatashin (13): kho: Fix misleading log message in kho_populate() kho: Convert __kho_abort() to return void kho: Preserve FDT folio only once during initialization kho: Verify deserialization status and fix FDT alignment access kho: Always expose output FDT in debugfs kho: Simplify serialization and remove __kho_abort kho: Remove global preserved_mem_map and store state in FDT kho: Remove abort functionality and support state refresh kho: Update FDT dynamically for subtree addition/removal kho: Allow kexec load before KHO finalization kho: Allow memory preservation state updates after finalization kho: Add Kconfig option to enable KHO by default kho: Introduce high-level memory allocation API include/linux/kexec_handover.h | 22 +- kernel/liveupdate/Kconfig | 14 + kernel/liveupdate/kexec_handover.c | 338 ++++++++++++-------- kernel/liveupdate/kexec_handover_debugfs.c | 2 +- kernel/liveupdate/kexec_handover_internal.h | 1 - 5 files changed, 232 insertions(+), 145 deletions(-) -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 07:53:46 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 10:53:46 -0500 Subject: 
[PATCH v1 01/13] kho: Fix misleading log message in kho_populate() In-Reply-To: <20251114155358.2884014-1-pasha.tatashin@soleen.com> References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> Message-ID: <20251114155358.2884014-2-pasha.tatashin@soleen.com> The log message in kho_populate() currently states "Will skip init for some devices". This implies that Kexec Handover always involves skipping device initialization. However, KHO is a generic mechanism used to preserve kernel memory across reboot for various purposes, such as memfd, telemetry, or reserve_mem. Skipping device initialization is a specific property of live update drivers using KHO, not a property of the mechanism itself. Remove the misleading suffix to accurately reflect the generic nature of KHO discovery. Signed-off-by: Pasha Tatashin --- kernel/liveupdate/kexec_handover.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 9f0913e101be..6ad45e12f53b 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -1470,7 +1470,7 @@ void __init kho_populate(phys_addr_t fdt_phys, u64 fdt_len, kho_in.fdt_phys = fdt_phys; kho_in.scratch_phys = scratch_phys; kho_scratch_cnt = scratch_cnt; - pr_info("found kexec handover data. Will skip init for some devices\n"); + pr_info("found kexec handover data.\n"); out: if (fdt) -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 07:53:47 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 10:53:47 -0500 Subject: [PATCH v1 02/13] kho: Convert __kho_abort() to return void In-Reply-To: <20251114155358.2884014-1-pasha.tatashin@soleen.com> References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> Message-ID: <20251114155358.2884014-3-pasha.tatashin@soleen.com> The internal helper __kho_abort() always returns 0 and has no failure paths. 
Its return value is ignored by __kho_finalize and checked needlessly by kho_abort. Change the return type to void to reflect that this function cannot fail, and simplify kho_abort by removing dead error handling code. Signed-off-by: Pasha Tatashin --- kernel/liveupdate/kexec_handover.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 6ad45e12f53b..bc7f046a1313 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -1117,20 +1117,16 @@ void *kho_restore_vmalloc(const struct kho_vmalloc *preservation) } EXPORT_SYMBOL_GPL(kho_restore_vmalloc); -static int __kho_abort(void) +static void __kho_abort(void) { if (kho_out.preserved_mem_map) { kho_mem_ser_free(kho_out.preserved_mem_map); kho_out.preserved_mem_map = NULL; } - - return 0; } int kho_abort(void) { - int ret = 0; - if (!kho_enable) return -EOPNOTSUPP; @@ -1138,10 +1134,7 @@ int kho_abort(void) if (!kho_out.finalized) return -ENOENT; - ret = __kho_abort(); - if (ret) - return ret; - + __kho_abort(); kho_out.finalized = false; kho_debugfs_fdt_remove(&kho_out.dbg, kho_out.fdt); -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 07:53:48 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 10:53:48 -0500 Subject: [PATCH v1 03/13] kho: Preserve FDT folio only once during initialization In-Reply-To: <20251114155358.2884014-1-pasha.tatashin@soleen.com> References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> Message-ID: <20251114155358.2884014-4-pasha.tatashin@soleen.com> Currently, the FDT folio is preserved inside __kho_finalize(). If the user performs multiple finalize/abort cycles, kho_preserve_folio() is called repeatedly for the same FDT folio. Since the FDT folio is allocated once during kho_init(), it should be marked for preservation at the same time. 
Move the preservation call to kho_init() to align the preservation state with the object's lifecycle and simplify the finalize path. Signed-off-by: Pasha Tatashin --- kernel/liveupdate/kexec_handover.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index bc7f046a1313..a4b33ca79246 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -1164,10 +1164,6 @@ static int __kho_finalize(void) if (err) goto abort; - err = kho_preserve_folio(virt_to_folio(kho_out.fdt)); - if (err) - goto abort; - err = kho_mem_serialize(&kho_out); if (err) goto abort; @@ -1319,6 +1315,10 @@ static __init int kho_init(void) if (err) goto err_free_fdt; + err = kho_preserve_folio(virt_to_folio(kho_out.fdt)); + if (err) + goto err_free_fdt; + if (fdt) { kho_in_debugfs_init(&kho_in.dbg, fdt); return 0; -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 07:53:51 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 10:53:51 -0500 Subject: [PATCH v1 06/13] kho: Simplify serialization and remove __kho_abort In-Reply-To: <20251114155358.2884014-1-pasha.tatashin@soleen.com> References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> Message-ID: <20251114155358.2884014-7-pasha.tatashin@soleen.com> Currently, __kho_finalize() performs memory serialization in the middle of FDT construction. If FDT construction fails later, the function must manually clean up the serialized memory via __kho_abort(). Refactor __kho_finalize() to perform kho_mem_serialize() only after the FDT has been successfully constructed and finished. This reordering has two benefits: 1. It avoids expensive serialization work if FDT generation fails. 2. It removes the need for cleanup in the FDT error path. As a result, the internal helper __kho_abort() is no longer needed for internal error handling. 
Inline its remaining logic (cleanup of the preserved memory map) directly into kho_abort() and remove the helper. Signed-off-by: Pasha Tatashin --- kernel/liveupdate/kexec_handover.c | 41 +++++++++++++----------------- 1 file changed, 17 insertions(+), 24 deletions(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index cd8641725343..aea58e5a6b49 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -1127,14 +1127,6 @@ void *kho_restore_vmalloc(const struct kho_vmalloc *preservation) } EXPORT_SYMBOL_GPL(kho_restore_vmalloc); -static void __kho_abort(void) -{ - if (kho_out.preserved_mem_map) { - kho_mem_ser_free(kho_out.preserved_mem_map); - kho_out.preserved_mem_map = NULL; - } -} - int kho_abort(void) { if (!kho_enable) @@ -1144,7 +1136,8 @@ int kho_abort(void) if (!kho_out.finalized) return -ENOENT; - __kho_abort(); + kho_mem_ser_free(kho_out.preserved_mem_map); + kho_out.preserved_mem_map = NULL; kho_out.finalized = false; return 0; @@ -1152,12 +1145,12 @@ int kho_abort(void) static int __kho_finalize(void) { - int err = 0; - u64 *preserved_mem_map; void *root = kho_out.fdt; struct kho_sub_fdt *fdt; + u64 *preserved_mem_map; + int err; - err |= fdt_create(root, PAGE_SIZE); + err = fdt_create(root, PAGE_SIZE); err |= fdt_finish_reservemap(root); err |= fdt_begin_node(root, ""); err |= fdt_property_string(root, "compatible", KHO_FDT_COMPATIBLE); @@ -1170,13 +1163,7 @@ static int __kho_finalize(void) sizeof(*preserved_mem_map), (void **)&preserved_mem_map); if (err) - goto abort; - - err = kho_mem_serialize(&kho_out); - if (err) - goto abort; - - *preserved_mem_map = (u64)virt_to_phys(kho_out.preserved_mem_map); + goto err_exit; mutex_lock(&kho_out.fdts_lock); list_for_each_entry(fdt, &kho_out.sub_fdts, l) { @@ -1190,13 +1177,19 @@ static int __kho_finalize(void) err |= fdt_end_node(root); err |= fdt_finish(root); + if (err) + goto err_exit; -abort: - if (err) { - pr_err("Failed to convert 
KHO state tree: %d\n", err); - __kho_abort(); - } + err = kho_mem_serialize(&kho_out); + if (err) + goto err_exit; + + *preserved_mem_map = (u64)virt_to_phys(kho_out.preserved_mem_map); + + return 0; +err_exit: + pr_err("Failed to convert KHO state tree: %d\n", err); return err; } -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 07:53:50 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 10:53:50 -0500 Subject: [PATCH v1 05/13] kho: Always expose output FDT in debugfs In-Reply-To: <20251114155358.2884014-1-pasha.tatashin@soleen.com> References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> Message-ID: <20251114155358.2884014-6-pasha.tatashin@soleen.com> Currently, the output FDT is added to debugfs only when KHO is finalized and removed when aborted. There is no need to hide the FDT based on the state. Always expose it starting from initialization. This aids the transition toward removing the explicit abort functionality and converting KHO to be fully stateless. Also, pre-zero the FDT tree so we do not expose random bits to the user and to the next kernel. 
Signed-off-by: Pasha Tatashin --- kernel/liveupdate/kexec_handover.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 83aca3b4af15..cd8641725343 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -1147,8 +1147,6 @@ int kho_abort(void) __kho_abort(); kho_out.finalized = false; - kho_debugfs_fdt_remove(&kho_out.dbg, kho_out.fdt); - return 0; } @@ -1219,9 +1217,6 @@ int kho_finalize(void) kho_out.finalized = true; - WARN_ON_ONCE(kho_debugfs_fdt_add(&kho_out.dbg, "fdt", - kho_out.fdt, true)); - return 0; } @@ -1310,7 +1305,7 @@ static __init int kho_init(void) if (!kho_enable) return 0; - fdt_page = alloc_page(GFP_KERNEL); + fdt_page = alloc_page(GFP_KERNEL | __GFP_ZERO); if (!fdt_page) { err = -ENOMEM; goto err_free_scratch; @@ -1344,6 +1339,9 @@ static __init int kho_init(void) init_cma_reserved_pageblock(pfn_to_page(pfn)); } + WARN_ON_ONCE(kho_debugfs_fdt_add(&kho_out.dbg, "fdt", + kho_out.fdt, true)); + return 0; err_free_fdt: -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 07:53:49 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 10:53:49 -0500 Subject: [PATCH v1 04/13] kho: Verify deserialization status and fix FDT alignment access In-Reply-To: <20251114155358.2884014-1-pasha.tatashin@soleen.com> References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> Message-ID: <20251114155358.2884014-5-pasha.tatashin@soleen.com> During boot, kho_restore_folio() relies on the memory map having been successfully deserialized. If deserialization fails or no map is present, attempting to restore the FDT folio is unsafe. Update kho_mem_deserialize() to return a boolean indicating success. Use this return value in kho_memory_init() to disable KHO if deserialization fails. Also, the incoming FDT folio is never used, there is no reason to restore it. 
Additionally, use memcpy() to retrieve the memory map pointer from the
FDT. FDT properties are not guaranteed to be naturally aligned, and
accessing a 64-bit value via a pointer that is only 32-bit aligned can
cause faults.

Signed-off-by: Pasha Tatashin
---
 kernel/liveupdate/kexec_handover.c | 32 ++++++++++++++++++------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index a4b33ca79246..83aca3b4af15 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -450,20 +450,28 @@ static void __init deserialize_bitmap(unsigned int order,
 	}
 }

-static void __init kho_mem_deserialize(const void *fdt)
+/* Return true if memory was deserialized */
+static bool __init kho_mem_deserialize(const void *fdt)
 {
 	struct khoser_mem_chunk *chunk;
-	const phys_addr_t *mem;
+	const void *mem_ptr;
+	u64 mem;
 	int len;

-	mem = fdt_getprop(fdt, 0, PROP_PRESERVED_MEMORY_MAP, &len);
-
-	if (!mem || len != sizeof(*mem)) {
+	mem_ptr = fdt_getprop(fdt, 0, PROP_PRESERVED_MEMORY_MAP, &len);
+	if (!mem_ptr || len != sizeof(u64)) {
 		pr_err("failed to get preserved memory bitmaps\n");
-		return;
+		return false;
 	}
+	/* FDT guarantees 32-bit alignment, have to use memcpy */
+	memcpy(&mem, mem_ptr, len);
+
+	chunk = mem ? phys_to_virt(mem) : NULL;
+
+	/* No preserved physical pages were passed, no deserialization */
+	if (!chunk)
+		return false;
-	chunk = *mem ?
phys_to_virt(*mem) : NULL; while (chunk) { unsigned int i; @@ -472,6 +480,8 @@ static void __init kho_mem_deserialize(const void *fdt) &chunk->bitmaps[i]); chunk = KHOSER_LOAD_PTR(chunk->hdr.next); } + + return true; } /* @@ -1377,16 +1387,12 @@ static void __init kho_release_scratch(void) void __init kho_memory_init(void) { - struct folio *folio; - if (kho_in.scratch_phys) { kho_scratch = phys_to_virt(kho_in.scratch_phys); kho_release_scratch(); - kho_mem_deserialize(kho_get_fdt()); - folio = kho_restore_folio(kho_in.fdt_phys); - if (!folio) - pr_warn("failed to restore folio for KHO fdt\n"); + if (!kho_mem_deserialize(kho_get_fdt())) + kho_in.fdt_phys = 0; } else { kho_reserve_scratch(); } -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 07:53:54 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 10:53:54 -0500 Subject: [PATCH v1 09/13] kho: Update FDT dynamically for subtree addition/removal In-Reply-To: <20251114155358.2884014-1-pasha.tatashin@soleen.com> References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> Message-ID: <20251114155358.2884014-10-pasha.tatashin@soleen.com> Currently, sub-FDTs were tracked in a list (kho_out.sub_fdts) and the final FDT is constructed entirely from scratch during kho_finalize(). We can maintain the FDT dynamically: 1. Initialize a valid, empty FDT in kho_init(). 2. Use fdt_add_subnode and fdt_setprop in kho_add_subtree to update the FDT immediately when a subsystem registers. 3. Use fdt_del_node in kho_remove_subtree to remove entries. This removes the need for the intermediate sub_fdts list and the reconstruction logic in kho_finalize(). kho_finalize() now only needs to trigger memory map serialization. 
Signed-off-by: Pasha Tatashin --- kernel/liveupdate/kexec_handover.c | 144 ++++++++++++++--------------- 1 file changed, 68 insertions(+), 76 deletions(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 8ab77cb85ca9..822da961d4c9 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -102,20 +102,11 @@ struct kho_mem_track { struct khoser_mem_chunk; -struct kho_sub_fdt { - struct list_head l; - const char *name; - void *fdt; -}; - struct kho_out { void *fdt; bool finalized; struct mutex lock; /* protects KHO FDT finalization */ - struct list_head sub_fdts; - struct mutex fdts_lock; - struct kho_mem_track track; struct kho_debugfs dbg; }; @@ -125,8 +116,6 @@ static struct kho_out kho_out = { .track = { .orders = XARRAY_INIT(kho_out.track.orders, 0), }, - .sub_fdts = LIST_HEAD_INIT(kho_out.sub_fdts), - .fdts_lock = __MUTEX_INITIALIZER(kho_out.fdts_lock), .finalized = false, }; @@ -724,37 +713,67 @@ static void __init kho_reserve_scratch(void) */ int kho_add_subtree(const char *name, void *fdt) { - struct kho_sub_fdt *sub_fdt; + phys_addr_t phys = virt_to_phys(fdt); + void *root_fdt = kho_out.fdt; + int err = -ENOMEM; + int off, fdt_err; - sub_fdt = kmalloc(sizeof(*sub_fdt), GFP_KERNEL); - if (!sub_fdt) - return -ENOMEM; + guard(mutex)(&kho_out.lock); + + fdt_err = fdt_open_into(root_fdt, root_fdt, PAGE_SIZE); + if (fdt_err < 0) + return err; - INIT_LIST_HEAD(&sub_fdt->l); - sub_fdt->name = name; - sub_fdt->fdt = fdt; + off = fdt_add_subnode(root_fdt, 0, name); + if (off < 0) { + if (off == -FDT_ERR_EXISTS) + err = -EEXIST; + goto out_pack; + } + + err = fdt_setprop(root_fdt, off, PROP_SUB_FDT, &phys, sizeof(phys)); + if (err < 0) + goto out_pack; - guard(mutex)(&kho_out.fdts_lock); - list_add_tail(&sub_fdt->l, &kho_out.sub_fdts); WARN_ON_ONCE(kho_debugfs_fdt_add(&kho_out.dbg, name, fdt, false)); - return 0; +out_pack: + fdt_pack(root_fdt); + + return err; } 
EXPORT_SYMBOL_GPL(kho_add_subtree); void kho_remove_subtree(void *fdt) { - struct kho_sub_fdt *sub_fdt; + phys_addr_t target_phys = virt_to_phys(fdt); + void *root_fdt = kho_out.fdt; + int off; + int err; + + guard(mutex)(&kho_out.lock); - guard(mutex)(&kho_out.fdts_lock); - list_for_each_entry(sub_fdt, &kho_out.sub_fdts, l) { - if (sub_fdt->fdt == fdt) { - list_del(&sub_fdt->l); - kfree(sub_fdt); + err = fdt_open_into(root_fdt, root_fdt, PAGE_SIZE); + if (err < 0) + return; + + for (off = fdt_first_subnode(root_fdt, 0); off >= 0; + off = fdt_next_subnode(root_fdt, off)) { + const u64 *val; + int len; + + val = fdt_getprop(root_fdt, off, PROP_SUB_FDT, &len); + if (!val || len != sizeof(phys_addr_t)) + continue; + + if ((phys_addr_t)*val == target_phys) { + fdt_del_node(root_fdt, off); kho_debugfs_fdt_remove(&kho_out.dbg, fdt); break; } } + + fdt_pack(root_fdt); } EXPORT_SYMBOL_GPL(kho_remove_subtree); @@ -1145,48 +1164,6 @@ void *kho_restore_vmalloc(const struct kho_vmalloc *preservation) } EXPORT_SYMBOL_GPL(kho_restore_vmalloc); -static int __kho_finalize(void) -{ - void *root = kho_out.fdt; - struct kho_sub_fdt *fdt; - u64 empty_mem_map = 0; - int err; - - err = fdt_create(root, PAGE_SIZE); - err |= fdt_finish_reservemap(root); - err |= fdt_begin_node(root, ""); - err |= fdt_property_string(root, "compatible", KHO_FDT_COMPATIBLE); - err |= fdt_property(root, PROP_PRESERVED_MEMORY_MAP, &empty_mem_map, - sizeof(empty_mem_map)); - if (err) - goto err_exit; - - mutex_lock(&kho_out.fdts_lock); - list_for_each_entry(fdt, &kho_out.sub_fdts, l) { - phys_addr_t phys = virt_to_phys(fdt->fdt); - - err |= fdt_begin_node(root, fdt->name); - err |= fdt_property(root, PROP_SUB_FDT, &phys, sizeof(phys)); - err |= fdt_end_node(root); - } - mutex_unlock(&kho_out.fdts_lock); - - err |= fdt_end_node(root); - err |= fdt_finish(root); - if (err) - goto err_exit; - - err = kho_mem_serialize(&kho_out); - if (err) - goto err_exit; - - return 0; - -err_exit: - pr_err("Failed to convert 
KHO state tree: %d\n", err); - return err; -} - int kho_finalize(void) { int ret; @@ -1195,12 +1172,7 @@ int kho_finalize(void) return -EOPNOTSUPP; guard(mutex)(&kho_out.lock); - if (kho_out.finalized) { - kho_update_memory_map(NULL); - kho_out.finalized = false; - } - - ret = __kho_finalize(); + ret = kho_mem_serialize(&kho_out); if (ret) return ret; @@ -1285,6 +1257,26 @@ int kho_retrieve_subtree(const char *name, phys_addr_t *phys) } EXPORT_SYMBOL_GPL(kho_retrieve_subtree); +static __init int kho_out_fdt_setup(void) +{ + void *root = kho_out.fdt; + u64 empty_mem_map = 0; + int err; + + err = fdt_create(root, PAGE_SIZE); + err |= fdt_finish_reservemap(root); + err |= fdt_begin_node(root, ""); + err |= fdt_property_string(root, "compatible", KHO_FDT_COMPATIBLE); + err |= fdt_property(root, PROP_PRESERVED_MEMORY_MAP, &empty_mem_map, + sizeof(empty_mem_map)); + err |= fdt_end_node(root); + err |= fdt_finish(root); + if (err) + return err; + + return kho_preserve_folio(virt_to_folio(kho_out.fdt)); +} + static __init int kho_init(void) { int err = 0; @@ -1309,7 +1301,7 @@ static __init int kho_init(void) if (err) goto err_free_fdt; - err = kho_preserve_folio(virt_to_folio(kho_out.fdt)); + err = kho_out_fdt_setup(); if (err) goto err_free_fdt; -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 07:53:52 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 10:53:52 -0500 Subject: [PATCH v1 07/13] kho: Remove global preserved_mem_map and store state in FDT In-Reply-To: <20251114155358.2884014-1-pasha.tatashin@soleen.com> References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> Message-ID: <20251114155358.2884014-8-pasha.tatashin@soleen.com> Currently, the serialized memory map is tracked via kho_out.preserved_mem_map and copied to the FDT during finalization. This double tracking is redundant. Remove preserved_mem_map from kho_out. 
Instead, maintain the physical address of the head chunk directly in the preserved-memory-map FDT property. Introduce kho_update_memory_map() to manage this property. This function handles: 1. Retrieving and freeing any existing serialized map (handling the abort/retry case). 2. Updating the FDT property with the new chunk address. This establishes the FDT as the single source of truth for the handover state. Signed-off-by: Pasha Tatashin --- kernel/liveupdate/kexec_handover.c | 43 ++++++++++++++++++------------ 1 file changed, 26 insertions(+), 17 deletions(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index aea58e5a6b49..f1c3dd1ef680 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -117,9 +117,6 @@ struct kho_out { struct mutex fdts_lock; struct kho_mem_track track; - /* First chunk of serialized preserved memory map */ - struct khoser_mem_chunk *preserved_mem_map; - struct kho_debugfs dbg; }; @@ -380,6 +377,27 @@ static void kho_mem_ser_free(struct khoser_mem_chunk *first_chunk) } } +/* + * Update memory map property, if old one is found discard it via + * kho_mem_ser_free(). + */ +static void kho_update_memory_map(struct khoser_mem_chunk *first_chunk) +{ + void *ptr; + u64 phys; + + ptr = fdt_getprop_w(kho_out.fdt, 0, PROP_PRESERVED_MEMORY_MAP, NULL); + + /* Check and discard previous memory map */ + memcpy(&phys, ptr, sizeof(u64)); + if (phys) + kho_mem_ser_free((struct khoser_mem_chunk *)phys_to_virt(phys)); + + /* Update with the new value */ + phys = first_chunk ? 
(u64)virt_to_phys(first_chunk) : 0; + memcpy(ptr, &phys, sizeof(u64)); +} + static int kho_mem_serialize(struct kho_out *kho_out) { struct khoser_mem_chunk *first_chunk = NULL; @@ -420,7 +438,7 @@ static int kho_mem_serialize(struct kho_out *kho_out) } } - kho_out->preserved_mem_map = first_chunk; + kho_update_memory_map(first_chunk); return 0; @@ -1136,8 +1154,7 @@ int kho_abort(void) if (!kho_out.finalized) return -ENOENT; - kho_mem_ser_free(kho_out.preserved_mem_map); - kho_out.preserved_mem_map = NULL; + kho_update_memory_map(NULL); kho_out.finalized = false; return 0; @@ -1147,21 +1164,15 @@ static int __kho_finalize(void) { void *root = kho_out.fdt; struct kho_sub_fdt *fdt; - u64 *preserved_mem_map; + u64 empty_mem_map = 0; int err; err = fdt_create(root, PAGE_SIZE); err |= fdt_finish_reservemap(root); err |= fdt_begin_node(root, ""); err |= fdt_property_string(root, "compatible", KHO_FDT_COMPATIBLE); - /** - * Reserve the preserved-memory-map property in the root FDT, so - * that all property definitions will precede subnodes created by - * KHO callers. 
- */ - err |= fdt_property_placeholder(root, PROP_PRESERVED_MEMORY_MAP, - sizeof(*preserved_mem_map), - (void **)&preserved_mem_map); + err |= fdt_property(root, PROP_PRESERVED_MEMORY_MAP, &empty_mem_map, + sizeof(empty_mem_map)); if (err) goto err_exit; @@ -1184,8 +1195,6 @@ static int __kho_finalize(void) if (err) goto err_exit; - *preserved_mem_map = (u64)virt_to_phys(kho_out.preserved_mem_map); - return 0; err_exit: -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 07:53:55 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 10:53:55 -0500 Subject: [PATCH v1 10/13] kho: Allow kexec load before KHO finalization In-Reply-To: <20251114155358.2884014-1-pasha.tatashin@soleen.com> References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> Message-ID: <20251114155358.2884014-11-pasha.tatashin@soleen.com> Currently, kho_fill_kimage() checks kho_out.finalized and returns early if KHO is not yet finalized. This enforces a strict ordering where userspace must finalize KHO *before* loading the kexec image. This is restrictive, as standard workflows often involve loading the target kernel early in the lifecycle and finalizing the state (FDT) only immediately before the reboot. Since the KHO FDT resides at a physical address allocated during boot (kho_init), its location is stable. We can attach this stable address to the kimage regardless of whether the content has been finalized yet. Relax the check to only require kho_enable, allowing kexec_file_load to proceed at any time. 
Signed-off-by: Pasha Tatashin --- kernel/liveupdate/kexec_handover.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 822da961d4c9..27ef20565a5f 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -1467,7 +1467,7 @@ int kho_fill_kimage(struct kimage *image) int err = 0; struct kexec_buf scratch; - if (!kho_out.finalized) + if (!kho_enable) return 0; image->kho.fdt = virt_to_phys(kho_out.fdt); -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 07:53:53 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 10:53:53 -0500 Subject: [PATCH v1 08/13] kho: Remove abort functionality and support state refresh In-Reply-To: <20251114155358.2884014-1-pasha.tatashin@soleen.com> References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> Message-ID: <20251114155358.2884014-9-pasha.tatashin@soleen.com> Previously, KHO required a dedicated kho_abort() function to clean up state before kho_finalize() could be called again. This was necessary to handle complex unwind paths when using notifiers. With the shift to direct memory preservation, the explicit abort step is no longer strictly necessary. Remove kho_abort() and refactor kho_finalize() to handle re-entry. If kho_finalize() is called while KHO is already finalized, it will now automatically clean up the previous memory map and state before generating a new one. This allows the KHO state to be updated/refreshed simply by triggering finalize again. Update debugfs to return -EINVAL if userspace attempts to write 0 to the finalize attribute, as explicit abort is no longer supported. 
Suggested-by: Mike Rapoport (Microsoft) Signed-off-by: Pasha Tatashin --- kernel/liveupdate/kexec_handover.c | 21 ++++----------------- kernel/liveupdate/kexec_handover_debugfs.c | 2 +- kernel/liveupdate/kexec_handover_internal.h | 1 - 3 files changed, 5 insertions(+), 19 deletions(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index f1c3dd1ef680..8ab77cb85ca9 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -1145,21 +1145,6 @@ void *kho_restore_vmalloc(const struct kho_vmalloc *preservation) } EXPORT_SYMBOL_GPL(kho_restore_vmalloc); -int kho_abort(void) -{ - if (!kho_enable) - return -EOPNOTSUPP; - - guard(mutex)(&kho_out.lock); - if (!kho_out.finalized) - return -ENOENT; - - kho_update_memory_map(NULL); - kho_out.finalized = false; - - return 0; -} - static int __kho_finalize(void) { void *root = kho_out.fdt; @@ -1210,8 +1195,10 @@ int kho_finalize(void) return -EOPNOTSUPP; guard(mutex)(&kho_out.lock); - if (kho_out.finalized) - return -EEXIST; + if (kho_out.finalized) { + kho_update_memory_map(NULL); + kho_out.finalized = false; + } ret = __kho_finalize(); if (ret) diff --git a/kernel/liveupdate/kexec_handover_debugfs.c b/kernel/liveupdate/kexec_handover_debugfs.c index ac739d25094d..2abbf62ba942 100644 --- a/kernel/liveupdate/kexec_handover_debugfs.c +++ b/kernel/liveupdate/kexec_handover_debugfs.c @@ -87,7 +87,7 @@ static int kho_out_finalize_set(void *data, u64 val) if (val) return kho_finalize(); else - return kho_abort(); + return -EINVAL; } DEFINE_DEBUGFS_ATTRIBUTE(kho_out_finalize_fops, kho_out_finalize_get, diff --git a/kernel/liveupdate/kexec_handover_internal.h b/kernel/liveupdate/kexec_handover_internal.h index 52ed73659fe6..0202c85ad14f 100644 --- a/kernel/liveupdate/kexec_handover_internal.h +++ b/kernel/liveupdate/kexec_handover_internal.h @@ -24,7 +24,6 @@ extern unsigned int kho_scratch_cnt; bool kho_finalized(void); int kho_finalize(void); -int kho_abort(void); 
#ifdef CONFIG_KEXEC_HANDOVER_DEBUGFS int kho_debugfs_init(void); -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 07:53:56 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 10:53:56 -0500 Subject: [PATCH v1 11/13] kho: Allow memory preservation state updates after finalization In-Reply-To: <20251114155358.2884014-1-pasha.tatashin@soleen.com> References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> Message-ID: <20251114155358.2884014-12-pasha.tatashin@soleen.com> Currently, kho_preserve_* and kho_unpreserve_* return -EBUSY if KHO is finalized. This enforces a rigid "freeze" on the KHO memory state. With the introduction of re-entrant finalization, this restriction is no longer necessary. Users should be allowed to modify the preservation set (e.g., adding new pages or freeing old ones) even after an initial finalization. The intended workflow for updates is now: 1. Modify state (preserve/unpreserve). 2. Call kho_finalize() again to refresh the serialized metadata. Remove the kho_out.finalized checks to enable this dynamic behavior. 
Signed-off-by: Pasha Tatashin --- kernel/liveupdate/kexec_handover.c | 13 ------------- 1 file changed, 13 deletions(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 27ef20565a5f..87e9b488237d 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -183,10 +183,6 @@ static int __kho_preserve_order(struct kho_mem_track *track, unsigned long pfn, const unsigned long pfn_high = pfn >> order; might_sleep(); - - if (kho_out.finalized) - return -EBUSY; - physxa = xa_load(&track->orders, order); if (!physxa) { int err; @@ -815,9 +811,6 @@ int kho_unpreserve_folio(struct folio *folio) const unsigned int order = folio_order(folio); struct kho_mem_track *track = &kho_out.track; - if (kho_out.finalized) - return -EBUSY; - __kho_unpreserve_order(track, pfn, order); return 0; } @@ -885,9 +878,6 @@ int kho_unpreserve_pages(struct page *page, unsigned int nr_pages) const unsigned long start_pfn = page_to_pfn(page); const unsigned long end_pfn = start_pfn + nr_pages; - if (kho_out.finalized) - return -EBUSY; - __kho_unpreserve(track, start_pfn, end_pfn); return 0; @@ -1066,9 +1056,6 @@ EXPORT_SYMBOL_GPL(kho_preserve_vmalloc); */ int kho_unpreserve_vmalloc(struct kho_vmalloc *preservation) { - if (kho_out.finalized) - return -EBUSY; - kho_vmalloc_free_chunks(preservation); return 0; -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 07:53:57 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 10:53:57 -0500 Subject: [PATCH v1 12/13] kho: Add Kconfig option to enable KHO by default In-Reply-To: <20251114155358.2884014-1-pasha.tatashin@soleen.com> References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> Message-ID: <20251114155358.2884014-13-pasha.tatashin@soleen.com> Currently, Kexec Handover must be explicitly enabled via the kernel command line parameter `kho=on`. 
For workloads that rely on KHO as a foundational requirement (such as the upcoming Live Update Orchestrator), requiring an explicit boot parameter adds redundant configuration steps. Introduce CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT. When selected, KHO defaults to enabled. This is equivalent to passing kho=on at boot. The behavior can still be disabled at runtime by passing kho=off. Signed-off-by: Pasha Tatashin --- kernel/liveupdate/Kconfig | 14 ++++++++++++++ kernel/liveupdate/kexec_handover.c | 2 +- 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/kernel/liveupdate/Kconfig b/kernel/liveupdate/Kconfig index eae428309332..a973a54447de 100644 --- a/kernel/liveupdate/Kconfig +++ b/kernel/liveupdate/Kconfig @@ -37,4 +37,18 @@ config KEXEC_HANDOVER_DEBUGFS Also, enables inspecting the KHO fdt trees with the debugfs binary blobs. +config KEXEC_HANDOVER_ENABLE_DEFAULT + bool "Enable kexec handover by default" + depends on KEXEC_HANDOVER + help + Enable Kexec Handover by default. This avoids the need to + explicitly pass 'kho=on' on the kernel command line. + + This is useful for systems where KHO is a prerequisite for other + features, such as Live Update, ensuring the mechanism is always + active. + + The default behavior can still be overridden at boot time by + passing 'kho=off'. 
+ endmenu diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 87e9b488237d..a905bccf5f65 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -50,7 +50,7 @@ union kho_page_info { static_assert(sizeof(union kho_page_info) == sizeof(((struct page *)0)->private)); -static bool kho_enable __ro_after_init; +static bool kho_enable __ro_after_init = IS_ENABLED(CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT); bool kho_is_enabled(void) { -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 07:53:58 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 10:53:58 -0500 Subject: [PATCH v1 13/13] kho: Introduce high-level memory allocation API In-Reply-To: <20251114155358.2884014-1-pasha.tatashin@soleen.com> References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> Message-ID: <20251114155358.2884014-14-pasha.tatashin@soleen.com> Currently, clients of KHO must manually allocate memory (e.g., via alloc_pages), calculate the page order, and explicitly call kho_preserve_folio(). Similarly, cleanup requires separate calls to unpreserve and free the memory. Introduce a high-level API to streamline this common pattern: - kho_alloc_preserve(size): Allocates physically contiguous, zeroed memory and immediately marks it for preservation. - kho_free_unpreserve(ptr, size): Unpreserves and frees the memory in the current kernel. - kho_free_restore(ptr, size): Restores the struct page state of preserved memory in the new kernel and immediately frees it to the page allocator. 
Signed-off-by: Pasha Tatashin --- include/linux/kexec_handover.h | 22 +++++-- kernel/liveupdate/kexec_handover.c | 101 +++++++++++++++++++++++++++++ 2 files changed, 116 insertions(+), 7 deletions(-) diff --git a/include/linux/kexec_handover.h b/include/linux/kexec_handover.h index 80ece4232617..76c496e01877 100644 --- a/include/linux/kexec_handover.h +++ b/include/linux/kexec_handover.h @@ -2,8 +2,9 @@ #ifndef LINUX_KEXEC_HANDOVER_H #define LINUX_KEXEC_HANDOVER_H -#include +#include #include +#include struct kho_scratch { phys_addr_t addr; @@ -48,6 +49,9 @@ int kho_preserve_pages(struct page *page, unsigned int nr_pages); int kho_unpreserve_pages(struct page *page, unsigned int nr_pages); int kho_preserve_vmalloc(void *ptr, struct kho_vmalloc *preservation); int kho_unpreserve_vmalloc(struct kho_vmalloc *preservation); +void *kho_alloc_preserve(size_t size); +void kho_free_unpreserve(void *mem, size_t size); +void kho_free_restore(void *mem, size_t size); struct folio *kho_restore_folio(phys_addr_t phys); struct page *kho_restore_pages(phys_addr_t phys, unsigned int nr_pages); void *kho_restore_vmalloc(const struct kho_vmalloc *preservation); @@ -101,6 +105,14 @@ static inline int kho_unpreserve_vmalloc(struct kho_vmalloc *preservation) { return -EOPNOTSUPP; } +static inline void *kho_alloc_preserve(size_t size) +{ + return ERR_PTR(-EOPNOTSUPP); +} + +static inline void kho_free_unpreserve(void *mem, size_t size) { } +static inline void kho_free_restore(void *mem, size_t size) { } + static inline struct folio *kho_restore_folio(phys_addr_t phys) { return NULL; @@ -122,18 +134,14 @@ static inline int kho_add_subtree(const char *name, void *fdt) { return -EOPNOTSUPP; } -static inline void kho_remove_subtree(void *fdt) -{ -} +static inline void kho_remove_subtree(void *fdt) { } static inline int kho_retrieve_subtree(const char *name, phys_addr_t *phys) { return -EOPNOTSUPP; } -static inline void kho_memory_init(void) -{ -} +static inline void kho_memory_init(void) { } static inline void kho_populate(phys_addr_t
fdt_phys, u64 fdt_len, phys_addr_t scratch_phys, u64 scratch_len) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index a905bccf5f65..9f05849fd68e 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -4,6 +4,7 @@ * Copyright (C) 2023 Alexander Graf * Copyright (C) 2025 Microsoft Corporation, Mike Rapoport * Copyright (C) 2025 Google LLC, Changyuan Lyu + * Copyright (C) 2025 Pasha Tatashin */ #define pr_fmt(fmt) "KHO: " fmt @@ -1151,6 +1152,106 @@ void *kho_restore_vmalloc(const struct kho_vmalloc *preservation) } EXPORT_SYMBOL_GPL(kho_restore_vmalloc); +/** + * kho_alloc_preserve - Allocate, zero, and preserve memory. + * @size: The number of bytes to allocate. + * + * Allocates a physically contiguous block of zeroed pages that is large + * enough to hold @size bytes. The allocated memory is then registered with + * KHO for preservation across a kexec. + * + * Note: The actual allocated size will be rounded up to the nearest + * power-of-two page boundary. + * + * @return A virtual pointer to the allocated and preserved memory on success, + * or an ERR_PTR() encoded error on failure. + */ +void *kho_alloc_preserve(size_t size) +{ + struct folio *folio; + int order, ret; + + if (!size) + return ERR_PTR(-EINVAL); + + order = get_order(size); + if (order > MAX_PAGE_ORDER) + return ERR_PTR(-E2BIG); + + folio = folio_alloc(GFP_KERNEL | __GFP_ZERO, order); + if (!folio) + return ERR_PTR(-ENOMEM); + + ret = kho_preserve_folio(folio); + if (ret) { + folio_put(folio); + return ERR_PTR(ret); + } + + return folio_address(folio); +} +EXPORT_SYMBOL_GPL(kho_alloc_preserve); + +/** + * kho_free_unpreserve - Unpreserve and free memory. + * @mem: Pointer to the memory allocated by kho_alloc_preserve(). + * @size: The original size requested during allocation. This is used to + * recalculate the correct order for freeing the pages. 
+ * + * Unregisters the memory from KHO preservation and frees the underlying + * pages back to the system. This function should be called to clean up + * memory allocated with kho_alloc_preserve(). + */ +void kho_free_unpreserve(void *mem, size_t size) +{ + struct folio *folio; + unsigned int order; + + if (!mem || !size) + return; + + order = get_order(size); + if (WARN_ON_ONCE(order > MAX_PAGE_ORDER)) + return; + + folio = virt_to_folio(mem); + WARN_ON_ONCE(kho_unpreserve_folio(folio)); + folio_put(folio); +} +EXPORT_SYMBOL_GPL(kho_free_unpreserve); + +/** + * kho_free_restore - Restore and free memory after kexec. + * @mem: Pointer to the memory (in the new kernel's address space) + * that was allocated by the old kernel. + * @size: The original size requested during allocation. This is used to + * recalculate the correct order for freeing the pages. + * + * This function is intended to be called in the new kernel (post-kexec) + * to take ownership of and free a memory region that was preserved by the + * old kernel using kho_alloc_preserve(). + * + * It first restores the pages from KHO (using their physical address) + * and then frees the pages back to the new kernel's page allocator. 
+ */ +void kho_free_restore(void *mem, size_t size) +{ + struct folio *folio; + unsigned int order; + + if (!mem || !size) + return; + + order = get_order(size); + if (WARN_ON_ONCE(order > MAX_PAGE_ORDER)) + return; + + folio = kho_restore_folio(__pa(mem)); + if (!WARN_ON(!folio)) + free_pages((unsigned long)mem, order); +} +EXPORT_SYMBOL_GPL(kho_free_restore); + int kho_finalize(void) { int ret; -- 2.52.0.rc1.455.g30608eb744-goog From rppt at kernel.org Fri Nov 14 08:15:27 2025 From: rppt at kernel.org (Mike Rapoport) Date: Fri, 14 Nov 2025 18:15:27 +0200 Subject: [PATCH v1 09/13] kho: Update FDT dynamically for subtree addition/removal In-Reply-To: <20251114155358.2884014-10-pasha.tatashin@soleen.com> References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-10-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14, 2025 at 10:53:54AM -0500, Pasha Tatashin wrote: > Currently, sub-FDTs were tracked in a list (kho_out.sub_fdts) and the > final FDT is constructed entirely from scratch during kho_finalize(). > > We can maintain the FDT dynamically: > 1. Initialize a valid, empty FDT in kho_init(). > 2. Use fdt_add_subnode and fdt_setprop in kho_add_subtree to > update the FDT immediately when a subsystem registers. > 3. Use fdt_del_node in kho_remove_subtree to remove entries. > > This removes the need for the intermediate sub_fdts list and the > reconstruction logic in kho_finalize(). kho_finalize() now > only needs to trigger memory map serialization. 
> > Signed-off-by: Pasha Tatashin > --- > kernel/liveupdate/kexec_handover.c | 144 ++++++++++++++--------------- > 1 file changed, 68 insertions(+), 76 deletions(-) > > diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c > index 8ab77cb85ca9..822da961d4c9 100644 > --- a/kernel/liveupdate/kexec_handover.c > +++ b/kernel/liveupdate/kexec_handover.c > @@ -724,37 +713,67 @@ static void __init kho_reserve_scratch(void) > */ > int kho_add_subtree(const char *name, void *fdt) > { > - struct kho_sub_fdt *sub_fdt; > + phys_addr_t phys = virt_to_phys(fdt); > + void *root_fdt = kho_out.fdt; > + int err = -ENOMEM; > + int off, fdt_err; > > - sub_fdt = kmalloc(sizeof(*sub_fdt), GFP_KERNEL); > - if (!sub_fdt) > - return -ENOMEM; > + guard(mutex)(&kho_out.lock); > + > + fdt_err = fdt_open_into(root_fdt, root_fdt, PAGE_SIZE); > + if (fdt_err < 0) > + return err; > - INIT_LIST_HEAD(&sub_fdt->l); > - sub_fdt->name = name; > - sub_fdt->fdt = fdt; > + off = fdt_add_subnode(root_fdt, 0, name); fdt_err = fdt_add_subnode(); and then we don't need off > + if (off < 0) { > + if (off == -FDT_ERR_EXISTS) > + err = -EEXIST; Is it really -ENOMEM for other FDT_ERR values? > + goto out_pack; > + } > + > + err = fdt_setprop(root_fdt, off, PROP_SUB_FDT, &phys, sizeof(phys)); > + if (err < 0) > + goto out_pack; > > - guard(mutex)(&kho_out.fdts_lock); > - list_add_tail(&sub_fdt->l, &kho_out.sub_fdts); > WARN_ON_ONCE(kho_debugfs_fdt_add(&kho_out.dbg, name, fdt, false)); > > - return 0; > +out_pack: > + fdt_pack(root_fdt); > + > + return err; > } > EXPORT_SYMBOL_GPL(kho_add_subtree); -- Sincerely yours, Mike. 
From rppt at kernel.org Fri Nov 14 08:15:59 2025 From: rppt at kernel.org (Mike Rapoport) Date: Fri, 14 Nov 2025 18:15:59 +0200 Subject: [PATCH v1 13/13] kho: Introduce high-level memory allocation API In-Reply-To: <20251114155358.2884014-14-pasha.tatashin@soleen.com> References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-14-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14, 2025 at 10:53:58AM -0500, Pasha Tatashin wrote: > Currently, clients of KHO must manually allocate memory (e.g., via > alloc_pages), calculate the page order, and explicitly call > kho_preserve_folio(). Similarly, cleanup requires separate calls to > unpreserve and free the memory. > > Introduce a high-level API to streamline this common pattern: > > - kho_alloc_preserve(size): Allocates physically contiguous, zeroed > memory and immediately marks it for preservation. > - kho_free_unpreserve(ptr, size): Unpreserves and frees the memory > in the current kernel. > - kho_free_restore(ptr, size): Restores the struct page state of > preserved memory in the new kernel and immediately frees it to the > page allocator. It would have been nice to have it before patch 3 (Preserve FDT folio only once during initialization) and use kho_alloc_preserve() for KHO's own FDT. > Signed-off-by: Pasha Tatashin > --- > include/linux/kexec_handover.h | 22 +++++-- > kernel/liveupdate/kexec_handover.c | 101 +++++++++++++++++++++++++++++ > 2 files changed, 116 insertions(+), 7 deletions(-) -- Sincerely yours, Mike. 
From rppt at kernel.org Fri Nov 14 08:17:27 2025 From: rppt at kernel.org (Mike Rapoport) Date: Fri, 14 Nov 2025 18:17:27 +0200 Subject: [PATCH v1 00/13] kho: simplify state machine and enable dynamic updates In-Reply-To: <20251114155358.2884014-1-pasha.tatashin@soleen.com> References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14, 2025 at 10:53:45AM -0500, Pasha Tatashin wrote: > > This patch series refactors the Kexec Handover subsystem to transition > from a rigid, state-locked model to a dynamic, re-entrant architecture. > It also introduces usability improvements. > > Pasha Tatashin (13): > kho: Fix misleading log message in kho_populate() > kho: Convert __kho_abort() to return void > kho: Preserve FDT folio only once during initialization > kho: Verify deserialization status and fix FDT alignment access > kho: Always expose output FDT in debugfs > kho: Simplify serialization and remove __kho_abort > kho: Remove global preserved_mem_map and store state in FDT > kho: Remove abort functionality and support state refresh > kho: Update FDT dynamically for subtree addition/removal > kho: Allow kexec load before KHO finalization > kho: Allow memory preservation state updates after finalization > kho: Add Kconfig option to enable KHO by default > kho: Introduce high-level memory allocation API For the series: Reviewed-by: Mike Rapoport (Microsoft) with small nits in patches 9 and 13 in replies to them. > > include/linux/kexec_handover.h | 22 +- > kernel/liveupdate/Kconfig | 14 + > kernel/liveupdate/kexec_handover.c | 338 ++++++++++++-------- > kernel/liveupdate/kexec_handover_debugfs.c | 2 +- > kernel/liveupdate/kexec_handover_internal.h | 1 - > 5 files changed, 232 insertions(+), 145 deletions(-) > > -- > 2.52.0.rc1.455.g30608eb744-goog > -- Sincerely yours, Mike. 
From pratyush at kernel.org Fri Nov 14 08:32:01 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Fri, 14 Nov 2025 17:32:01 +0100 Subject: [PATCH v1 01/13] kho: Fix misleading log message in kho_populate() In-Reply-To: <20251114155358.2884014-2-pasha.tatashin@soleen.com> (Pasha Tatashin's message of "Fri, 14 Nov 2025 10:53:46 -0500") References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-2-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14 2025, Pasha Tatashin wrote: > The log message in kho_populate() currently states "Will skip init for > some devices". This implies that Kexec Handover always involves > skipping device initialization. > > However, KHO is a generic mechanism used to preserve kernel memory across > reboot for various purposes, such as memfd, telemetry, or reserve_mem. > Skipping device initialization is a specific property of live update > drivers using KHO, not a property of the mechanism itself. > > Remove the misleading suffix to accurately reflect the generic nature of > KHO discovery. > > Signed-off-by: Pasha Tatashin Reviewed-by: Pratyush Yadav [...] -- Regards, Pratyush Yadav From pratyush at kernel.org Fri Nov 14 08:32:31 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Fri, 14 Nov 2025 17:32:31 +0100 Subject: [PATCH v1 02/13] kho: Convert __kho_abort() to return void In-Reply-To: <20251114155358.2884014-3-pasha.tatashin@soleen.com> (Pasha Tatashin's message of "Fri, 14 Nov 2025 10:53:47 -0500") References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-3-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14 2025, Pasha Tatashin wrote: > The internal helper __kho_abort() always returns 0 and has no failure > paths. Its return value is ignored by __kho_finalize and checked > needlessly by kho_abort. > > Change the return type to void to reflect that this function cannot > fail, and simplify kho_abort by removing dead error handling code. 
> > Signed-off-by: Pasha Tatashin Reviewed-by: Pratyush Yadav [...] -- Regards, Pratyush Yadav From pratyush at kernel.org Fri Nov 14 08:32:52 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Fri, 14 Nov 2025 17:32:52 +0100 Subject: [PATCH v1 03/13] kho: Preserve FDT folio only once during initialization In-Reply-To: <20251114155358.2884014-4-pasha.tatashin@soleen.com> (Pasha Tatashin's message of "Fri, 14 Nov 2025 10:53:48 -0500") References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-4-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14 2025, Pasha Tatashin wrote: > Currently, the FDT folio is preserved inside __kho_finalize(). If the > user performs multiple finalize/abort cycles, kho_preserve_folio() is > called repeatedly for the same FDT folio. > > Since the FDT folio is allocated once during kho_init(), it should be > marked for preservation at the same time. Move the preservation call to > kho_init() to align the preservation state with the object's lifecycle > and simplify the finalize path. > > Signed-off-by: Pasha Tatashin Reviewed-by: Pratyush Yadav [...] -- Regards, Pratyush Yadav From pasha.tatashin at soleen.com Fri Nov 14 08:40:11 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 11:40:11 -0500 Subject: [PATCH v1 13/13] kho: Introduce high-level memory allocation API In-Reply-To: References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-14-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14, 2025 at 11:16?AM Mike Rapoport wrote: > > On Fri, Nov 14, 2025 at 10:53:58AM -0500, Pasha Tatashin wrote: > > Currently, clients of KHO must manually allocate memory (e.g., via > > alloc_pages), calculate the page order, and explicitly call > > kho_preserve_folio(). Similarly, cleanup requires separate calls to > > unpreserve and free the memory. 
> > > > Introduce a high-level API to streamline this common pattern: > > > > - kho_alloc_preserve(size): Allocates physically contiguous, zeroed > > memory and immediately marks it for preservation. > > - kho_free_unpreserve(ptr, size): Unpreserves and frees the memory > > in the current kernel. > > - kho_free_restore(ptr, size): Restores the struct page state of > > preserved memory in the new kernel and immediately frees it to the > > page allocator. > > It would have been nice to have it before patch 3 (Preserve FDT folio only > once during initialization) and use kho_alloc_preserve() for KHO's own FDT. Sure, I will move it before 3. > > > Signed-off-by: Pasha Tatashin > > --- > > include/linux/kexec_handover.h | 22 +++++-- > > kernel/liveupdate/kexec_handover.c | 101 +++++++++++++++++++++++++++++ > > 2 files changed, 116 insertions(+), 7 deletions(-) > > -- > Sincerely yours, > Mike. From pasha.tatashin at soleen.com Fri Nov 14 08:42:07 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 11:42:07 -0500 Subject: [PATCH v1 09/13] kho: Update FDT dynamically for subtree addition/removal In-Reply-To: References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-10-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14, 2025 at 11:15?AM Mike Rapoport wrote: > > On Fri, Nov 14, 2025 at 10:53:54AM -0500, Pasha Tatashin wrote: > > Currently, sub-FDTs were tracked in a list (kho_out.sub_fdts) and the > > final FDT is constructed entirely from scratch during kho_finalize(). > > > > We can maintain the FDT dynamically: > > 1. Initialize a valid, empty FDT in kho_init(). > > 2. Use fdt_add_subnode and fdt_setprop in kho_add_subtree to > > update the FDT immediately when a subsystem registers. > > 3. Use fdt_del_node in kho_remove_subtree to remove entries. > > > > This removes the need for the intermediate sub_fdts list and the > > reconstruction logic in kho_finalize(). 
kho_finalize() now > > only needs to trigger memory map serialization. > > > > Signed-off-by: Pasha Tatashin > > --- > > kernel/liveupdate/kexec_handover.c | 144 ++++++++++++++--------------- > > 1 file changed, 68 insertions(+), 76 deletions(-) > > > > diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c > > index 8ab77cb85ca9..822da961d4c9 100644 > > --- a/kernel/liveupdate/kexec_handover.c > > +++ b/kernel/liveupdate/kexec_handover.c > > @@ -724,37 +713,67 @@ static void __init kho_reserve_scratch(void) > > */ > > int kho_add_subtree(const char *name, void *fdt) > > { > > - struct kho_sub_fdt *sub_fdt; > > + phys_addr_t phys = virt_to_phys(fdt); > > + void *root_fdt = kho_out.fdt; > > + int err = -ENOMEM; > > + int off, fdt_err; > > > > - sub_fdt = kmalloc(sizeof(*sub_fdt), GFP_KERNEL); > > - if (!sub_fdt) > > - return -ENOMEM; > > + guard(mutex)(&kho_out.lock); > > + > > + fdt_err = fdt_open_into(root_fdt, root_fdt, PAGE_SIZE); > > + if (fdt_err < 0) > > + return err; > > - INIT_LIST_HEAD(&sub_fdt->l); > > - sub_fdt->name = name; > > - sub_fdt->fdt = fdt; > > + off = fdt_add_subnode(root_fdt, 0, name); > > fdt_err = fdt_add_subnode(); > > and then we don't need off > > > + if (off < 0) { > > + if (off == -FDT_ERR_EXISTS) > > + err = -EEXIST; > > Is it really -ENOMEM for other FDT_ERR values? In practice, yes. There are some other errors like format mismatch, magic values etc, but all of them are internal FDT problems. The only error that really matters to users is the -ENOMEM one. 
Pasha > > > + goto out_pack; > > + } > > + > > + err = fdt_setprop(root_fdt, off, PROP_SUB_FDT, &phys, sizeof(phys)); > > + if (err < 0) > > + goto out_pack; > > > > - guard(mutex)(&kho_out.fdts_lock); > > - list_add_tail(&sub_fdt->l, &kho_out.sub_fdts); > > WARN_ON_ONCE(kho_debugfs_fdt_add(&kho_out.dbg, name, fdt, false)); > > > > - return 0; > > +out_pack: > > + fdt_pack(root_fdt); > > + > > + return err; > > } > > EXPORT_SYMBOL_GPL(kho_add_subtree); > > -- > Sincerely yours, > Mike. From pasha.tatashin at soleen.com Fri Nov 14 08:46:12 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 11:46:12 -0500 Subject: [PATCH v1 00/13] kho: simplify state machine and enable dynamic updates In-Reply-To: References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14, 2025 at 11:17?AM Mike Rapoport wrote: > > On Fri, Nov 14, 2025 at 10:53:45AM -0500, Pasha Tatashin wrote: > > > > This patch series refactors the Kexec Handover subsystem to transition > > from a rigid, state-locked model to a dynamic, re-entrant architecture. > > It also introduces usability improvements. 
> > > > Pasha Tatashin (13): > > kho: Fix misleading log message in kho_populate() > > kho: Convert __kho_abort() to return void > > kho: Preserve FDT folio only once during initialization > > kho: Verify deserialization status and fix FDT alignment access > > kho: Always expose output FDT in debugfs > > kho: Simplify serialization and remove __kho_abort > > kho: Remove global preserved_mem_map and store state in FDT > > kho: Remove abort functionality and support state refresh > > kho: Update FDT dynamically for subtree addition/removal > > kho: Allow kexec load before KHO finalization > > kho: Allow memory preservation state updates after finalization > > kho: Add Kconfig option to enable KHO by default > > kho: Introduce high-level memory allocation API > > For the series: > > Reviewed-by: Mike Rapoport (Microsoft) > > with small nits in patches 9 and 13 in replies to them. Thank you Mike! I will update the series, and post v2 shortly. Pasha > > > > > include/linux/kexec_handover.h | 22 +- > > kernel/liveupdate/Kconfig | 14 + > > kernel/liveupdate/kexec_handover.c | 338 ++++++++++++-------- > > kernel/liveupdate/kexec_handover_debugfs.c | 2 +- > > kernel/liveupdate/kexec_handover_internal.h | 1 - > > 5 files changed, 232 insertions(+), 145 deletions(-) > > > > -- > > 2.52.0.rc1.455.g30608eb744-goog > > > > -- > Sincerely yours, > Mike. From pratyush at kernel.org Fri Nov 14 08:52:37 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Fri, 14 Nov 2025 17:52:37 +0100 Subject: [PATCH v1 04/13] kho: Verify deserialization status and fix FDT alignment access In-Reply-To: <20251114155358.2884014-5-pasha.tatashin@soleen.com> (Pasha Tatashin's message of "Fri, 14 Nov 2025 10:53:49 -0500") References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-5-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14 2025, Pasha Tatashin wrote: > During boot, kho_restore_folio() relies on the memory map having been > successfully deserialized. 
If deserialization fails or no map is present, > attempting to restore the FDT folio is unsafe. > > Update kho_mem_deserialize() to return a boolean indicating success. Use > this return value in kho_memory_init() to disable KHO if deserialization > fails. Also, the incoming FDT folio is never used, there is no reason to > restore it. > > Additionally, use memcpy() to retrieve the memory map pointer from the FDT. > FDT properties are not guaranteed to be naturally aligned, and accessing > a 64-bit value via a pointer that is only 32-bit aligned can cause faults. > > Signed-off-by: Pasha Tatashin > --- > kernel/liveupdate/kexec_handover.c | 32 ++++++++++++++++++------------ > 1 file changed, 19 insertions(+), 13 deletions(-) > > diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c > index a4b33ca79246..83aca3b4af15 100644 > --- a/kernel/liveupdate/kexec_handover.c > +++ b/kernel/liveupdate/kexec_handover.c > @@ -450,20 +450,28 @@ static void __init deserialize_bitmap(unsigned int order, > } > } > > -static void __init kho_mem_deserialize(const void *fdt) > +/* Return true if memory was deserizlied */ > +static bool __init kho_mem_deserialize(const void *fdt) > { > struct khoser_mem_chunk *chunk; > - const phys_addr_t *mem; > + const void *mem_ptr; > + u64 mem; > int len; > > - mem = fdt_getprop(fdt, 0, PROP_PRESERVED_MEMORY_MAP, &len); > - > - if (!mem || len != sizeof(*mem)) { > + mem_ptr = fdt_getprop(fdt, 0, PROP_PRESERVED_MEMORY_MAP, &len); > + if (!mem_ptr || len != sizeof(u64)) { > pr_err("failed to get preserved memory bitmaps\n"); > - return; > + return false; > } > + /* FDT guarantees 32-bit alignment, have to use memcpy */ > + memcpy(&mem, mem_ptr, len); Perhaps get_unaligned(mem) would have been simpler? > + > + chunk = mem ? 
phys_to_virt(mem) : NULL; > + > + /* No preserved physical pages were passed, no deserialization */ > + if (!chunk) > + return false; Should we disallow all kho_restore_{folio,pages}() calls too if this fails? Ideally those should never happen since kho_retrieve_subtree() will fail, so maybe as a debug aid? > > - chunk = *mem ? phys_to_virt(*mem) : NULL; > while (chunk) { > unsigned int i; > > @@ -472,6 +480,8 @@ static void __init kho_mem_deserialize(const void *fdt) > &chunk->bitmaps[i]); > chunk = KHOSER_LOAD_PTR(chunk->hdr.next); > } > + > + return true; > } > > /* > @@ -1377,16 +1387,12 @@ static void __init kho_release_scratch(void) > > void __init kho_memory_init(void) > { > - struct folio *folio; > - > if (kho_in.scratch_phys) { > kho_scratch = phys_to_virt(kho_in.scratch_phys); > kho_release_scratch(); > > - kho_mem_deserialize(kho_get_fdt()); > - folio = kho_restore_folio(kho_in.fdt_phys); > - if (!folio) > - pr_warn("failed to restore folio for KHO fdt\n"); > + if (!kho_mem_deserialize(kho_get_fdt())) > + kho_in.fdt_phys = 0; The folio restore does serve a purpose: it accounts for that folio in the system's total memory. See the call to adjust_managed_page_count() in kho_restore_page(). In practice, I don't think it makes much of a difference, but I don't see why not. > } else { > kho_reserve_scratch(); > } -- Regards, Pratyush Yadav From pratyush at kernel.org Fri Nov 14 08:59:32 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Fri, 14 Nov 2025 17:59:32 +0100 Subject: [PATCH v1 05/13] kho: Always expose output FDT in debugfs In-Reply-To: <20251114155358.2884014-6-pasha.tatashin@soleen.com> (Pasha Tatashin's message of "Fri, 14 Nov 2025 10:53:50 -0500") References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-6-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14 2025, Pasha Tatashin wrote: > Currently, the output FDT is added to debugfs only when KHO is > finalized and removed when aborted. 
> > There is no need to hide the FDT based on the state. Always expose it > starting from initialization. This aids the transition toward removing > the explicit abort functionality and converting KHO to be fully > stateless. > > Also, pre-zero the FDT tree so we do not expose random bits to the > user and to the next kernel. > > Signed-off-by: Pasha Tatashin > --- > kernel/liveupdate/kexec_handover.c | 10 ++++------ > 1 file changed, 4 insertions(+), 6 deletions(-) > > diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c > index 83aca3b4af15..cd8641725343 100644 > --- a/kernel/liveupdate/kexec_handover.c > +++ b/kernel/liveupdate/kexec_handover.c > @@ -1147,8 +1147,6 @@ int kho_abort(void) > __kho_abort(); > kho_out.finalized = false; > > - kho_debugfs_fdt_remove(&kho_out.dbg, kho_out.fdt); > - > return 0; > } > > @@ -1219,9 +1217,6 @@ int kho_finalize(void) > > kho_out.finalized = true; > > - WARN_ON_ONCE(kho_debugfs_fdt_add(&kho_out.dbg, "fdt", > - kho_out.fdt, true)); > - > return 0; > } > > @@ -1310,7 +1305,7 @@ static __init int kho_init(void) > if (!kho_enable) > return 0; > > - fdt_page = alloc_page(GFP_KERNEL); > + fdt_page = alloc_page(GFP_KERNEL | __GFP_ZERO); If I read the series right, patch 9 will make this a full FDT with no subnodes. That makes a lot more sense than a zero page. Thinking out loud. 
For this patch, Reviewed-by: Pratyush Yadav > if (!fdt_page) { > err = -ENOMEM; > goto err_free_scratch; > @@ -1344,6 +1339,9 @@ static __init int kho_init(void) > init_cma_reserved_pageblock(pfn_to_page(pfn)); > } > > + WARN_ON_ONCE(kho_debugfs_fdt_add(&kho_out.dbg, "fdt", > + kho_out.fdt, true)); > + > return 0; > > err_free_fdt: -- Regards, Pratyush Yadav From pratyush at kernel.org Fri Nov 14 09:04:33 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Fri, 14 Nov 2025 18:04:33 +0100 Subject: [PATCH v1 06/13] kho: Simplify serialization and remove __kho_abort In-Reply-To: <20251114155358.2884014-7-pasha.tatashin@soleen.com> (Pasha Tatashin's message of "Fri, 14 Nov 2025 10:53:51 -0500") References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-7-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14 2025, Pasha Tatashin wrote: > Currently, __kho_finalize() performs memory serialization in the middle > of FDT construction. If FDT construction fails later, the function must > manually clean up the serialized memory via __kho_abort(). > > Refactor __kho_finalize() to perform kho_mem_serialize() only after the > FDT has been successfully constructed and finished. This reordering has > two benefits: > 1. It avoids expensive serialization work if FDT generation fails. > 2. It removes the need for cleanup in the FDT error path. > > As a result, the internal helper __kho_abort() is no longer needed for > internal error handling. Inline its remaining logic (cleanup of the > preserved memory map) directly into kho_abort() and remove the helper. > > Signed-off-by: Pasha Tatashin Reviewed-by: Pratyush Yadav [...] 
-- Regards, Pratyush Yadav From pratyush at kernel.org Fri Nov 14 09:11:39 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Fri, 14 Nov 2025 18:11:39 +0100 Subject: [PATCH v1 07/13] kho: Remove global preserved_mem_map and store state in FDT In-Reply-To: <20251114155358.2884014-8-pasha.tatashin@soleen.com> (Pasha Tatashin's message of "Fri, 14 Nov 2025 10:53:52 -0500") References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-8-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14 2025, Pasha Tatashin wrote: > Currently, the serialized memory map is tracked via > kho_out.preserved_mem_map and copied to the FDT during finalization. > This double tracking is redundant. > > Remove preserved_mem_map from kho_out. Instead, maintain the physical > address of the head chunk directly in the preserved-memory-map FDT > property. > > Introduce kho_update_memory_map() to manage this property. This function > handles: > 1. Retrieving and freeing any existing serialized map (handling the > abort/retry case). > 2. Updating the FDT property with the new chunk address. > > This establishes the FDT as the single source of truth for the handover > state. > > Signed-off-by: Pasha Tatashin Reviewed-by: Pratyush Yadav [...] -- Regards, Pratyush Yadav From pratyush at kernel.org Fri Nov 14 09:18:32 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Fri, 14 Nov 2025 18:18:32 +0100 Subject: [PATCH v1 08/13] kho: Remove abort functionality and support state refresh In-Reply-To: <20251114155358.2884014-9-pasha.tatashin@soleen.com> (Pasha Tatashin's message of "Fri, 14 Nov 2025 10:53:53 -0500") References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-9-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14 2025, Pasha Tatashin wrote: > Previously, KHO required a dedicated kho_abort() function to clean up > state before kho_finalize() could be called again. 
This was necessary > to handle complex unwind paths when using notifiers. > > With the shift to direct memory preservation, the explicit abort step > is no longer strictly necessary. > > Remove kho_abort() and refactor kho_finalize() to handle re-entry. > If kho_finalize() is called while KHO is already finalized, it will > now automatically clean up the previous memory map and state before > generating a new one. This allows the KHO state to be updated/refreshed > simply by triggering finalize again. > > Update debugfs to return -EINVAL if userspace attempts to write 0 to > the finalize attribute, as explicit abort is no longer supported. Documentation/core-api/kho/concepts.rst touches on the concept of finalization. I suppose that should be updated as well. Other than this, Reviewed-by: Pratyush Yadav [...] -- Regards, Pratyush Yadav From pasha.tatashin at soleen.com Fri Nov 14 09:21:22 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 12:21:22 -0500 Subject: [PATCH v1 04/13] kho: Verify deserialization status and fix FDT alignment access In-Reply-To: References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-5-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14, 2025 at 11:52 AM Pratyush Yadav wrote: > > On Fri, Nov 14 2025, Pasha Tatashin wrote: > > > During boot, kho_restore_folio() relies on the memory map having been > > successfully deserialized. If deserialization fails or no map is present, > > attempting to restore the FDT folio is unsafe. > > > > Update kho_mem_deserialize() to return a boolean indicating success. Use > > this return value in kho_memory_init() to disable KHO if deserialization > > fails. Also, the incoming FDT folio is never used, so there is no reason to > > restore it. > > > > Additionally, use memcpy() to retrieve the memory map pointer from the FDT.
> > FDT properties are not guaranteed to be naturally aligned, and accessing > > a 64-bit value via a pointer that is only 32-bit aligned can cause faults. > > > > Signed-off-by: Pasha Tatashin > > --- > > kernel/liveupdate/kexec_handover.c | 32 ++++++++++++++++++------------ > > 1 file changed, 19 insertions(+), 13 deletions(-) > > > > diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c > > index a4b33ca79246..83aca3b4af15 100644 > > --- a/kernel/liveupdate/kexec_handover.c > > +++ b/kernel/liveupdate/kexec_handover.c > > @@ -450,20 +450,28 @@ static void __init deserialize_bitmap(unsigned int order, > > } > > } > > > > -static void __init kho_mem_deserialize(const void *fdt) > > +/* Return true if memory was deserialized */ > > +static bool __init kho_mem_deserialize(const void *fdt) > > { > > struct khoser_mem_chunk *chunk; > > - const phys_addr_t *mem; > > + const void *mem_ptr; > > + u64 mem; > > int len; > > > > - mem = fdt_getprop(fdt, 0, PROP_PRESERVED_MEMORY_MAP, &len); > > - > > - if (!mem || len != sizeof(*mem)) { > > + mem_ptr = fdt_getprop(fdt, 0, PROP_PRESERVED_MEMORY_MAP, &len); > > + if (!mem_ptr || len != sizeof(u64)) { > > pr_err("failed to get preserved memory bitmaps\n"); > > - return; > > + return false; > > } > > + /* FDT guarantees 32-bit alignment, have to use memcpy */ > > + memcpy(&mem, mem_ptr, len); > > Perhaps get_unaligned(mem) would have been simpler? Hm, it is certainly more descriptive. I will see if I can use it. > > > + > > + chunk = mem ? phys_to_virt(mem) : NULL; > > + > > + /* No preserved physical pages were passed, no deserialization */ > > + if (!chunk) > > + return false; > > Should we disallow all kho_restore_{folio,pages}() calls too if this > fails? Ideally those should never happen since kho_retrieve_subtree() > will fail, so maybe as a debug aid? Right, my thinking was that they should never happen, as they do not have a way to know the location of folios to be restored.
So the FDT access prevention we already do takes care of that. > > > > > > - chunk = *mem ? phys_to_virt(*mem) : NULL; > > while (chunk) { > > unsigned int i; > > > > @@ -472,6 +480,8 @@ static void __init kho_mem_deserialize(const void *fdt) > > &chunk->bitmaps[i]); > > chunk = KHOSER_LOAD_PTR(chunk->hdr.next); > > } > > + > > + return true; > > } > > > > /* > > @@ -1377,16 +1387,12 @@ static void __init kho_release_scratch(void) > > > > void __init kho_memory_init(void) > > { > > - struct folio *folio; > > - > > if (kho_in.scratch_phys) { > > kho_scratch = phys_to_virt(kho_in.scratch_phys); > > kho_release_scratch(); > > > > - kho_mem_deserialize(kho_get_fdt()); > > - folio = kho_restore_folio(kho_in.fdt_phys); > > - if (!folio) > > - pr_warn("failed to restore folio for KHO fdt\n"); > > + if (!kho_mem_deserialize(kho_get_fdt())) > > + kho_in.fdt_phys = 0; > > The folio restore does serve a purpose: it accounts for that folio in > the system's total memory. See the call to adjust_managed_page_count() > in kho_restore_page(). In practice, I don't think it makes much of a > difference, but I don't see why not. > > > } else { > > kho_reserve_scratch(); > > } > > -- > Regards, > Pratyush Yadav From pasha.tatashin at soleen.com Fri Nov 14 09:23:54 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 12:23:54 -0500 Subject: [PATCH v1 08/13] kho: Remove abort functionality and support state refresh In-Reply-To: References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-9-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14, 2025 at 12:18 PM Pratyush Yadav wrote: > > On Fri, Nov 14 2025, Pasha Tatashin wrote: > > > Previously, KHO required a dedicated kho_abort() function to clean up > > state before kho_finalize() could be called again. This was necessary > > to handle complex unwind paths when using notifiers. > > > > With the shift to direct memory preservation, the explicit abort step > > is no longer strictly necessary.
> > > > Remove kho_abort() and refactor kho_finalize() to handle re-entry. > > If kho_finalize() is called while KHO is already finalized, it will > > now automatically clean up the previous memory map and state before > > generating a new one. This allows the KHO state to be updated/refreshed > > simply by triggering finalize again. > > > > Update debugfs to return -EINVAL if userspace attempts to write 0 to > > the finalize attribute, as explicit abort is no longer supported. > > Documentation/core-api/kho/concepts.rst touches on the concept of > finalization. I suppose that should be updated as well. I looked at it, and it is vague; we are soon to remove finalize with the stateless KHO series from Jason Miu, so in that series that section can be removed or replaced. > > Other than this, > > Reviewed-by: Pratyush Yadav Thank you, Pasha > > [...] > > -- > Regards, > Pratyush Yadav From pratyush at kernel.org Fri Nov 14 09:27:57 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Fri, 14 Nov 2025 18:27:57 +0100 Subject: [PATCH v1 09/13] kho: Update FDT dynamically for subtree addition/removal In-Reply-To: <20251114155358.2884014-10-pasha.tatashin@soleen.com> (Pasha Tatashin's message of "Fri, 14 Nov 2025 10:53:54 -0500") References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-10-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14 2025, Pasha Tatashin wrote: > Currently, sub-FDTs are tracked in a list (kho_out.sub_fdts) and the > final FDT is constructed entirely from scratch during kho_finalize(). > > We can maintain the FDT dynamically: > 1. Initialize a valid, empty FDT in kho_init(). > 2. Use fdt_add_subnode and fdt_setprop in kho_add_subtree to > update the FDT immediately when a subsystem registers. > 3. Use fdt_del_node in kho_remove_subtree to remove entries. > > This removes the need for the intermediate sub_fdts list and the > reconstruction logic in kho_finalize().
kho_finalize() now > only needs to trigger memory map serialization. > > Signed-off-by: Pasha Tatashin Reviewed-by: Pratyush Yadav [...] -- Regards, Pratyush Yadav From pratyush at kernel.org Fri Nov 14 09:30:34 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Fri, 14 Nov 2025 18:30:34 +0100 Subject: [PATCH v1 10/13] kho: Allow kexec load before KHO finalization In-Reply-To: <20251114155358.2884014-11-pasha.tatashin@soleen.com> (Pasha Tatashin's message of "Fri, 14 Nov 2025 10:53:55 -0500") References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-11-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14 2025, Pasha Tatashin wrote: > Currently, kho_fill_kimage() checks kho_out.finalized and returns > early if KHO is not yet finalized. This enforces a strict ordering where > userspace must finalize KHO *before* loading the kexec image. > > This is restrictive, as standard workflows often involve loading the > target kernel early in the lifecycle and finalizing the state (FDT) > only immediately before the reboot. > > Since the KHO FDT resides at a physical address allocated during boot > (kho_init), its location is stable. We can attach this stable address > to the kimage regardless of whether the content has been finalized yet. > > Relax the check to only require kho_enable, allowing kexec_file_load > to proceed at any time. > > Signed-off-by: Pasha Tatashin Reviewed-by: Pratyush Yadav [...] 
-- Regards, Pratyush Yadav From pratyush at kernel.org Fri Nov 14 09:33:35 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Fri, 14 Nov 2025 18:33:35 +0100 Subject: [PATCH v1 11/13] kho: Allow memory preservation state updates after finalization In-Reply-To: <20251114155358.2884014-12-pasha.tatashin@soleen.com> (Pasha Tatashin's message of "Fri, 14 Nov 2025 10:53:56 -0500") References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-12-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14 2025, Pasha Tatashin wrote: > Currently, kho_preserve_* and kho_unpreserve_* return -EBUSY if > KHO is finalized. This enforces a rigid "freeze" on the KHO memory > state. > > With the introduction of re-entrant finalization, this restriction is > no longer necessary. Users should be allowed to modify the preservation > set (e.g., adding new pages or freeing old ones) even after an initial > finalization. > > The intended workflow for updates is now: > 1. Modify state (preserve/unpreserve). > 2. Call kho_finalize() again to refresh the serialized metadata. > > Remove the kho_out.finalized checks to enable this dynamic behavior. > > Signed-off-by: Pasha Tatashin > --- > kernel/liveupdate/kexec_handover.c | 13 ------------- > 1 file changed, 13 deletions(-) > > diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c > index 27ef20565a5f..87e9b488237d 100644 > --- a/kernel/liveupdate/kexec_handover.c > +++ b/kernel/liveupdate/kexec_handover.c > @@ -183,10 +183,6 @@ static int __kho_preserve_order(struct kho_mem_track *track, unsigned long pfn, > const unsigned long pfn_high = pfn >> order; > > might_sleep(); > - > - if (kho_out.finalized) > - return -EBUSY; > - > physxa = xa_load(&track->orders, order); > if (!physxa) { > int err; > @@ -815,9 +811,6 @@ int kho_unpreserve_folio(struct folio *folio) This can be void now. 
This would make consumers a bit simpler, since right now, the memfd preservation logic does a WARN_ON() if this function fails. That can be dropped now that the function can never fail. Same for kho_unpreserve_pages() and kho_unpreserve_vmalloc(). > const unsigned int order = folio_order(folio); > struct kho_mem_track *track = &kho_out.track; > > - if (kho_out.finalized) > - return -EBUSY; > - > __kho_unpreserve_order(track, pfn, order); > return 0; > } > @@ -885,9 +878,6 @@ int kho_unpreserve_pages(struct page *page, unsigned int nr_pages) > const unsigned long start_pfn = page_to_pfn(page); > const unsigned long end_pfn = start_pfn + nr_pages; > > - if (kho_out.finalized) > - return -EBUSY; > - > __kho_unpreserve(track, start_pfn, end_pfn); > > return 0; > @@ -1066,9 +1056,6 @@ EXPORT_SYMBOL_GPL(kho_preserve_vmalloc); > */ > int kho_unpreserve_vmalloc(struct kho_vmalloc *preservation) > { > - if (kho_out.finalized) > - return -EBUSY; > - > kho_vmalloc_free_chunks(preservation); > > return 0; -- Regards, Pratyush Yadav From pratyush at kernel.org Fri Nov 14 09:34:09 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Fri, 14 Nov 2025 18:34:09 +0100 Subject: [PATCH v1 12/13] kho: Add Kconfig option to enable KHO by default In-Reply-To: <20251114155358.2884014-13-pasha.tatashin@soleen.com> (Pasha Tatashin's message of "Fri, 14 Nov 2025 10:53:57 -0500") References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-13-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14 2025, Pasha Tatashin wrote: > Currently, Kexec Handover must be explicitly enabled via the kernel > command line parameter `kho=on`. > > For workloads that rely on KHO as a foundational requirement (such as > the upcoming Live Update Orchestrator), requiring an explicit boot > parameter adds redundant configuration steps. > > Introduce CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT. When selected, KHO > defaults to enabled. This is equivalent to passing kho=on at boot. 
> The behavior can still be disabled at runtime by passing kho=off. > > Signed-off-by: Pasha Tatashin Reviewed-by: Pratyush Yadav [...] -- Regards, Pratyush Yadav From pratyush at kernel.org Fri Nov 14 09:45:54 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Fri, 14 Nov 2025 18:45:54 +0100 Subject: [PATCH v1 13/13] kho: Introduce high-level memory allocation API In-Reply-To: <20251114155358.2884014-14-pasha.tatashin@soleen.com> (Pasha Tatashin's message of "Fri, 14 Nov 2025 10:53:58 -0500") References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-14-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14 2025, Pasha Tatashin wrote: > Currently, clients of KHO must manually allocate memory (e.g., via > alloc_pages), calculate the page order, and explicitly call > kho_preserve_folio(). Similarly, cleanup requires separate calls to > unpreserve and free the memory. > > Introduce a high-level API to streamline this common pattern: > > - kho_alloc_preserve(size): Allocates physically contiguous, zeroed > memory and immediately marks it for preservation. > - kho_free_unpreserve(ptr, size): Unpreserves and frees the memory > in the current kernel. > - kho_free_restore(ptr, size): Restores the struct page state of > preserved memory in the new kernel and immediately frees it to the > page allocator. Nit: kho_unpreserve_free() and kho_restore_free() make more sense to me since that is the order of operations. Having them the other way round is kind of confusing. Also, why do the free functions need size? They can get the order from folio_order(). This would save users of the API from having to store the size somewhere and make things simpler. 
> > Signed-off-by: Pasha Tatashin > --- > include/linux/kexec_handover.h | 22 +++++-- > kernel/liveupdate/kexec_handover.c | 101 +++++++++++++++++++++++++++++ > 2 files changed, 116 insertions(+), 7 deletions(-) > > diff --git a/include/linux/kexec_handover.h b/include/linux/kexec_handover.h > index 80ece4232617..76c496e01877 100644 > --- a/include/linux/kexec_handover.h > +++ b/include/linux/kexec_handover.h > @@ -2,8 +2,9 @@ > #ifndef LINUX_KEXEC_HANDOVER_H > #define LINUX_KEXEC_HANDOVER_H > > -#include > +#include > #include > +#include > > struct kho_scratch { > phys_addr_t addr; > @@ -48,6 +49,9 @@ int kho_preserve_pages(struct page *page, unsigned int nr_pages); > int kho_unpreserve_pages(struct page *page, unsigned int nr_pages); > int kho_preserve_vmalloc(void *ptr, struct kho_vmalloc *preservation); > int kho_unpreserve_vmalloc(struct kho_vmalloc *preservation); > +void *kho_alloc_preserve(size_t size); > +void kho_free_unpreserve(void *mem, size_t size); > +void kho_free_restore(void *mem, size_t size); > struct folio *kho_restore_folio(phys_addr_t phys); > struct page *kho_restore_pages(phys_addr_t phys, unsigned int nr_pages); > void *kho_restore_vmalloc(const struct kho_vmalloc *preservation); > @@ -101,6 +105,14 @@ static inline int kho_unpreserve_vmalloc(struct kho_vmalloc *preservation) > return -EOPNOTSUPP; > } > > +void *kho_alloc_preserve(size_t size) > +{ > + return ERR_PTR(-EOPNOTSUPP); > +} > + > +void kho_free_unpreserve(void *mem, size_t size) { } > +void kho_free_restore(void *mem, size_t size) { } > + > static inline struct folio *kho_restore_folio(phys_addr_t phys) > { > return NULL; > @@ -122,18 +134,14 @@ static inline int kho_add_subtree(const char *name, void *fdt) > return -EOPNOTSUPP; > } > > -static inline void kho_remove_subtree(void *fdt) > -{ > -} > +static inline void kho_remove_subtree(void *fdt) { } > > static inline int kho_retrieve_subtree(const char *name, phys_addr_t *phys) > { > return -EOPNOTSUPP; > } > > -static inline 
void kho_memory_init(void) > -{ > -} > +static inline void kho_memory_init(void) { } > > static inline void kho_populate(phys_addr_t fdt_phys, u64 fdt_len, > phys_addr_t scratch_phys, u64 scratch_len) > diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c > index a905bccf5f65..9f05849fd68e 100644 > --- a/kernel/liveupdate/kexec_handover.c > +++ b/kernel/liveupdate/kexec_handover.c > @@ -4,6 +4,7 @@ > * Copyright (C) 2023 Alexander Graf > * Copyright (C) 2025 Microsoft Corporation, Mike Rapoport > * Copyright (C) 2025 Google LLC, Changyuan Lyu > + * Copyright (C) 2025 Pasha Tatashin > */ > > #define pr_fmt(fmt) "KHO: " fmt > @@ -1151,6 +1152,106 @@ void *kho_restore_vmalloc(const struct kho_vmalloc *preservation) > } > EXPORT_SYMBOL_GPL(kho_restore_vmalloc); > > +/** > + * kho_alloc_preserve - Allocate, zero, and preserve memory. > + * @size: The number of bytes to allocate. > + * > + * Allocates a physically contiguous block of zeroed pages that is large > + * enough to hold @size bytes. The allocated memory is then registered with > + * KHO for preservation across a kexec. > + * > + * Note: The actual allocated size will be rounded up to the nearest > + * power-of-two page boundary. > + * > + * @return A virtual pointer to the allocated and preserved memory on success, > + * or an ERR_PTR() encoded error on failure. > + */ > +void *kho_alloc_preserve(size_t size) > +{ > + struct folio *folio; > + int order, ret; > + > + if (!size) > + return ERR_PTR(-EINVAL); > + > + order = get_order(size); > + if (order > MAX_PAGE_ORDER) > + return ERR_PTR(-E2BIG); > + > + folio = folio_alloc(GFP_KERNEL | __GFP_ZERO, order); > + if (!folio) > + return ERR_PTR(-ENOMEM); > + > + ret = kho_preserve_folio(folio); > + if (ret) { > + folio_put(folio); > + return ERR_PTR(ret); > + } > + > + return folio_address(folio); > +} > +EXPORT_SYMBOL_GPL(kho_alloc_preserve); > + > +/** > + * kho_free_unpreserve - Unpreserve and free memory. 
> + * @mem: Pointer to the memory allocated by kho_alloc_preserve(). > + * @size: The original size requested during allocation. This is used to > + * recalculate the correct order for freeing the pages. > + * > + * Unregisters the memory from KHO preservation and frees the underlying > + * pages back to the system. This function should be called to clean up > + * memory allocated with kho_alloc_preserve(). > + */ > +void kho_free_unpreserve(void *mem, size_t size) > +{ > + struct folio *folio; > + unsigned int order; > + > + if (!mem || !size) > + return; > + > + order = get_order(size); > + if (WARN_ON_ONCE(order > MAX_PAGE_ORDER)) > + return; > + > + folio = virt_to_folio(mem); > + WARN_ON_ONCE(kho_unpreserve_folio(folio)); This is what I meant in my reply to the previous patch. kho_unpreserve_folio() can be void now, so the WARN_ON_ONCE() is not needed. > + folio_put(folio); > +} > +EXPORT_SYMBOL_GPL(kho_free_unpreserve); > + > +/** > + * kho_free_restore - Restore and free memory after kexec. > + * @mem: Pointer to the memory (in the new kernel's address space) > + * that was allocated by the old kernel. > + * @size: The original size requested during allocation. This is used to > + * recalculate the correct order for freeing the pages. > + * > + * This function is intended to be called in the new kernel (post-kexec) > + * to take ownership of and free a memory region that was preserved by the > + * old kernel using kho_alloc_preserve(). > + * > + * It first restores the pages from KHO (using their physical address) > + * and then frees the pages back to the new kernel's page allocator. > + */ > +void kho_free_restore(void *mem, size_t size) On restore side, callers are already using the phys addr directly. So do kho_restore_folio() and kho_restore_pages() for example. This should follow suit for uniformity. Would also save the callers a __va() call and this function the __pa() call. 
> +{ > + struct folio *folio; > + unsigned int order; > + > + if (!mem || !size) > + return; > + > + order = get_order(size); > + if (WARN_ON_ONCE(order > MAX_PAGE_ORDER)) > + return; > + > + folio = kho_restore_folio(__pa(mem)); > + if (!WARN_ON(!folio)) kho_restore_folio() already WARNs on failure. So the WARN_ON() here can be skipped I think. > + free_pages((unsigned long)mem, order); folio_put() here makes more sense since we just restored a folio. > +} > +EXPORT_SYMBOL_GPL(kho_free_restore); > + > int kho_finalize(void) > { > int ret; -- Regards, Pratyush Yadav From pratyush at kernel.org Fri Nov 14 09:47:45 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Fri, 14 Nov 2025 18:47:45 +0100 Subject: [PATCH v1 08/13] kho: Remove abort functionality and support state refresh In-Reply-To: (Pasha Tatashin's message of "Fri, 14 Nov 2025 12:23:54 -0500") References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-9-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14 2025, Pasha Tatashin wrote: > On Fri, Nov 14, 2025 at 12:18?PM Pratyush Yadav wrote: >> >> On Fri, Nov 14 2025, Pasha Tatashin wrote: >> >> > Previously, KHO required a dedicated kho_abort() function to clean up >> > state before kho_finalize() could be called again. This was necessary >> > to handle complex unwind paths when using notifiers. >> > >> > With the shift to direct memory preservation, the explicit abort step >> > is no longer strictly necessary. >> > >> > Remove kho_abort() and refactor kho_finalize() to handle re-entry. >> > If kho_finalize() is called while KHO is already finalized, it will >> > now automatically clean up the previous memory map and state before >> > generating a new one. This allows the KHO state to be updated/refreshed >> > simply by triggering finalize again. >> > >> > Update debugfs to return -EINVAL if userspace attempts to write 0 to >> > the finalize attribute, as explicit abort is no longer supported. 
>> >> Documentation/core-api/kho/concepts.rst touches on the concept of >> finalization. I suppose that should be updated as well. > > I looked at it, and it is vague; we are soon to remove finalize with > the stateless KHO series from Jason Miu, so in that series that section can be > removed or replaced. Okay, fair enough. [...] -- Regards, Pratyush Yadav From pasha.tatashin at soleen.com Fri Nov 14 09:47:24 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 12:47:24 -0500 Subject: [PATCH v1 11/13] kho: Allow memory preservation state updates after finalization In-Reply-To: References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-12-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14, 2025 at 12:33 PM Pratyush Yadav wrote: > > On Fri, Nov 14 2025, Pasha Tatashin wrote: > > > Currently, kho_preserve_* and kho_unpreserve_* return -EBUSY if > > KHO is finalized. This enforces a rigid "freeze" on the KHO memory > > state. > > > > With the introduction of re-entrant finalization, this restriction is > > no longer necessary. Users should be allowed to modify the preservation > > set (e.g., adding new pages or freeing old ones) even after an initial > > finalization. > > > > The intended workflow for updates is now: > > 1. Modify state (preserve/unpreserve). > > 2. Call kho_finalize() again to refresh the serialized metadata. > > > > Remove the kho_out.finalized checks to enable this dynamic behavior.
> > > > Signed-off-by: Pasha Tatashin > > --- > > kernel/liveupdate/kexec_handover.c | 13 ------------- > > 1 file changed, 13 deletions(-) > > > > diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c > > index 27ef20565a5f..87e9b488237d 100644 > > --- a/kernel/liveupdate/kexec_handover.c > > +++ b/kernel/liveupdate/kexec_handover.c > > @@ -183,10 +183,6 @@ static int __kho_preserve_order(struct kho_mem_track *track, unsigned long pfn, > > const unsigned long pfn_high = pfn >> order; > > > > might_sleep(); > > - > > - if (kho_out.finalized) > > - return -EBUSY; > > - > > physxa = xa_load(&track->orders, order); > > if (!physxa) { > > int err; > > @@ -815,9 +811,6 @@ int kho_unpreserve_folio(struct folio *folio) > > This can be void now. This would make consumers a bit simpler, since > right now, the memfd preservation logic does a WARN_ON() if this > function fails. That can be dropped now that the function can never > fail. > > Same for kho_unpreserve_pages() and kho_unpreserve_vmalloc(). Oh, this is a very good suggestion, really disliked those kho_unpreserve_* errors. 
> > > const unsigned int order = folio_order(folio); > > struct kho_mem_track *track = &kho_out.track; > > > > - if (kho_out.finalized) > > - return -EBUSY; > > - > > __kho_unpreserve_order(track, pfn, order); > > return 0; > > } > > @@ -885,9 +878,6 @@ int kho_unpreserve_pages(struct page *page, unsigned int nr_pages) > > const unsigned long start_pfn = page_to_pfn(page); > > const unsigned long end_pfn = start_pfn + nr_pages; > > > > - if (kho_out.finalized) > > - return -EBUSY; > > - > > __kho_unpreserve(track, start_pfn, end_pfn); > > > > return 0; > > @@ -1066,9 +1056,6 @@ EXPORT_SYMBOL_GPL(kho_preserve_vmalloc); > > */ > > int kho_unpreserve_vmalloc(struct kho_vmalloc *preservation) > > { > > - if (kho_out.finalized) > > - return -EBUSY; > > - > > kho_vmalloc_free_chunks(preservation); > > > > return 0; > > -- > Regards, > Pratyush Yadav From pasha.tatashin at soleen.com Fri Nov 14 09:54:59 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 12:54:59 -0500 Subject: [PATCH v1 13/13] kho: Introduce high-level memory allocation API In-Reply-To: References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-14-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14, 2025 at 12:45?PM Pratyush Yadav wrote: > > On Fri, Nov 14 2025, Pasha Tatashin wrote: > > > Currently, clients of KHO must manually allocate memory (e.g., via > > alloc_pages), calculate the page order, and explicitly call > > kho_preserve_folio(). Similarly, cleanup requires separate calls to > > unpreserve and free the memory. > > > > Introduce a high-level API to streamline this common pattern: > > > > - kho_alloc_preserve(size): Allocates physically contiguous, zeroed > > memory and immediately marks it for preservation. > > - kho_free_unpreserve(ptr, size): Unpreserves and frees the memory > > in the current kernel. 
> > - kho_free_restore(ptr, size): Restores the struct page state of > > preserved memory in the new kernel and immediately frees it to the > > page allocator. > > Nit: kho_unpreserve_free() and kho_restore_free() make more sense to me > since that is the order of operations. Having them the other way round > is kind of confusing. Sure will rename. > > Also, why do the free functions need size? They can get the order from > folio_order(). This would save users of the API from having to store the > size somewhere and make things simpler. Yes, size is not needed, I will remove it. Thanks, Pasha > > > > > Signed-off-by: Pasha Tatashin > > --- > > include/linux/kexec_handover.h | 22 +++++-- > > kernel/liveupdate/kexec_handover.c | 101 +++++++++++++++++++++++++++++ > > 2 files changed, 116 insertions(+), 7 deletions(-) > > > > diff --git a/include/linux/kexec_handover.h b/include/linux/kexec_handover.h > > index 80ece4232617..76c496e01877 100644 > > --- a/include/linux/kexec_handover.h > > +++ b/include/linux/kexec_handover.h > > @@ -2,8 +2,9 @@ > > #ifndef LINUX_KEXEC_HANDOVER_H > > #define LINUX_KEXEC_HANDOVER_H > > > > -#include > > +#include > > #include > > +#include > > > > struct kho_scratch { > > phys_addr_t addr; > > @@ -48,6 +49,9 @@ int kho_preserve_pages(struct page *page, unsigned int nr_pages); > > int kho_unpreserve_pages(struct page *page, unsigned int nr_pages); > > int kho_preserve_vmalloc(void *ptr, struct kho_vmalloc *preservation); > > int kho_unpreserve_vmalloc(struct kho_vmalloc *preservation); > > +void *kho_alloc_preserve(size_t size); > > +void kho_free_unpreserve(void *mem, size_t size); > > +void kho_free_restore(void *mem, size_t size); > > struct folio *kho_restore_folio(phys_addr_t phys); > > struct page *kho_restore_pages(phys_addr_t phys, unsigned int nr_pages); > > void *kho_restore_vmalloc(const struct kho_vmalloc *preservation); > > @@ -101,6 +105,14 @@ static inline int kho_unpreserve_vmalloc(struct kho_vmalloc *preservation) > > 
return -EOPNOTSUPP; > > } > > > > +void *kho_alloc_preserve(size_t size) > > +{ > > + return ERR_PTR(-EOPNOTSUPP); > > +} > > + > > +void kho_free_unpreserve(void *mem, size_t size) { } > > +void kho_free_restore(void *mem, size_t size) { } > > + > > static inline struct folio *kho_restore_folio(phys_addr_t phys) > > { > > return NULL; > > @@ -122,18 +134,14 @@ static inline int kho_add_subtree(const char *name, void *fdt) > > return -EOPNOTSUPP; > > } > > > > -static inline void kho_remove_subtree(void *fdt) > > -{ > > -} > > +static inline void kho_remove_subtree(void *fdt) { } > > > > static inline int kho_retrieve_subtree(const char *name, phys_addr_t *phys) > > { > > return -EOPNOTSUPP; > > } > > > > -static inline void kho_memory_init(void) > > -{ > > -} > > +static inline void kho_memory_init(void) { } > > > > static inline void kho_populate(phys_addr_t fdt_phys, u64 fdt_len, > > phys_addr_t scratch_phys, u64 scratch_len) > > diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c > > index a905bccf5f65..9f05849fd68e 100644 > > --- a/kernel/liveupdate/kexec_handover.c > > +++ b/kernel/liveupdate/kexec_handover.c > > @@ -4,6 +4,7 @@ > > * Copyright (C) 2023 Alexander Graf > > * Copyright (C) 2025 Microsoft Corporation, Mike Rapoport > > * Copyright (C) 2025 Google LLC, Changyuan Lyu > > + * Copyright (C) 2025 Pasha Tatashin > > */ > > > > #define pr_fmt(fmt) "KHO: " fmt > > @@ -1151,6 +1152,106 @@ void *kho_restore_vmalloc(const struct kho_vmalloc *preservation) > > } > > EXPORT_SYMBOL_GPL(kho_restore_vmalloc); > > > > +/** > > + * kho_alloc_preserve - Allocate, zero, and preserve memory. > > + * @size: The number of bytes to allocate. > > + * > > + * Allocates a physically contiguous block of zeroed pages that is large > > + * enough to hold @size bytes. The allocated memory is then registered with > > + * KHO for preservation across a kexec. 
> > + * > > + * Note: The actual allocated size will be rounded up to the nearest > > + * power-of-two page boundary. > > + * > > + * @return A virtual pointer to the allocated and preserved memory on success, > > + * or an ERR_PTR() encoded error on failure. > > + */ > > +void *kho_alloc_preserve(size_t size) > > +{ > > + struct folio *folio; > > + int order, ret; > > + > > + if (!size) > > + return ERR_PTR(-EINVAL); > > + > > + order = get_order(size); > > + if (order > MAX_PAGE_ORDER) > > + return ERR_PTR(-E2BIG); > > + > > + folio = folio_alloc(GFP_KERNEL | __GFP_ZERO, order); > > + if (!folio) > > + return ERR_PTR(-ENOMEM); > > + > > + ret = kho_preserve_folio(folio); > > + if (ret) { > > + folio_put(folio); > > + return ERR_PTR(ret); > > + } > > + > > + return folio_address(folio); > > +} > > +EXPORT_SYMBOL_GPL(kho_alloc_preserve); > > + > > +/** > > + * kho_free_unpreserve - Unpreserve and free memory. > > + * @mem: Pointer to the memory allocated by kho_alloc_preserve(). > > + * @size: The original size requested during allocation. This is used to > > + * recalculate the correct order for freeing the pages. > > + * > > + * Unregisters the memory from KHO preservation and frees the underlying > > + * pages back to the system. This function should be called to clean up > > + * memory allocated with kho_alloc_preserve(). > > + */ > > +void kho_free_unpreserve(void *mem, size_t size) > > +{ > > + struct folio *folio; > > + unsigned int order; > > + > > + if (!mem || !size) > > + return; > > + > > + order = get_order(size); > > + if (WARN_ON_ONCE(order > MAX_PAGE_ORDER)) > > + return; > > + > > + folio = virt_to_folio(mem); > > + WARN_ON_ONCE(kho_unpreserve_folio(folio)); > > This is what I meant in my reply to the previous patch. > kho_unpreserve_folio() can be void now, so the WARN_ON_ONCE() is not > needed. 
> > > + folio_put(folio); > > +} > > +EXPORT_SYMBOL_GPL(kho_free_unpreserve); > > + > > +/** > > + * kho_free_restore - Restore and free memory after kexec. > > + * @mem: Pointer to the memory (in the new kernel's address space) > > + * that was allocated by the old kernel. > > + * @size: The original size requested during allocation. This is used to > > + * recalculate the correct order for freeing the pages. > > + * > > + * This function is intended to be called in the new kernel (post-kexec) > > + * to take ownership of and free a memory region that was preserved by the > > + * old kernel using kho_alloc_preserve(). > > + * > > + * It first restores the pages from KHO (using their physical address) > > + * and then frees the pages back to the new kernel's page allocator. > > + */ > > +void kho_free_restore(void *mem, size_t size) > > On restore side, callers are already using the phys addr directly. So do > kho_restore_folio() and kho_restore_pages() for example. This should > follow suit for uniformity. Would also save the callers a __va() call > and this function the __pa() call. > > > +{ > > + struct folio *folio; > > + unsigned int order; > > + > > + if (!mem || !size) > > + return; > > + > > + order = get_order(size); > > + if (WARN_ON_ONCE(order > MAX_PAGE_ORDER)) > > + return; > > + > > + folio = kho_restore_folio(__pa(mem)); > > + if (!WARN_ON(!folio)) > > kho_restore_folio() already WARNs on failure. So the WARN_ON() here can > be skipped I think. > > > + free_pages((unsigned long)mem, order); > > folio_put() here makes more sense since we just restored a folio. 
> > > +} > > +EXPORT_SYMBOL_GPL(kho_free_restore); > > + > > int kho_finalize(void) > > { > > int ret; > > -- > Regards, > Pratyush Yadav From pasha.tatashin at soleen.com Fri Nov 14 10:59:49 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 13:59:49 -0500 Subject: [PATCH v2 00/13] kho: simplify state machine and enable dynamic updates Message-ID: <20251114190002.3311679-1-pasha.tatashin@soleen.com> Andrew: This series applies against mm-nonmm-unstable, but should go right before LUOv5, i.e. on top of: "liveupdate: kho: use %pe format specifier for error pointer printing" Changelog v2: - Addressed comments from Mike and Pratyush - Added Review-bys. It also replaces the following patches, that once applied should be dropped from mm-nonmm-unstable: "liveupdate: kho: when live update add KHO image during kexec load" "liveupdate: Kconfig: make debugfs optional" "kho: enable KHO by default" This patch series refactors the Kexec Handover subsystem to transition from a rigid, state-locked model to a dynamic, re-entrant architecture. It also introduces usability improvements. Motivation Currently, KHO relies on a strict state machine where memory preservation is locked upon finalization. If a change is required, the user must explicitly "abort" to reset the state. Additionally, the kexec image cannot be loaded until KHO is finalized, and the FDT is rebuilt from scratch on every finalization. This series simplifies this workflow to support "load early, finalize late" scenarios. Key Changes State Machine Simplification: - Removed kho_abort(). kho_finalize() is now re-entrant; calling it a second time automatically flushes the previous serialized state and generates a fresh one. - Removed kho_out.finalized checks from preservation APIs, allowing drivers to add/remove pages even after an initial finalization. - Decoupled kexec_file_load from KHO finalization. 
The KHO FDT physical address is now stable from boot, allowing the kexec image to be loaded before the handover metadata is finalized. FDT Management: - The FDT is now updated in-place dynamically when subtrees are added or removed, removing the need for complex reconstruction logic. - The output FDT is always exposed in debugfs (initialized and zeroed at boot), improving visibility and debugging capabilities throughout the system lifecycle. - Removed the redundant global preserved_mem_map pointer, establishing the FDT property as the single source of truth. New Features & API Enhancements: - High-Level Allocators: Introduced kho_alloc_preserve() and friends to reduce boilerplate for drivers that need to allocate, preserve, and eventually restore simple memory buffers. - Configuration: Added CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT to allow KHO to be active by default without requiring the kho=on command line parameter. Fixes: - Fixed potential alignment faults when accessing 64-bit FDT properties. - Fixed the lifecycle of the FDT folio preservation (now preserved once at init). 
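[Editor's note] The high-level allocator summarized above hides two details that are easy to get wrong when done by hand: reporting failure through the returned pointer and rounding the request up to a power-of-two number of pages. The sketch below is a userspace model of those two helpers for illustration only; err_ptr(), is_err(), and get_order_model() are stand-ins written here, not the kernel implementations:

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL
#define MAX_ERRNO 4095UL

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR() helpers that
 * kho_alloc_preserve() uses to report failure via its return value. */
static void *err_ptr(long err)
{
	return (void *)err;
}

static int is_err(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO values of the address space. */
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* kho_alloc_preserve() rounds a request up to a power-of-two number of
 * pages; this models the kernel's get_order() for 4 KiB pages. */
static int get_order_model(size_t size)
{
	size_t pages = (size + PAGE_SIZE - 1) / PAGE_SIZE;
	int order = 0;

	while ((1UL << order) < pages)
		order++;
	return order;
}
```

For example, a 5-page request is served from an order-3 (8-page) allocation, which is the rounding the cover letter's "Note" about power-of-two page boundaries refers to.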
Pasha Tatashin (13): kho: Fix misleading log message in kho_populate() kho: Convert __kho_abort() to return void kho: Introduce high-level memory allocation API kho: Preserve FDT folio only once during initialization kho: Verify deserialization status and fix FDT alignment access kho: Always expose output FDT in debugfs kho: Simplify serialization and remove __kho_abort kho: Remove global preserved_mem_map and store state in FDT kho: Remove abort functionality and support state refresh kho: Update FDT dynamically for subtree addition/removal kho: Allow kexec load before KHO finalization kho: Allow memory preservation state updates after finalization kho: Add Kconfig option to enable KHO by default include/linux/kexec_handover.h | 39 +- kernel/liveupdate/Kconfig | 14 + kernel/liveupdate/kexec_handover.c | 378 +++++++++++--------- kernel/liveupdate/kexec_handover_debugfs.c | 2 +- kernel/liveupdate/kexec_handover_internal.h | 1 - 5 files changed, 239 insertions(+), 195 deletions(-) -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 10:59:51 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 13:59:51 -0500 Subject: [PATCH v2 02/13] kho: Convert __kho_abort() to return void In-Reply-To: <20251114190002.3311679-1-pasha.tatashin@soleen.com> References: <20251114190002.3311679-1-pasha.tatashin@soleen.com> Message-ID: <20251114190002.3311679-3-pasha.tatashin@soleen.com> The internal helper __kho_abort() always returns 0 and has no failure paths. Its return value is ignored by __kho_finalize and checked needlessly by kho_abort. Change the return type to void to reflect that this function cannot fail, and simplify kho_abort by removing dead error handling code. 
Signed-off-by: Pasha Tatashin Reviewed-by: Pratyush Yadav Reviewed-by: Mike Rapoport (Microsoft) --- kernel/liveupdate/kexec_handover.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 6ad45e12f53b..bc7f046a1313 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -1117,20 +1117,16 @@ void *kho_restore_vmalloc(const struct kho_vmalloc *preservation) } EXPORT_SYMBOL_GPL(kho_restore_vmalloc); -static int __kho_abort(void) +static void __kho_abort(void) { if (kho_out.preserved_mem_map) { kho_mem_ser_free(kho_out.preserved_mem_map); kho_out.preserved_mem_map = NULL; } - - return 0; } int kho_abort(void) { - int ret = 0; - if (!kho_enable) return -EOPNOTSUPP; @@ -1138,10 +1134,7 @@ int kho_abort(void) if (!kho_out.finalized) return -ENOENT; - ret = __kho_abort(); - if (ret) - return ret; - + __kho_abort(); kho_out.finalized = false; kho_debugfs_fdt_remove(&kho_out.dbg, kho_out.fdt); -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 10:59:50 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 13:59:50 -0500 Subject: [PATCH v2 01/13] kho: Fix misleading log message in kho_populate() In-Reply-To: <20251114190002.3311679-1-pasha.tatashin@soleen.com> References: <20251114190002.3311679-1-pasha.tatashin@soleen.com> Message-ID: <20251114190002.3311679-2-pasha.tatashin@soleen.com> The log message in kho_populate() currently states "Will skip init for some devices". This implies that Kexec Handover always involves skipping device initialization. However, KHO is a generic mechanism used to preserve kernel memory across reboot for various purposes, such as memfd, telemetry, or reserve_mem. Skipping device initialization is a specific property of live update drivers using KHO, not a property of the mechanism itself. 
Remove the misleading suffix to accurately reflect the generic nature of KHO discovery. Signed-off-by: Pasha Tatashin Reviewed-by: Pratyush Yadav Reviewed-by: Mike Rapoport (Microsoft) --- kernel/liveupdate/kexec_handover.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 9f0913e101be..6ad45e12f53b 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -1470,7 +1470,7 @@ void __init kho_populate(phys_addr_t fdt_phys, u64 fdt_len, kho_in.fdt_phys = fdt_phys; kho_in.scratch_phys = scratch_phys; kho_scratch_cnt = scratch_cnt; - pr_info("found kexec handover data. Will skip init for some devices\n"); + pr_info("found kexec handover data.\n"); out: if (fdt) -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 10:59:53 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 13:59:53 -0500 Subject: [PATCH v2 04/13] kho: Preserve FDT folio only once during initialization In-Reply-To: <20251114190002.3311679-1-pasha.tatashin@soleen.com> References: <20251114190002.3311679-1-pasha.tatashin@soleen.com> Message-ID: <20251114190002.3311679-5-pasha.tatashin@soleen.com> Currently, the FDT folio is preserved inside __kho_finalize(). If the user performs multiple finalize/abort cycles, kho_preserve_folio() is called repeatedly for the same FDT folio. Since the FDT folio is allocated once during kho_init(), it should be marked for preservation at the same time. Move the preservation call to kho_init() to align the preservation state with the object's lifecycle and simplify the finalize path. Also, pre-zero the FDT tree so we do not expose random bits to the user and to the next kernel by using the new kho_alloc_preserve() api. 
Signed-off-by: Pasha Tatashin Reviewed-by: Mike Rapoport (Microsoft) Reviewed-by: Pratyush Yadav --- kernel/liveupdate/kexec_handover.c | 18 ++++++------------ 1 file changed, 6 insertions(+), 12 deletions(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 5c5c9c46fe92..704e91418214 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -1251,10 +1251,6 @@ static int __kho_finalize(void) if (err) goto abort; - err = kho_preserve_folio(virt_to_folio(kho_out.fdt)); - if (err) - goto abort; - err = kho_mem_serialize(&kho_out); if (err) goto abort; @@ -1384,19 +1380,17 @@ EXPORT_SYMBOL_GPL(kho_retrieve_subtree); static __init int kho_init(void) { - int err = 0; const void *fdt = kho_get_fdt(); - struct page *fdt_page; + int err = 0; if (!kho_enable) return 0; - fdt_page = alloc_page(GFP_KERNEL); - if (!fdt_page) { - err = -ENOMEM; + kho_out.fdt = kho_alloc_preserve(PAGE_SIZE); + if (IS_ERR(kho_out.fdt)) { + err = PTR_ERR(kho_out.fdt); goto err_free_scratch; } - kho_out.fdt = page_to_virt(fdt_page); err = kho_debugfs_init(); if (err) @@ -1424,9 +1418,9 @@ static __init int kho_init(void) return 0; err_free_fdt: - put_page(fdt_page); - kho_out.fdt = NULL; + kho_unpreserve_free(kho_out.fdt); err_free_scratch: + kho_out.fdt = NULL; for (int i = 0; i < kho_scratch_cnt; i++) { void *start = __va(kho_scratch[i].addr); void *end = start + kho_scratch[i].size; -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 10:59:52 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 13:59:52 -0500 Subject: [PATCH v2 03/13] kho: Introduce high-level memory allocation API In-Reply-To: <20251114190002.3311679-1-pasha.tatashin@soleen.com> References: <20251114190002.3311679-1-pasha.tatashin@soleen.com> Message-ID: <20251114190002.3311679-4-pasha.tatashin@soleen.com> Currently, clients of KHO must manually allocate memory (e.g., via alloc_pages), calculate the 
page order, and explicitly call kho_preserve_folio(). Similarly, cleanup requires separate calls to unpreserve and free the memory. Introduce a high-level API to streamline this common pattern: - kho_alloc_preserve(size): Allocates physically contiguous, zeroed memory and immediately marks it for preservation. - kho_unpreserve_free(ptr): Unpreserves and frees the memory in the current kernel. - kho_restore_free(ptr): Restores the struct page state of preserved memory in the new kernel and immediately frees it to the page allocator. Signed-off-by: Pasha Tatashin Reviewed-by: Mike Rapoport (Microsoft) --- include/linux/kexec_handover.h | 22 +++++--- kernel/liveupdate/kexec_handover.c | 87 ++++++++++++++++++++++++++++++ 2 files changed, 102 insertions(+), 7 deletions(-) diff --git a/include/linux/kexec_handover.h b/include/linux/kexec_handover.h index 80ece4232617..38a9487a1a00 100644 --- a/include/linux/kexec_handover.h +++ b/include/linux/kexec_handover.h @@ -2,8 +2,9 @@ #ifndef LINUX_KEXEC_HANDOVER_H #define LINUX_KEXEC_HANDOVER_H -#include +#include #include +#include struct kho_scratch { phys_addr_t addr; @@ -48,6 +49,9 @@ int kho_preserve_pages(struct page *page, unsigned int nr_pages); int kho_unpreserve_pages(struct page *page, unsigned int nr_pages); int kho_preserve_vmalloc(void *ptr, struct kho_vmalloc *preservation); int kho_unpreserve_vmalloc(struct kho_vmalloc *preservation); +void *kho_alloc_preserve(size_t size); +void kho_unpreserve_free(void *mem); +void kho_restore_free(void *mem); struct folio *kho_restore_folio(phys_addr_t phys); struct page *kho_restore_pages(phys_addr_t phys, unsigned int nr_pages); void *kho_restore_vmalloc(const struct kho_vmalloc *preservation); @@ -101,6 +105,14 @@ static inline int kho_unpreserve_vmalloc(struct kho_vmalloc *preservation) return -EOPNOTSUPP; } +void *kho_alloc_preserve(size_t size) +{ + return ERR_PTR(-EOPNOTSUPP); +} + +void kho_unpreserve_free(void *mem) { } +void kho_restore_free(void *mem) { } + static 
inline struct folio *kho_restore_folio(phys_addr_t phys) { return NULL; @@ -122,18 +134,14 @@ static inline int kho_add_subtree(const char *name, void *fdt) return -EOPNOTSUPP; } -static inline void kho_remove_subtree(void *fdt) -{ -} +static inline void kho_remove_subtree(void *fdt) { } static inline int kho_retrieve_subtree(const char *name, phys_addr_t *phys) { return -EOPNOTSUPP; } -static inline void kho_memory_init(void) -{ -} +static inline void kho_memory_init(void) { } static inline void kho_populate(phys_addr_t fdt_phys, u64 fdt_len, phys_addr_t scratch_phys, u64 scratch_len) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index bc7f046a1313..5c5c9c46fe92 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -4,6 +4,7 @@ * Copyright (C) 2023 Alexander Graf * Copyright (C) 2025 Microsoft Corporation, Mike Rapoport * Copyright (C) 2025 Google LLC, Changyuan Lyu + * Copyright (C) 2025 Pasha Tatashin */ #define pr_fmt(fmt) "KHO: " fmt @@ -1117,6 +1118,92 @@ void *kho_restore_vmalloc(const struct kho_vmalloc *preservation) } EXPORT_SYMBOL_GPL(kho_restore_vmalloc); +/** + * kho_alloc_preserve - Allocate, zero, and preserve memory. + * @size: The number of bytes to allocate. + * + * Allocates a physically contiguous block of zeroed pages that is large + * enough to hold @size bytes. The allocated memory is then registered with + * KHO for preservation across a kexec. + * + * Note: The actual allocated size will be rounded up to the nearest + * power-of-two page boundary. + * + * @return A virtual pointer to the allocated and preserved memory on success, + * or an ERR_PTR() encoded error on failure. 
+ */ +void *kho_alloc_preserve(size_t size) +{ + struct folio *folio; + int order, ret; + + if (!size) + return ERR_PTR(-EINVAL); + + order = get_order(size); + if (order > MAX_PAGE_ORDER) + return ERR_PTR(-E2BIG); + + folio = folio_alloc(GFP_KERNEL | __GFP_ZERO, order); + if (!folio) + return ERR_PTR(-ENOMEM); + + ret = kho_preserve_folio(folio); + if (ret) { + folio_put(folio); + return ERR_PTR(ret); + } + + return folio_address(folio); +} +EXPORT_SYMBOL_GPL(kho_alloc_preserve); + +/** + * kho_unpreserve_free - Unpreserve and free memory. + * @mem: Pointer to the memory allocated by kho_alloc_preserve(). + * + * Unregisters the memory from KHO preservation and frees the underlying + * pages back to the system. This function should be called to clean up + * memory allocated with kho_alloc_preserve(). + */ +void kho_unpreserve_free(void *mem) +{ + struct folio *folio; + + if (!mem) + return; + + folio = virt_to_folio(mem); + WARN_ON_ONCE(kho_unpreserve_folio(folio)); + folio_put(folio); +} +EXPORT_SYMBOL_GPL(kho_unpreserve_free); + +/** + * kho_restore_free - Restore and free memory after kexec. + * @mem: Pointer to the memory (in the new kernel's address space) + * that was allocated by the old kernel. + * + * This function is intended to be called in the new kernel (post-kexec) + * to take ownership of and free a memory region that was preserved by the + * old kernel using kho_alloc_preserve(). + * + * It first restores the pages from KHO (using their physical address) + * and then frees the pages back to the new kernel's page allocator. 
+ */ +void kho_restore_free(void *mem) +{ + struct folio *folio; + + if (!mem) + return; + + folio = kho_restore_folio(__pa(mem)); + if (!WARN_ON(!folio)) + folio_put(folio); +} +EXPORT_SYMBOL_GPL(kho_restore_free); + static void __kho_abort(void) { if (kho_out.preserved_mem_map) { -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 10:59:54 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 13:59:54 -0500 Subject: [PATCH v2 05/13] kho: Verify deserialization status and fix FDT alignment access In-Reply-To: <20251114190002.3311679-1-pasha.tatashin@soleen.com> References: <20251114190002.3311679-1-pasha.tatashin@soleen.com> Message-ID: <20251114190002.3311679-6-pasha.tatashin@soleen.com> During boot, kho_restore_folio() relies on the memory map having been successfully deserialized. If deserialization fails or no map is present, attempting to restore the FDT folio is unsafe. Update kho_mem_deserialize() to return a boolean indicating success. Use this return value in kho_memory_init() to disable KHO if deserialization fails. Also, the incoming FDT folio is never used, there is no reason to restore it. Additionally, use get_unaligned() to retrieve the memory map pointer from the FDT. FDT properties are not guaranteed to be naturally aligned, and accessing a 64-bit value via a pointer that is only 32-bit aligned can cause faults. 
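[Editor's note] The alignment hazard this patch fixes can be demonstrated outside the kernel: an FDT property is only guaranteed 4-byte alignment, so dereferencing it through a plain u64 pointer may fault on strict-alignment CPUs, while a byte-wise copy is always safe. A userspace sketch of the get_unaligned() idea, using memcpy as a portable stand-in for the kernel helper (names here are illustrative):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-in for the kernel's get_unaligned(): fetch a 64-bit
 * value byte-wise instead of dereferencing a possibly misaligned u64 *. */
static uint64_t get_unaligned_u64(const void *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* FDT properties are only 4-byte aligned, so a 64-bit property can start
 * at offset 4 inside an 8-byte-aligned buffer, as modeled here. */
static uint64_t read_misaligned_prop(void)
{
	_Alignas(8) unsigned char buf[16] = { 0 };
	const uint64_t phys = 0x123456789abcdef0ULL;

	/* Place the value at a 4-byte (not 8-byte) aligned offset. */
	memcpy(buf + 4, &phys, sizeof(phys));
	return get_unaligned_u64(buf + 4);
}
```

On x86 the misaligned dereference happens to work, which is why such bugs survive testing; on architectures that trap on unaligned loads the memcpy-based access is required.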
Signed-off-by: Pasha Tatashin Reviewed-by: Mike Rapoport (Microsoft) --- kernel/liveupdate/kexec_handover.c | 32 ++++++++++++++++++------------ 1 file changed, 19 insertions(+), 13 deletions(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 704e91418214..bed611bae1df 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -451,20 +452,27 @@ static void __init deserialize_bitmap(unsigned int order, } } -static void __init kho_mem_deserialize(const void *fdt) +/* Return true if memory was deserialized */ +static bool __init kho_mem_deserialize(const void *fdt) { struct khoser_mem_chunk *chunk; - const phys_addr_t *mem; + const void *mem_ptr; + u64 mem; int len; - mem = fdt_getprop(fdt, 0, PROP_PRESERVED_MEMORY_MAP, &len); - - if (!mem || len != sizeof(*mem)) { + mem_ptr = fdt_getprop(fdt, 0, PROP_PRESERVED_MEMORY_MAP, &len); + if (!mem_ptr || len != sizeof(u64)) { pr_err("failed to get preserved memory bitmaps\n"); - return; + return false; } - chunk = *mem ? phys_to_virt(*mem) : NULL; + mem = get_unaligned((const u64 *)mem_ptr); + chunk = mem ? 
phys_to_virt(mem) : NULL; + + /* No preserved physical pages were passed, no deserialization */ + if (!chunk) + return false; + while (chunk) { unsigned int i; @@ -473,6 +481,8 @@ static void __init kho_mem_deserialize(const void *fdt) &chunk->bitmaps[i]); chunk = KHOSER_LOAD_PTR(chunk->hdr.next); } + + return true; } /* @@ -1458,16 +1468,12 @@ static void __init kho_release_scratch(void) void __init kho_memory_init(void) { - struct folio *folio; - if (kho_in.scratch_phys) { kho_scratch = phys_to_virt(kho_in.scratch_phys); kho_release_scratch(); - kho_mem_deserialize(kho_get_fdt()); - folio = kho_restore_folio(kho_in.fdt_phys); - if (!folio) - pr_warn("failed to restore folio for KHO fdt\n"); + if (!kho_mem_deserialize(kho_get_fdt())) + kho_in.fdt_phys = 0; } else { kho_reserve_scratch(); } -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 10:59:56 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 13:59:56 -0500 Subject: [PATCH v2 07/13] kho: Simplify serialization and remove __kho_abort In-Reply-To: <20251114190002.3311679-1-pasha.tatashin@soleen.com> References: <20251114190002.3311679-1-pasha.tatashin@soleen.com> Message-ID: <20251114190002.3311679-8-pasha.tatashin@soleen.com> Currently, __kho_finalize() performs memory serialization in the middle of FDT construction. If FDT construction fails later, the function must manually clean up the serialized memory via __kho_abort(). Refactor __kho_finalize() to perform kho_mem_serialize() only after the FDT has been successfully constructed and finished. This reordering has two benefits: 1. It avoids expensive serialization work if FDT generation fails. 2. It removes the need for cleanup in the FDT error path. As a result, the internal helper __kho_abort() is no longer needed for internal error handling. Inline its remaining logic (cleanup of the preserved memory map) directly into kho_abort() and remove the helper. 
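[Editor's note] The ordering argument in the commit message above can be captured in a small model: when the fallible FDT build runs first and the expensive serialization runs only after it succeeds, a build failure leaves nothing to undo. This is a userspace sketch with hypothetical names, not the kernel code:

```c
#include <assert.h>

/* Records whether the expensive serialization step ran, so we can check
 * that a failed FDT build never triggers it (and so needs no cleanup). */
static int serialize_ran;

/* Model of the reordered __kho_finalize(): build the FDT first, and
 * serialize the memory map only once the FDT is complete. */
static int finalize_model(int fdt_build_err)
{
	serialize_ran = 0;

	if (fdt_build_err)
		return fdt_build_err;	/* nothing to clean up */

	serialize_ran = 1;		/* kho_mem_serialize() stand-in */
	return 0;
}
```

This is why the patch can delete __kho_abort() from the error path: the only state that previously needed unwinding was the serialized memory map, and under the new ordering it does not exist yet when FDT construction fails.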
Signed-off-by: Pasha Tatashin Reviewed-by: Mike Rapoport (Microsoft) Reviewed-by: Pratyush Yadav --- kernel/liveupdate/kexec_handover.c | 41 +++++++++++++----------------- 1 file changed, 17 insertions(+), 24 deletions(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 3e32c61a64b1..297136054f75 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -1214,14 +1214,6 @@ void kho_restore_free(void *mem) } EXPORT_SYMBOL_GPL(kho_restore_free); -static void __kho_abort(void) -{ - if (kho_out.preserved_mem_map) { - kho_mem_ser_free(kho_out.preserved_mem_map); - kho_out.preserved_mem_map = NULL; - } -} - int kho_abort(void) { if (!kho_enable) @@ -1231,7 +1223,8 @@ int kho_abort(void) if (!kho_out.finalized) return -ENOENT; - __kho_abort(); + kho_mem_ser_free(kho_out.preserved_mem_map); + kho_out.preserved_mem_map = NULL; kho_out.finalized = false; return 0; @@ -1239,12 +1232,12 @@ int kho_abort(void) static int __kho_finalize(void) { - int err = 0; - u64 *preserved_mem_map; void *root = kho_out.fdt; struct kho_sub_fdt *fdt; + u64 *preserved_mem_map; + int err; - err |= fdt_create(root, PAGE_SIZE); + err = fdt_create(root, PAGE_SIZE); err |= fdt_finish_reservemap(root); err |= fdt_begin_node(root, ""); err |= fdt_property_string(root, "compatible", KHO_FDT_COMPATIBLE); @@ -1257,13 +1250,7 @@ static int __kho_finalize(void) sizeof(*preserved_mem_map), (void **)&preserved_mem_map); if (err) - goto abort; - - err = kho_mem_serialize(&kho_out); - if (err) - goto abort; - - *preserved_mem_map = (u64)virt_to_phys(kho_out.preserved_mem_map); + goto err_exit; mutex_lock(&kho_out.fdts_lock); list_for_each_entry(fdt, &kho_out.sub_fdts, l) { @@ -1277,13 +1264,19 @@ static int __kho_finalize(void) err |= fdt_end_node(root); err |= fdt_finish(root); + if (err) + goto err_exit; -abort: - if (err) { - pr_err("Failed to convert KHO state tree: %d\n", err); - __kho_abort(); - } + err = 
kho_mem_serialize(&kho_out); + if (err) + goto err_exit; + + *preserved_mem_map = (u64)virt_to_phys(kho_out.preserved_mem_map); + + return 0; +err_exit: + pr_err("Failed to convert KHO state tree: %d\n", err); return err; } -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 10:59:55 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 13:59:55 -0500 Subject: [PATCH v2 06/13] kho: Always expose output FDT in debugfs In-Reply-To: <20251114190002.3311679-1-pasha.tatashin@soleen.com> References: <20251114190002.3311679-1-pasha.tatashin@soleen.com> Message-ID: <20251114190002.3311679-7-pasha.tatashin@soleen.com> Currently, the output FDT is added to debugfs only when KHO is finalized and removed when aborted. There is no need to hide the FDT based on the state. Always expose it starting from initialization. This aids the transition toward removing the explicit abort functionality and converting KHO to be fully stateless. Signed-off-by: Pasha Tatashin Reviewed-by: Mike Rapoport (Microsoft) Reviewed-by: Pratyush Yadav --- kernel/liveupdate/kexec_handover.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index bed611bae1df..3e32c61a64b1 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -1234,8 +1234,6 @@ int kho_abort(void) __kho_abort(); kho_out.finalized = false; - kho_debugfs_fdt_remove(&kho_out.dbg, kho_out.fdt); - return 0; } @@ -1306,9 +1304,6 @@ int kho_finalize(void) kho_out.finalized = true; - WARN_ON_ONCE(kho_debugfs_fdt_add(&kho_out.dbg, "fdt", - kho_out.fdt, true)); - return 0; } @@ -1425,6 +1420,9 @@ static __init int kho_init(void) init_cma_reserved_pageblock(pfn_to_page(pfn)); } + WARN_ON_ONCE(kho_debugfs_fdt_add(&kho_out.dbg, "fdt", + kho_out.fdt, true)); + return 0; err_free_fdt: -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 
10:59:59 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 13:59:59 -0500 Subject: [PATCH v2 10/13] kho: Update FDT dynamically for subtree addition/removal In-Reply-To: <20251114190002.3311679-1-pasha.tatashin@soleen.com> References: <20251114190002.3311679-1-pasha.tatashin@soleen.com> Message-ID: <20251114190002.3311679-11-pasha.tatashin@soleen.com> Currently, sub-FDTs were tracked in a list (kho_out.sub_fdts) and the final FDT is constructed entirely from scratch during kho_finalize(). We can maintain the FDT dynamically: 1. Initialize a valid, empty FDT in kho_init(). 2. Use fdt_add_subnode and fdt_setprop in kho_add_subtree to update the FDT immediately when a subsystem registers. 3. Use fdt_del_node in kho_remove_subtree to remove entries. This removes the need for the intermediate sub_fdts list and the reconstruction logic in kho_finalize(). kho_finalize() now only needs to trigger memory map serialization. Signed-off-by: Pasha Tatashin Reviewed-by: Mike Rapoport (Microsoft) Reviewed-by: Pratyush Yadav --- kernel/liveupdate/kexec_handover.c | 144 ++++++++++++++--------------- 1 file changed, 69 insertions(+), 75 deletions(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 624fd648d21f..461d96084c12 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -104,20 +104,11 @@ struct kho_mem_track { struct khoser_mem_chunk; -struct kho_sub_fdt { - struct list_head l; - const char *name; - void *fdt; -}; - struct kho_out { void *fdt; bool finalized; struct mutex lock; /* protects KHO FDT finalization */ - struct list_head sub_fdts; - struct mutex fdts_lock; - struct kho_mem_track track; struct kho_debugfs dbg; }; @@ -127,8 +118,6 @@ static struct kho_out kho_out = { .track = { .orders = XARRAY_INIT(kho_out.track.orders, 0), }, - .sub_fdts = LIST_HEAD_INIT(kho_out.sub_fdts), - .fdts_lock = __MUTEX_INITIALIZER(kho_out.fdts_lock), .finalized = false, }; @@ 
-725,37 +714,67 @@ static void __init kho_reserve_scratch(void) */ int kho_add_subtree(const char *name, void *fdt) { - struct kho_sub_fdt *sub_fdt; + phys_addr_t phys = virt_to_phys(fdt); + void *root_fdt = kho_out.fdt; + int err = -ENOMEM; + int off, fdt_err; - sub_fdt = kmalloc(sizeof(*sub_fdt), GFP_KERNEL); - if (!sub_fdt) - return -ENOMEM; + guard(mutex)(&kho_out.lock); + + fdt_err = fdt_open_into(root_fdt, root_fdt, PAGE_SIZE); + if (fdt_err < 0) + return err; - INIT_LIST_HEAD(&sub_fdt->l); - sub_fdt->name = name; - sub_fdt->fdt = fdt; + off = fdt_add_subnode(root_fdt, 0, name); + if (off < 0) { + if (off == -FDT_ERR_EXISTS) + err = -EEXIST; + goto out_pack; + } + + err = fdt_setprop(root_fdt, off, PROP_SUB_FDT, &phys, sizeof(phys)); + if (err < 0) + goto out_pack; - guard(mutex)(&kho_out.fdts_lock); - list_add_tail(&sub_fdt->l, &kho_out.sub_fdts); WARN_ON_ONCE(kho_debugfs_fdt_add(&kho_out.dbg, name, fdt, false)); - return 0; +out_pack: + fdt_pack(root_fdt); + + return err; } EXPORT_SYMBOL_GPL(kho_add_subtree); void kho_remove_subtree(void *fdt) { - struct kho_sub_fdt *sub_fdt; + phys_addr_t target_phys = virt_to_phys(fdt); + void *root_fdt = kho_out.fdt; + int off; + int err; + + guard(mutex)(&kho_out.lock); + + err = fdt_open_into(root_fdt, root_fdt, PAGE_SIZE); + if (err < 0) + return; + + for (off = fdt_first_subnode(root_fdt, 0); off >= 0; + off = fdt_next_subnode(root_fdt, off)) { + const u64 *val; + int len; + + val = fdt_getprop(root_fdt, off, PROP_SUB_FDT, &len); + if (!val || len != sizeof(phys_addr_t)) + continue; - guard(mutex)(&kho_out.fdts_lock); - list_for_each_entry(sub_fdt, &kho_out.sub_fdts, l) { - if (sub_fdt->fdt == fdt) { - list_del(&sub_fdt->l); - kfree(sub_fdt); + if ((phys_addr_t)*val == target_phys) { + fdt_del_node(root_fdt, off); kho_debugfs_fdt_remove(&kho_out.dbg, fdt); break; } } + + fdt_pack(root_fdt); } EXPORT_SYMBOL_GPL(kho_remove_subtree); @@ -1232,48 +1251,6 @@ void kho_restore_free(void *mem) } 
EXPORT_SYMBOL_GPL(kho_restore_free); -static int __kho_finalize(void) -{ - void *root = kho_out.fdt; - struct kho_sub_fdt *fdt; - u64 empty_mem_map = 0; - int err; - - err = fdt_create(root, PAGE_SIZE); - err |= fdt_finish_reservemap(root); - err |= fdt_begin_node(root, ""); - err |= fdt_property_string(root, "compatible", KHO_FDT_COMPATIBLE); - err |= fdt_property(root, PROP_PRESERVED_MEMORY_MAP, &empty_mem_map, - sizeof(empty_mem_map)); - if (err) - goto err_exit; - - mutex_lock(&kho_out.fdts_lock); - list_for_each_entry(fdt, &kho_out.sub_fdts, l) { - phys_addr_t phys = virt_to_phys(fdt->fdt); - - err |= fdt_begin_node(root, fdt->name); - err |= fdt_property(root, PROP_SUB_FDT, &phys, sizeof(phys)); - err |= fdt_end_node(root); - } - mutex_unlock(&kho_out.fdts_lock); - - err |= fdt_end_node(root); - err |= fdt_finish(root); - if (err) - goto err_exit; - - err = kho_mem_serialize(&kho_out); - if (err) - goto err_exit; - - return 0; - -err_exit: - pr_err("Failed to convert KHO state tree: %d\n", err); - return err; -} - int kho_finalize(void) { int ret; @@ -1282,12 +1259,7 @@ int kho_finalize(void) return -EOPNOTSUPP; guard(mutex)(&kho_out.lock); - if (kho_out.finalized) { - kho_update_memory_map(NULL); - kho_out.finalized = false; - } - - ret = __kho_finalize(); + ret = kho_mem_serialize(&kho_out); if (ret) return ret; @@ -1372,6 +1344,24 @@ int kho_retrieve_subtree(const char *name, phys_addr_t *phys) } EXPORT_SYMBOL_GPL(kho_retrieve_subtree); +static __init int kho_out_fdt_setup(void) +{ + void *root = kho_out.fdt; + u64 empty_mem_map = 0; + int err; + + err = fdt_create(root, PAGE_SIZE); + err |= fdt_finish_reservemap(root); + err |= fdt_begin_node(root, ""); + err |= fdt_property_string(root, "compatible", KHO_FDT_COMPATIBLE); + err |= fdt_property(root, PROP_PRESERVED_MEMORY_MAP, &empty_mem_map, + sizeof(empty_mem_map)); + err |= fdt_end_node(root); + err |= fdt_finish(root); + + return err; +} + static __init int kho_init(void) { const void *fdt = 
kho_get_fdt(); @@ -1394,6 +1384,10 @@ static __init int kho_init(void) if (err) goto err_free_fdt; + err = kho_out_fdt_setup(); + if (err) + goto err_free_fdt; + if (fdt) { kho_in_debugfs_init(&kho_in.dbg, fdt); return 0; -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 10:59:57 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 13:59:57 -0500 Subject: [PATCH v2 08/13] kho: Remove global preserved_mem_map and store state in FDT In-Reply-To: <20251114190002.3311679-1-pasha.tatashin@soleen.com> References: <20251114190002.3311679-1-pasha.tatashin@soleen.com> Message-ID: <20251114190002.3311679-9-pasha.tatashin@soleen.com> Currently, the serialized memory map is tracked via kho_out.preserved_mem_map and copied to the FDT during finalization. This double tracking is redundant. Remove preserved_mem_map from kho_out. Instead, maintain the physical address of the head chunk directly in the preserved-memory-map FDT property. Introduce kho_update_memory_map() to manage this property. This function handles: 1. Retrieving and freeing any existing serialized map (handling the abort/retry case). 2. Updating the FDT property with the new chunk address. This establishes the FDT as the single source of truth for the handover state. 
Signed-off-by: Pasha Tatashin Reviewed-by: Mike Rapoport (Microsoft) Reviewed-by: Pratyush Yadav --- kernel/liveupdate/kexec_handover.c | 43 ++++++++++++++++++------------ 1 file changed, 26 insertions(+), 17 deletions(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 297136054f75..63800f63551f 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -119,9 +119,6 @@ struct kho_out { struct mutex fdts_lock; struct kho_mem_track track; - /* First chunk of serialized preserved memory map */ - struct khoser_mem_chunk *preserved_mem_map; - struct kho_debugfs dbg; }; @@ -382,6 +379,27 @@ static void kho_mem_ser_free(struct khoser_mem_chunk *first_chunk) } } +/* + * Update memory map property, if old one is found discard it via + * kho_mem_ser_free(). + */ +static void kho_update_memory_map(struct khoser_mem_chunk *first_chunk) +{ + void *ptr; + u64 phys; + + ptr = fdt_getprop_w(kho_out.fdt, 0, PROP_PRESERVED_MEMORY_MAP, NULL); + + /* Check and discard previous memory map */ + phys = get_unaligned((u64 *)ptr); + if (phys) + kho_mem_ser_free((struct khoser_mem_chunk *)phys_to_virt(phys)); + + /* Update with the new value */ + phys = first_chunk ? 
(u64)virt_to_phys(first_chunk) : 0; + put_unaligned(phys, (u64 *)ptr); +} + static int kho_mem_serialize(struct kho_out *kho_out) { struct khoser_mem_chunk *first_chunk = NULL; @@ -422,7 +440,7 @@ static int kho_mem_serialize(struct kho_out *kho_out) } } - kho_out->preserved_mem_map = first_chunk; + kho_update_memory_map(first_chunk); return 0; @@ -1223,8 +1241,7 @@ int kho_abort(void) if (!kho_out.finalized) return -ENOENT; - kho_mem_ser_free(kho_out.preserved_mem_map); - kho_out.preserved_mem_map = NULL; + kho_update_memory_map(NULL); kho_out.finalized = false; return 0; @@ -1234,21 +1251,15 @@ static int __kho_finalize(void) { void *root = kho_out.fdt; struct kho_sub_fdt *fdt; - u64 *preserved_mem_map; + u64 empty_mem_map = 0; int err; err = fdt_create(root, PAGE_SIZE); err |= fdt_finish_reservemap(root); err |= fdt_begin_node(root, ""); err |= fdt_property_string(root, "compatible", KHO_FDT_COMPATIBLE); - /** - * Reserve the preserved-memory-map property in the root FDT, so - * that all property definitions will precede subnodes created by - * KHO callers. 
- */ - err |= fdt_property_placeholder(root, PROP_PRESERVED_MEMORY_MAP, - sizeof(*preserved_mem_map), - (void **)&preserved_mem_map); + err |= fdt_property(root, PROP_PRESERVED_MEMORY_MAP, &empty_mem_map, + sizeof(empty_mem_map)); if (err) goto err_exit; @@ -1271,8 +1282,6 @@ static int __kho_finalize(void) if (err) goto err_exit; - *preserved_mem_map = (u64)virt_to_phys(kho_out.preserved_mem_map); - return 0; err_exit: -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 11:00:00 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 14:00:00 -0500 Subject: [PATCH v2 11/13] kho: Allow kexec load before KHO finalization In-Reply-To: <20251114190002.3311679-1-pasha.tatashin@soleen.com> References: <20251114190002.3311679-1-pasha.tatashin@soleen.com> Message-ID: <20251114190002.3311679-12-pasha.tatashin@soleen.com> Currently, kho_fill_kimage() checks kho_out.finalized and returns early if KHO is not yet finalized. This enforces a strict ordering where userspace must finalize KHO *before* loading the kexec image. This is restrictive, as standard workflows often involve loading the target kernel early in the lifecycle and finalizing the state (FDT) only immediately before the reboot. Since the KHO FDT resides at a physical address allocated during boot (kho_init), its location is stable. We can attach this stable address to the kimage regardless of whether the content has been finalized yet. Relax the check to only require kho_enable, allowing kexec_file_load to proceed at any time. 
Signed-off-by: Pasha Tatashin Reviewed-by: Mike Rapoport (Microsoft) Reviewed-by: Pratyush Yadav --- kernel/liveupdate/kexec_handover.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 461d96084c12..4596e67de832 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -1550,7 +1550,7 @@ int kho_fill_kimage(struct kimage *image) int err = 0; struct kexec_buf scratch; - if (!kho_out.finalized) + if (!kho_enable) return 0; image->kho.fdt = virt_to_phys(kho_out.fdt); -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 10:59:58 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 13:59:58 -0500 Subject: [PATCH v2 09/13] kho: Remove abort functionality and support state refresh In-Reply-To: <20251114190002.3311679-1-pasha.tatashin@soleen.com> References: <20251114190002.3311679-1-pasha.tatashin@soleen.com> Message-ID: <20251114190002.3311679-10-pasha.tatashin@soleen.com> Previously, KHO required a dedicated kho_abort() function to clean up state before kho_finalize() could be called again. This was necessary to handle complex unwind paths when using notifiers. With the shift to direct memory preservation, the explicit abort step is no longer strictly necessary. Remove kho_abort() and refactor kho_finalize() to handle re-entry. If kho_finalize() is called while KHO is already finalized, it will now automatically clean up the previous memory map and state before generating a new one. This allows the KHO state to be updated/refreshed simply by triggering finalize again. Update debugfs to return -EINVAL if userspace attempts to write 0 to the finalize attribute, as explicit abort is no longer supported. 
Suggested-by: Mike Rapoport (Microsoft) Signed-off-by: Pasha Tatashin Reviewed-by: Mike Rapoport (Microsoft) Reviewed-by: Pratyush Yadav --- kernel/liveupdate/kexec_handover.c | 21 ++++----------------- kernel/liveupdate/kexec_handover_debugfs.c | 2 +- kernel/liveupdate/kexec_handover_internal.h | 1 - 3 files changed, 5 insertions(+), 19 deletions(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 63800f63551f..624fd648d21f 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -1232,21 +1232,6 @@ void kho_restore_free(void *mem) } EXPORT_SYMBOL_GPL(kho_restore_free); -int kho_abort(void) -{ - if (!kho_enable) - return -EOPNOTSUPP; - - guard(mutex)(&kho_out.lock); - if (!kho_out.finalized) - return -ENOENT; - - kho_update_memory_map(NULL); - kho_out.finalized = false; - - return 0; -} - static int __kho_finalize(void) { void *root = kho_out.fdt; @@ -1297,8 +1282,10 @@ int kho_finalize(void) return -EOPNOTSUPP; guard(mutex)(&kho_out.lock); - if (kho_out.finalized) - return -EEXIST; + if (kho_out.finalized) { + kho_update_memory_map(NULL); + kho_out.finalized = false; + } ret = __kho_finalize(); if (ret) diff --git a/kernel/liveupdate/kexec_handover_debugfs.c b/kernel/liveupdate/kexec_handover_debugfs.c index ac739d25094d..2abbf62ba942 100644 --- a/kernel/liveupdate/kexec_handover_debugfs.c +++ b/kernel/liveupdate/kexec_handover_debugfs.c @@ -87,7 +87,7 @@ static int kho_out_finalize_set(void *data, u64 val) if (val) return kho_finalize(); else - return kho_abort(); + return -EINVAL; } DEFINE_DEBUGFS_ATTRIBUTE(kho_out_finalize_fops, kho_out_finalize_get, diff --git a/kernel/liveupdate/kexec_handover_internal.h b/kernel/liveupdate/kexec_handover_internal.h index 52ed73659fe6..0202c85ad14f 100644 --- a/kernel/liveupdate/kexec_handover_internal.h +++ b/kernel/liveupdate/kexec_handover_internal.h @@ -24,7 +24,6 @@ extern unsigned int kho_scratch_cnt; bool kho_finalized(void); int 
kho_finalize(void); -int kho_abort(void); #ifdef CONFIG_KEXEC_HANDOVER_DEBUGFS int kho_debugfs_init(void); -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 11:00:01 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 14:00:01 -0500 Subject: [PATCH v2 12/13] kho: Allow memory preservation state updates after finalization In-Reply-To: <20251114190002.3311679-1-pasha.tatashin@soleen.com> References: <20251114190002.3311679-1-pasha.tatashin@soleen.com> Message-ID: <20251114190002.3311679-13-pasha.tatashin@soleen.com> Currently, kho_preserve_* and kho_unpreserve_* return -EBUSY if KHO is finalized. This enforces a rigid "freeze" on the KHO memory state. With the introduction of re-entrant finalization, this restriction is no longer necessary. Users should be allowed to modify the preservation set (e.g., adding new pages or freeing old ones) even after an initial finalization. The intended workflow for updates is now: 1. Modify state (preserve/unpreserve). 2. Call kho_finalize() again to refresh the serialized metadata. Remove the kho_out.finalized checks to enable this dynamic behavior. This also allows converting the kho_unpreserve_* functions to void, as they no longer return any errors. 
Signed-off-by: Pasha Tatashin Reviewed-by: Mike Rapoport (Microsoft) --- include/linux/kexec_handover.h | 21 ++++-------- kernel/liveupdate/kexec_handover.c | 55 +++++++----------------------- 2 files changed, 19 insertions(+), 57 deletions(-) diff --git a/include/linux/kexec_handover.h b/include/linux/kexec_handover.h index 38a9487a1a00..6dd0dcdf0ec1 100644 --- a/include/linux/kexec_handover.h +++ b/include/linux/kexec_handover.h @@ -44,11 +44,11 @@ bool kho_is_enabled(void); bool is_kho_boot(void); int kho_preserve_folio(struct folio *folio); -int kho_unpreserve_folio(struct folio *folio); +void kho_unpreserve_folio(struct folio *folio); int kho_preserve_pages(struct page *page, unsigned int nr_pages); -int kho_unpreserve_pages(struct page *page, unsigned int nr_pages); +void kho_unpreserve_pages(struct page *page, unsigned int nr_pages); int kho_preserve_vmalloc(void *ptr, struct kho_vmalloc *preservation); -int kho_unpreserve_vmalloc(struct kho_vmalloc *preservation); +void kho_unpreserve_vmalloc(struct kho_vmalloc *preservation); void *kho_alloc_preserve(size_t size); void kho_unpreserve_free(void *mem); void kho_restore_free(void *mem); @@ -79,20 +79,14 @@ static inline int kho_preserve_folio(struct folio *folio) return -EOPNOTSUPP; } -static inline int kho_unpreserve_folio(struct folio *folio) -{ - return -EOPNOTSUPP; -} +static inline void kho_unpreserve_folio(struct folio *folio) { } static inline int kho_preserve_pages(struct page *page, unsigned int nr_pages) { return -EOPNOTSUPP; } -static inline int kho_unpreserve_pages(struct page *page, unsigned int nr_pages) -{ - return -EOPNOTSUPP; -} +static inline void kho_unpreserve_pages(struct page *page, unsigned int nr_pages) { } static inline int kho_preserve_vmalloc(void *ptr, struct kho_vmalloc *preservation) @@ -100,10 +94,7 @@ static inline int kho_preserve_vmalloc(void *ptr, return -EOPNOTSUPP; } -static inline int kho_unpreserve_vmalloc(struct kho_vmalloc *preservation) -{ - return -EOPNOTSUPP; -} 
+static inline void kho_unpreserve_vmalloc(struct kho_vmalloc *preservation) { } void *kho_alloc_preserve(size_t size) { diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 4596e67de832..a7f876ece445 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -185,10 +185,6 @@ static int __kho_preserve_order(struct kho_mem_track *track, unsigned long pfn, const unsigned long pfn_high = pfn >> order; might_sleep(); - - if (kho_out.finalized) - return -EBUSY; - physxa = xa_load(&track->orders, order); if (!physxa) { int err; @@ -807,20 +803,14 @@ EXPORT_SYMBOL_GPL(kho_preserve_folio); * Instructs KHO to unpreserve a folio that was preserved by * kho_preserve_folio() before. The provided @folio (pfn and order) * must exactly match a previously preserved folio. - * - * Return: 0 on success, error code on failure */ -int kho_unpreserve_folio(struct folio *folio) +void kho_unpreserve_folio(struct folio *folio) { const unsigned long pfn = folio_pfn(folio); const unsigned int order = folio_order(folio); struct kho_mem_track *track = &kho_out.track; - if (kho_out.finalized) - return -EBUSY; - __kho_unpreserve_order(track, pfn, order); - return 0; } EXPORT_SYMBOL_GPL(kho_unpreserve_folio); @@ -877,21 +867,14 @@ EXPORT_SYMBOL_GPL(kho_preserve_pages); * This must be called with the same @page and @nr_pages as the corresponding * kho_preserve_pages() call. Unpreserving arbitrary sub-ranges of larger * preserved blocks is not supported. 
- * - * Return: 0 on success, error code on failure */ -int kho_unpreserve_pages(struct page *page, unsigned int nr_pages) +void kho_unpreserve_pages(struct page *page, unsigned int nr_pages) { struct kho_mem_track *track = &kho_out.track; const unsigned long start_pfn = page_to_pfn(page); const unsigned long end_pfn = start_pfn + nr_pages; - if (kho_out.finalized) - return -EBUSY; - __kho_unpreserve(track, start_pfn, end_pfn); - - return 0; } EXPORT_SYMBOL_GPL(kho_unpreserve_pages); @@ -976,20 +959,6 @@ static void kho_vmalloc_unpreserve_chunk(struct kho_vmalloc_chunk *chunk, } } -static void kho_vmalloc_free_chunks(struct kho_vmalloc *kho_vmalloc) -{ - struct kho_vmalloc_chunk *chunk = KHOSER_LOAD_PTR(kho_vmalloc->first); - - while (chunk) { - struct kho_vmalloc_chunk *tmp = chunk; - - kho_vmalloc_unpreserve_chunk(chunk, kho_vmalloc->order); - - chunk = KHOSER_LOAD_PTR(chunk->hdr.next); - free_page((unsigned long)tmp); - } -} - /** * kho_preserve_vmalloc - preserve memory allocated with vmalloc() across kexec * @ptr: pointer to the area in vmalloc address space @@ -1051,7 +1020,7 @@ int kho_preserve_vmalloc(void *ptr, struct kho_vmalloc *preservation) return 0; err_free: - kho_vmalloc_free_chunks(preservation); + kho_unpreserve_vmalloc(preservation); return err; } EXPORT_SYMBOL_GPL(kho_preserve_vmalloc); @@ -1062,17 +1031,19 @@ EXPORT_SYMBOL_GPL(kho_preserve_vmalloc); * * Instructs KHO to unpreserve the area in vmalloc address space that was * previously preserved with kho_preserve_vmalloc(). 
- * - * Return: 0 on success, error code on failure */ -int kho_unpreserve_vmalloc(struct kho_vmalloc *preservation) +void kho_unpreserve_vmalloc(struct kho_vmalloc *preservation) { - if (kho_out.finalized) - return -EBUSY; + struct kho_vmalloc_chunk *chunk = KHOSER_LOAD_PTR(preservation->first); - kho_vmalloc_free_chunks(preservation); + while (chunk) { + struct kho_vmalloc_chunk *tmp = chunk; - return 0; + kho_vmalloc_unpreserve_chunk(chunk, preservation->order); + + chunk = KHOSER_LOAD_PTR(chunk->hdr.next); + free_page((unsigned long)tmp); + } } EXPORT_SYMBOL_GPL(kho_unpreserve_vmalloc); @@ -1221,7 +1192,7 @@ void kho_unpreserve_free(void *mem) return; folio = virt_to_folio(mem); - WARN_ON_ONCE(kho_unpreserve_folio(folio)); + kho_unpreserve_folio(folio); folio_put(folio); } EXPORT_SYMBOL_GPL(kho_unpreserve_free); -- 2.52.0.rc1.455.g30608eb744-goog From pasha.tatashin at soleen.com Fri Nov 14 11:00:02 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 14:00:02 -0500 Subject: [PATCH v2 13/13] kho: Add Kconfig option to enable KHO by default In-Reply-To: <20251114190002.3311679-1-pasha.tatashin@soleen.com> References: <20251114190002.3311679-1-pasha.tatashin@soleen.com> Message-ID: <20251114190002.3311679-14-pasha.tatashin@soleen.com> Currently, Kexec Handover must be explicitly enabled via the kernel command line parameter `kho=on`. For workloads that rely on KHO as a foundational requirement (such as the upcoming Live Update Orchestrator), requiring an explicit boot parameter adds redundant configuration steps. Introduce CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT. When selected, KHO defaults to enabled. This is equivalent to passing kho=on at boot. The behavior can still be disabled at runtime by passing kho=off. 
Signed-off-by: Pasha Tatashin Reviewed-by: Mike Rapoport (Microsoft) --- kernel/liveupdate/Kconfig | 14 ++++++++++++++ kernel/liveupdate/kexec_handover.c | 2 +- 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/kernel/liveupdate/Kconfig b/kernel/liveupdate/Kconfig index eae428309332..a973a54447de 100644 --- a/kernel/liveupdate/Kconfig +++ b/kernel/liveupdate/Kconfig @@ -37,4 +37,18 @@ config KEXEC_HANDOVER_DEBUGFS Also, enables inspecting the KHO fdt trees with the debugfs binary blobs. +config KEXEC_HANDOVER_ENABLE_DEFAULT + bool "Enable kexec handover by default" + depends on KEXEC_HANDOVER + help + Enable Kexec Handover by default. This avoids the need to + explicitly pass 'kho=on' on the kernel command line. + + This is useful for systems where KHO is a prerequisite for other + features, such as Live Update, ensuring the mechanism is always + active. + + The default behavior can still be overridden at boot time by + passing 'kho=off'. + endmenu diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index a7f876ece445..224bdf5becb6 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -52,7 +52,7 @@ union kho_page_info { static_assert(sizeof(union kho_page_info) == sizeof(((struct page *)0)->private)); -static bool kho_enable __ro_after_init; +static bool kho_enable __ro_after_init = IS_ENABLED(CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT); bool kho_is_enabled(void) { -- 2.52.0.rc1.455.g30608eb744-goog From pratyush at kernel.org Fri Nov 14 11:33:17 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Fri, 14 Nov 2025 20:33:17 +0100 Subject: [PATCH v2 03/13] kho: Introduce high-level memory allocation API In-Reply-To: <20251114190002.3311679-4-pasha.tatashin@soleen.com> (Pasha Tatashin's message of "Fri, 14 Nov 2025 13:59:52 -0500") References: <20251114190002.3311679-1-pasha.tatashin@soleen.com> <20251114190002.3311679-4-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14 2025, Pasha 
Tatashin wrote: > Currently, clients of KHO must manually allocate memory (e.g., via > alloc_pages), calculate the page order, and explicitly call > kho_preserve_folio(). Similarly, cleanup requires separate calls to > unpreserve and free the memory. > > Introduce a high-level API to streamline this common pattern: > > - kho_alloc_preserve(size): Allocates physically contiguous, zeroed > memory and immediately marks it for preservation. > - kho_unpreserve_free(ptr): Unpreserves and frees the memory > in the current kernel. > - kho_restore_free(ptr): Restores the struct page state of > preserved memory in the new kernel and immediately frees it to the > page allocator. > > Signed-off-by: Pasha Tatashin > Reviewed-by: Mike Rapoport (Microsoft) Reviewed-by: Pratyush Yadav [...] -- Regards, Pratyush Yadav From pratyush at kernel.org Fri Nov 14 11:33:54 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Fri, 14 Nov 2025 20:33:54 +0100 Subject: [PATCH v2 05/13] kho: Verify deserialization status and fix FDT alignment access In-Reply-To: <20251114190002.3311679-6-pasha.tatashin@soleen.com> (Pasha Tatashin's message of "Fri, 14 Nov 2025 13:59:54 -0500") References: <20251114190002.3311679-1-pasha.tatashin@soleen.com> <20251114190002.3311679-6-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14 2025, Pasha Tatashin wrote: > During boot, kho_restore_folio() relies on the memory map having been > successfully deserialized. If deserialization fails or no map is > present, attempting to restore the FDT folio is unsafe. > > Update kho_mem_deserialize() to return a boolean indicating success. Use > this return value in kho_memory_init() to disable KHO if deserialization > fails. Also, the incoming FDT folio is never used, there is no reason to > restore it. > > Additionally, use get_unaligned() to retrieve the memory map pointer > from the FDT. 
FDT properties are not guaranteed to be naturally aligned, > and accessing a 64-bit value via a pointer that is only 32-bit aligned > can cause faults. > > Signed-off-by: Pasha Tatashin > Reviewed-by: Mike Rapoport (Microsoft) Reviewed-by: Pratyush Yadav [...] -- Regards, Pratyush Yadav From pratyush at kernel.org Fri Nov 14 11:35:28 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Fri, 14 Nov 2025 20:35:28 +0100 Subject: [PATCH v2 12/13] kho: Allow memory preservation state updates after finalization In-Reply-To: <20251114190002.3311679-13-pasha.tatashin@soleen.com> (Pasha Tatashin's message of "Fri, 14 Nov 2025 14:00:01 -0500") References: <20251114190002.3311679-1-pasha.tatashin@soleen.com> <20251114190002.3311679-13-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14 2025, Pasha Tatashin wrote: > Currently, kho_preserve_* and kho_unpreserve_* return -EBUSY if > KHO is finalized. This enforces a rigid "freeze" on the KHO memory > state. > > With the introduction of re-entrant finalization, this restriction is > no longer necessary. Users should be allowed to modify the preservation > set (e.g., adding new pages or freeing old ones) even after an initial > finalization. > > The intended workflow for updates is now: > 1. Modify state (preserve/unpreserve). > 2. Call kho_finalize() again to refresh the serialized metadata. > > Remove the kho_out.finalized checks to enable this dynamic behavior. > > This also allows to convert kho_unpreserve_* functions to void, as they > do not return any error anymore. > > Signed-off-by: Pasha Tatashin > Reviewed-by: Mike Rapoport (Microsoft) Reviewed-by: Pratyush Yadav [...] 
-- Regards, Pratyush Yadav From pratyush at kernel.org Fri Nov 14 11:35:51 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Fri, 14 Nov 2025 20:35:51 +0100 Subject: [PATCH v2 13/13] kho: Add Kconfig option to enable KHO by default In-Reply-To: <20251114190002.3311679-14-pasha.tatashin@soleen.com> (Pasha Tatashin's message of "Fri, 14 Nov 2025 14:00:02 -0500") References: <20251114190002.3311679-1-pasha.tatashin@soleen.com> <20251114190002.3311679-14-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14 2025, Pasha Tatashin wrote: > Currently, Kexec Handover must be explicitly enabled via the kernel > command line parameter `kho=on`. > > For workloads that rely on KHO as a foundational requirement (such as > the upcoming Live Update Orchestrator), requiring an explicit boot > parameter adds redundant configuration steps. > > Introduce CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT. When selected, KHO > defaults to enabled. This is equivalent to passing kho=on at boot. > The behavior can still be disabled at runtime by passing kho=off. > > Signed-off-by: Pasha Tatashin > Reviewed-by: Mike Rapoport (Microsoft) Reviewed-by: Pratyush Yadav [...] -- Regards, Pratyush Yadav From akpm at linux-foundation.org Fri Nov 14 13:44:34 2025 From: akpm at linux-foundation.org (Andrew Morton) Date: Fri, 14 Nov 2025 13:44:34 -0800 Subject: [PATCH v2 00/13] kho: simplify state machine and enable dynamic updates In-Reply-To: <20251114190002.3311679-1-pasha.tatashin@soleen.com> References: <20251114190002.3311679-1-pasha.tatashin@soleen.com> Message-ID: <20251114134434.5375d085a6bdc1671351f243@linux-foundation.org> On Fri, 14 Nov 2025 13:59:49 -0500 Pasha Tatashin wrote: > Andrew: This series applies against mm-nonmm-unstable, but should > go right before LUOv5, i.e. on top of: > "liveupdate: kho: use %pe format specifier for error pointer printing" > > Changelog v2: > - Addressed comments from Mike and Pratyush > - Added Review-bys. 
> > It also replaces the following patches, that once applied should be > dropped from mm-nonmm-unstable: > "liveupdate: kho: when live update add KHO image during kexec load" > "liveupdate: Kconfig: make debugfs optional" > "kho: enable KHO by default" > > This patch series refactors the Kexec Handover subsystem to transition > from a rigid, state-locked model to a dynamic, re-entrant architecture. > It also introduces usability improvements. OK. Where are we with the series "Live Update Orchestrator, v5"? I'm seeing a couple of review comments which I plan to circle back on: https://lkml.kernel.org/r/aROZi043lxtegqWE at kernel.org https://lkml.kernel.org/r/mafs0ms4tajcs.fsf at kernel.org and a comment from yourself against liveupdate-luo_core-integrate-with-kho.patch which indicates that you plan to update that patch? From pasha.tatashin at soleen.com Fri Nov 14 14:00:14 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 17:00:14 -0500 Subject: [PATCH v2 00/13] kho: simplify state machine and enable dynamic updates In-Reply-To: <20251114134434.5375d085a6bdc1671351f243@linux-foundation.org> References: <20251114190002.3311679-1-pasha.tatashin@soleen.com> <20251114134434.5375d085a6bdc1671351f243@linux-foundation.org> Message-ID: On Fri, Nov 14, 2025 at 4:44 PM Andrew Morton wrote: > > On Fri, 14 Nov 2025 13:59:49 -0500 Pasha Tatashin wrote: > > > Andrew: This series applies against mm-nonmm-unstable, but should > > go right before LUOv5, i.e. on top of: > > "liveupdate: kho: use %pe format specifier for error pointer printing" > > > > Changelog v2: > > - Addressed comments from Mike and Pratyush > > - Added Review-bys. 
> > > > It also replaces the following patches, that once applied should be > > dropped from mm-nonmm-unstable: > > "liveupdate: kho: when live update add KHO image during kexec load" > > "liveupdate: Kconfig: make debugfs optional" > > "kho: enable KHO by default" > > > > This patch series refactors the Kexec Handover subsystem to transition > > from a rigid, state-locked model to a dynamic, re-entrant architecture. > > It also introduces usability improvements. > > OK. > > Where are we with the series "Live Update Orchestrator, v5"? I am working on LUOv6, it is going to be an incremental update with much smaller delta compared to v4->v5, addressing all the comments collected so far. I plan to send it out this weekend. Thank you, Pasha > I'm seeing a couple of review comments which I plan to circle back on: > > https://lkml.kernel.org/r/aROZi043lxtegqWE at kernel.org > https://lkml.kernel.org/r/mafs0ms4tajcs.fsf at kernel.org > and a comment from yourself against > liveupdate-luo_core-integrate-with-kho.patch which indicates that you > plan to update that patch? From pasha.tatashin at soleen.com Fri Nov 14 14:06:42 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Fri, 14 Nov 2025 17:06:42 -0500 Subject: [PATCH v2 00/13] kho: simplify state machine and enable dynamic updates In-Reply-To: References: <20251114190002.3311679-1-pasha.tatashin@soleen.com> <20251114134434.5375d085a6bdc1671351f243@linux-foundation.org> Message-ID: On Fri, Nov 14, 2025 at 5:00 PM Pasha Tatashin wrote: > > On Fri, Nov 14, 2025 at 4:44 PM Andrew Morton wrote: > > > > On Fri, 14 Nov 2025 13:59:49 -0500 Pasha Tatashin wrote: > > > > > Andrew: This series applies against mm-nonmm-unstable, but should > > > go right before LUOv5, i.e. on top of: > > > "liveupdate: kho: use %pe format specifier for error pointer printing" > > > > > > Changelog v2: > > > - Addressed comments from Mike and Pratyush > > > - Added Review-bys.
> > > > > > It also replaces the following patches, that once applied should be > > > dropped from mm-nonmm-unstable: > > > "liveupdate: kho: when live update add KHO image during kexec load" > > > "liveupdate: Kconfig: make debugfs optional" > > > "kho: enable KHO by default" > > > > > > This patch series refactors the Kexec Handover subsystem to transition > > > from a rigid, state-locked model to a dynamic, re-entrant architecture. > > > It also introduces usability improvements. > > > > OK. Also, with this series, kho_unpreserve_folio() returns void, and LUOv5 requires two small fixes where this function is used: 1. fixup for mm: "memfd_luo: allow preserving memfd" diff --git a/mm/memfd_luo.c b/mm/memfd_luo.c index e366de627264..ba435590d2cf 100644 --- a/mm/memfd_luo.c +++ b/mm/memfd_luo.c @@ -138,7 +138,7 @@ static struct memfd_luo_folio_ser *memfd_luo_preserve_folios(struct file *file, err_unpreserve: i--; for (; i >= 0; i--) - WARN_ON_ONCE(kho_unpreserve_folio(folios[i])); + kho_unpreserve_folio(folios[i]); vfree(pfolios); err_unpin: unpin_folios(folios, nr_folios); @@ -170,7 +170,7 @@ static void memfd_luo_unpreserve_folios(void *fdt, struct memfd_luo_folio_ser *p folio = pfn_folio(PRESERVED_FOLIO_PFN(pfolio->foliodesc)); - WARN_ON_ONCE(kho_unpreserve_folio(folio)); + kho_unpreserve_folio(folio); unpin_folio(folio); } 2. Fixup for liveupdate: luo_core: integrate with KHO: diff --git a/kernel/liveupdate/luo_core.c b/kernel/liveupdate/luo_core.c index 29a094ee225c..f0bc3ee0a10b 100644 --- a/kernel/liveupdate/luo_core.c +++ b/kernel/liveupdate/luo_core.c @@ -305,7 +305,7 @@ void luo_free_unpreserve(void *mem, size_t size) return; folio = virt_to_folio(mem); - WARN_ON_ONCE(kho_unpreserve_folio(folio)); + kho_unpreserve_folio(folio); folio_put(folio); } > > > > Where are we with the series "Live Update Orchestrator, v5"? 
> > I am working on LUOv6, it is going to be an incremental update with > much smaller delta compared to v4->v5, addressing all the comments > collected so far. I plan to send it out this weekend. > > Thank you, > Pasha > > > I'm seeing a couple of review comments which I plan to circle back on: > > > > https://lkml.kernel.org/r/aROZi043lxtegqWE at kernel.org > > https://lkml.kernel.org/r/mafs0ms4tajcs.fsf at kernel.org > > and a comment from yourself against > > liveupdate-luo_core-integrate-with-kho.patch which indicates that you > > plan to update that patch? From rppt at kernel.org Sat Nov 15 01:36:19 2025 From: rppt at kernel.org (Mike Rapoport) Date: Sat, 15 Nov 2025 11:36:19 +0200 Subject: [PATCH v1 04/13] kho: Verify deserialization status and fix FDT alignment access In-Reply-To: References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-5-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14, 2025 at 05:52:37PM +0100, Pratyush Yadav wrote: > On Fri, Nov 14 2025, Pasha Tatashin wrote: > > > @@ -1377,16 +1387,12 @@ static void __init kho_release_scratch(void) > > > > void __init kho_memory_init(void) > > { > > - struct folio *folio; > > - > > if (kho_in.scratch_phys) { > > kho_scratch = phys_to_virt(kho_in.scratch_phys); > > kho_release_scratch(); > > > > - kho_mem_deserialize(kho_get_fdt()); > > - folio = kho_restore_folio(kho_in.fdt_phys); > > - if (!folio) > > - pr_warn("failed to restore folio for KHO fdt\n"); > > + if (!kho_mem_deserialize(kho_get_fdt())) > > + kho_in.fdt_phys = 0; > > The folio restore does serve a purpose: it accounts for that folio in > the system's total memory. See the call to adjust_managed_page_count() > in kho_restore_page(). In practice, I don't think it makes much of a > difference, but I don't see why not. This page is never freed, so adding it to zone managed pages or keeping it reserved does not change anything. 
> > } else { > > kho_reserve_scratch(); > > } > > -- > Regards, > Pratyush Yadav -- Sincerely yours, Mike. From rppt at kernel.org Sat Nov 15 01:40:09 2025 From: rppt at kernel.org (Mike Rapoport) Date: Sat, 15 Nov 2025 11:40:09 +0200 Subject: [PATCH v2 10/13] kho: Update FDT dynamically for subtree addition/removal In-Reply-To: <20251114190002.3311679-11-pasha.tatashin@soleen.com> References: <20251114190002.3311679-1-pasha.tatashin@soleen.com> <20251114190002.3311679-11-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14, 2025 at 01:59:59PM -0500, Pasha Tatashin wrote: > - struct kho_sub_fdt *sub_fdt; > + phys_addr_t phys = virt_to_phys(fdt); > + void *root_fdt = kho_out.fdt; > + int err = -ENOMEM; > + int off, fdt_err; > > - sub_fdt = kmalloc(sizeof(*sub_fdt), GFP_KERNEL); > - if (!sub_fdt) > - return -ENOMEM; > + guard(mutex)(&kho_out.lock); > + > + fdt_err = fdt_open_into(root_fdt, root_fdt, PAGE_SIZE); > + if (fdt_err < 0) > + return err; > > - INIT_LIST_HEAD(&sub_fdt->l); > - sub_fdt->name = name; > - sub_fdt->fdt = fdt; > + off = fdt_add_subnode(root_fdt, 0, name); Why not fdt_err = fdt_add_subnode() as I asked in v1 review? > + if (off < 0) { > + if (off == -FDT_ERR_EXISTS) > + err = -EEXIST; > + goto out_pack; > + } -- Sincerely yours, Mike. From rppt at kernel.org Sat Nov 15 01:42:01 2025 From: rppt at kernel.org (Mike Rapoport) Date: Sat, 15 Nov 2025 11:42:01 +0200 Subject: [PATCH] liveupdate: kho: Enable KHO by default In-Reply-To: References: <20251110180715.602807-1-pasha.tatashin@soleen.com> Message-ID: On Fri, Nov 14, 2025 at 09:13:01AM -0500, Pasha Tatashin wrote: > On Fri, Nov 14, 2025 at 2:30 AM Mike Rapoport wrote: > > > > On Mon, Nov 10, 2025 at 01:07:15PM -0500, Pasha Tatashin wrote: > > > Upcoming LUO requires KHO for its operations, the requirement to place > > > both KHO=on and liveupdate=on becomes redundant. Set KHO to be enabled > > > by default. > > > > I thought more about this and it seems too much of a change.
kho=1 enables > > scratch areas and that significantly changes how free pages are distributed > > in the free lists. > > > > Let's go with a Kconfig option we discussed off-list: > > (this is on top of the current mmotm/mm-nonmm-unstable) > > I will include this into the KHO simplification series Please add Alex's Reviewed-by as well. -- Sincerely yours, Mike. From pasha.tatashin at soleen.com Sat Nov 15 06:51:07 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Sat, 15 Nov 2025 09:51:07 -0500 Subject: [PATCH v2 10/13] kho: Update FDT dynamically for subtree addition/removal In-Reply-To: References: <20251114190002.3311679-1-pasha.tatashin@soleen.com> <20251114190002.3311679-11-pasha.tatashin@soleen.com> Message-ID: On Sat, Nov 15, 2025 at 4:40 AM Mike Rapoport wrote: > > On Fri, Nov 14, 2025 at 01:59:59PM -0500, Pasha Tatashin wrote: > > - struct kho_sub_fdt *sub_fdt; > > + phys_addr_t phys = virt_to_phys(fdt); > > + void *root_fdt = kho_out.fdt; > > + int err = -ENOMEM; > > + int off, fdt_err; > > > > - sub_fdt = kmalloc(sizeof(*sub_fdt), GFP_KERNEL); > > - if (!sub_fdt) > > - return -ENOMEM; > > + guard(mutex)(&kho_out.lock); > > + > > + fdt_err = fdt_open_into(root_fdt, root_fdt, PAGE_SIZE); > > + if (fdt_err < 0) > > + return err; > > > > - INIT_LIST_HEAD(&sub_fdt->l); > > - sub_fdt->name = name; > > - sub_fdt->fdt = fdt; > > + off = fdt_add_subnode(root_fdt, 0, name); > > Why not > fdt_err = fdt_add_subnode() > > as I asked in v1 review? > Oh, I missed that, there is a slight difference between the two: 'fdt_err' only contains FDT return value, i.e. error if negative. The 'off' on the other hand in the happy path contains subnode offset, and contains error only in the unhappy path.
This is why I think it is a little cleaner to keep different name, however, if you still prefer re-using a single local variable for both, this is fix-up patch: diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 224bdf5becb6..81f60ccb2dc7 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -713,7 +713,7 @@ int kho_add_subtree(const char *name, void *fdt) phys_addr_t phys = virt_to_phys(fdt); void *root_fdt = kho_out.fdt; int err = -ENOMEM; - int off, fdt_err; + int fdt_err; guard(mutex)(&kho_out.lock); @@ -721,14 +721,14 @@ int kho_add_subtree(const char *name, void *fdt) if (fdt_err < 0) return err; - off = fdt_add_subnode(root_fdt, 0, name); - if (off < 0) { - if (off == -FDT_ERR_EXISTS) + fdt_err = fdt_add_subnode(root_fdt, 0, name); + if (fdt_err < 0) { + if (fdt_err == -FDT_ERR_EXISTS) err = -EEXIST; goto out_pack; } - err = fdt_setprop(root_fdt, off, PROP_SUB_FDT, &phys, sizeof(phys)); + err = fdt_setprop(root_fdt, fdt_err, PROP_SUB_FDT, &phys, sizeof(phys)); if (err < 0) goto out_pack; From rppt at kernel.org Sat Nov 15 21:46:56 2025 From: rppt at kernel.org (Mike Rapoport) Date: Sun, 16 Nov 2025 07:46:56 +0200 Subject: [PATCH v2 10/13] kho: Update FDT dynamically for subtree addition/removal In-Reply-To: References: <20251114190002.3311679-1-pasha.tatashin@soleen.com> <20251114190002.3311679-11-pasha.tatashin@soleen.com> Message-ID: On Sat, Nov 15, 2025 at 09:51:07AM -0500, Pasha Tatashin wrote: > On Sat, Nov 15, 2025 at 4:40?AM Mike Rapoport wrote: > > > > On Fri, Nov 14, 2025 at 01:59:59PM -0500, Pasha Tatashin wrote: > > > - struct kho_sub_fdt *sub_fdt; > > > + phys_addr_t phys = virt_to_phys(fdt); > > > + void *root_fdt = kho_out.fdt; > > > + int err = -ENOMEM; > > > + int off, fdt_err; > > > > > > - sub_fdt = kmalloc(sizeof(*sub_fdt), GFP_KERNEL); > > > - if (!sub_fdt) > > > - return -ENOMEM; > > > + guard(mutex)(&kho_out.lock); > > > + > > > + fdt_err = 
fdt_open_into(root_fdt, root_fdt, PAGE_SIZE); > > > + if (fdt_err < 0) > > > + return err; > > > > > > - INIT_LIST_HEAD(&sub_fdt->l); > > > - sub_fdt->name = name; > > > - sub_fdt->fdt = fdt; > > > + off = fdt_add_subnode(root_fdt, 0, name); > > > > Why not > > fdt_err = fdt_add_subnode() > > > > as I asked in v1 review? > > > > Oh, I missed that, there is a slight difference between the two: > 'fdt_err' only contains FDT return value, i.e. error if negative. The > 'off' on the other hand in the happy path contains subnode offset, and > contains error only in the unhappy path. This is why I think it is a > little cleaner to keep different name, however, if you still prefer > re-using a single local variable for both, this is fix-up patch: > > diff --git a/kernel/liveupdate/kexec_handover.c > b/kernel/liveupdate/kexec_handover.c > index 224bdf5becb6..81f60ccb2dc7 100644 > --- a/kernel/liveupdate/kexec_handover.c > +++ b/kernel/liveupdate/kexec_handover.c > @@ -713,7 +713,7 @@ int kho_add_subtree(const char *name, void *fdt) > phys_addr_t phys = virt_to_phys(fdt); > void *root_fdt = kho_out.fdt; > int err = -ENOMEM; > - int off, fdt_err; > + int fdt_err; > > guard(mutex)(&kho_out.lock); > > @@ -721,14 +721,14 @@ int kho_add_subtree(const char *name, void *fdt) > if (fdt_err < 0) > return err; > > - off = fdt_add_subnode(root_fdt, 0, name); > - if (off < 0) { > - if (off == -FDT_ERR_EXISTS) > + fdt_err = fdt_add_subnode(root_fdt, 0, name); > + if (fdt_err < 0) { > + if (fdt_err == -FDT_ERR_EXISTS) > err = -EEXIST; > goto out_pack; > } > > - err = fdt_setprop(root_fdt, off, PROP_SUB_FDT, &phys, sizeof(phys)); > + err = fdt_setprop(root_fdt, fdt_err, PROP_SUB_FDT, &phys, sizeof(phys)); I missed 'off' here, never mind > if (err < 0) > goto out_pack; > -- Sincerely yours, Mike. 
From ioworker0 at gmail.com Sat Nov 15 22:49:18 2025 From: ioworker0 at gmail.com (Lance Yang) Date: Sun, 16 Nov 2025 14:49:18 +0800 Subject: [PATCH v2 03/13] kho: Introduce high-level memory allocation API In-Reply-To: <20251114190002.3311679-4-pasha.tatashin@soleen.com> References: <20251114190002.3311679-4-pasha.tatashin@soleen.com> Message-ID: <20251116064918.35549-1-ioworker0@gmail.com> From: Lance Yang On Fri, 14 Nov 2025 13:59:52 -0500, Pasha Tatashin wrote: > Currently, clients of KHO must manually allocate memory (e.g., via > alloc_pages), calculate the page order, and explicitly call > kho_preserve_folio(). Similarly, cleanup requires separate calls to > unpreserve and free the memory. > > Introduce a high-level API to streamline this common pattern: > > - kho_alloc_preserve(size): Allocates physically contiguous, zeroed > memory and immediately marks it for preservation. > - kho_unpreserve_free(ptr): Unpreserves and frees the memory > in the current kernel. > - kho_restore_free(ptr): Restores the struct page state of > preserved memory in the new kernel and immediately frees it to the > page allocator. 
> > Signed-off-by: Pasha Tatashin > > Reviewed-by: Mike Rapoport (Microsoft) > > --- > > include/linux/kexec_handover.h | 22 +++++--- > > kernel/liveupdate/kexec_handover.c | 87 ++++++++++++++++++++++++++++++ > > 2 files changed, 102 insertions(+), 7 deletions(-) > > > > diff --git a/include/linux/kexec_handover.h b/include/linux/kexec_handover.h > > index 80ece4232617..38a9487a1a00 100644 > > --- a/include/linux/kexec_handover.h > > +++ b/include/linux/kexec_handover.h > > @@ -2,8 +2,9 @@ > > #ifndef LINUX_KEXEC_HANDOVER_H > > #define LINUX_KEXEC_HANDOVER_H > > > > -#include > > +#include > > #include > > +#include > > > > struct kho_scratch { > > phys_addr_t addr; > > @@ -48,6 +49,9 @@ int kho_preserve_pages(struct page *page, unsigned int nr_pages); > > int kho_unpreserve_pages(struct page *page, unsigned int nr_pages); > > int kho_preserve_vmalloc(void *ptr, struct kho_vmalloc *preservation); > > int kho_unpreserve_vmalloc(struct kho_vmalloc *preservation); > > +void *kho_alloc_preserve(size_t size); > > +void kho_unpreserve_free(void *mem); > > +void kho_restore_free(void *mem); > > struct folio *kho_restore_folio(phys_addr_t phys); > > struct page *kho_restore_pages(phys_addr_t phys, unsigned int nr_pages); > > void *kho_restore_vmalloc(const struct kho_vmalloc *preservation); > > @@ -101,6 +105,14 @@ static inline int kho_unpreserve_vmalloc(struct kho_vmalloc *preservation) > > return -EOPNOTSUPP; > > } > > > > +void *kho_alloc_preserve(size_t size) > > +{ > > + return ERR_PTR(-EOPNOTSUPP); > > +} > > + > > +void kho_unpreserve_free(void *mem) { } > > +void kho_restore_free(void *mem) { } The compile is unhappy here when CONFIG_KEXEC_HANDOVER is not set ...
``` ld: arch/x86/realmode/rm/video-mode.o: in function `kho_alloc_preserve': /home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:102: multiple definition of `kho_alloc_preserve'; arch/x86/realmode/rm/wakemain.o:/home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:102: first defined here ld: arch/x86/realmode/rm/video-mode.o: in function `kho_unpreserve_free': /home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:104: multiple definition of `kho_unpreserve_free'; arch/x86/realmode/rm/wakemain.o:/home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:104: first defined here ld: arch/x86/realmode/rm/video-mode.o: in function `kho_restore_free': /home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:105: multiple definition of `kho_restore_free'; arch/x86/realmode/rm/wakemain.o:/home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:105: first defined here ld: arch/x86/realmode/rm/regs.o: in function `kho_alloc_preserve': /home/runner/work/mm-test-robot/mm-test-robot/linux/arch/x86/realmode/rm/regs.c:102: multiple definition of `kho_alloc_preserve'; arch/x86/realmode/rm/wakemain.o:/home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:102: first defined here ld: arch/x86/realmode/rm/regs.o: in function `kho_unpreserve_free': /home/runner/work/mm-test-robot/mm-test-robot/linux/arch/x86/realmode/rm/regs.c:104: multiple definition of `kho_unpreserve_free'; arch/x86/realmode/rm/wakemain.o:/home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:104: first defined here ld: arch/x86/realmode/rm/regs.o: in function `kho_restore_free': /home/runner/work/mm-test-robot/mm-test-robot/linux/arch/x86/realmode/rm/regs.c:105: multiple definition of `kho_restore_free'; 
arch/x86/realmode/rm/wakemain.o:/home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:105: first defined here ld: arch/x86/realmode/rm/video-vga.o: in function `kho_alloc_preserve': /home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:102: multiple definition of `kho_alloc_preserve'; arch/x86/realmode/rm/wakemain.o:/home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:102: first defined here ld: arch/x86/realmode/rm/video-vga.o: in function `kho_unpreserve_free': /home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:104: multiple definition of `kho_unpreserve_free'; arch/x86/realmode/rm/wakemain.o:/home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:104: first defined here ld: arch/x86/realmode/rm/video-vga.o: in function `kho_restore_free': /home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:105: multiple definition of `kho_restore_free'; arch/x86/realmode/rm/wakemain.o:/home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:105: first defined here ld: arch/x86/realmode/rm/video-vesa.o: in function `kho_alloc_preserve': /home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:102: multiple definition of `kho_alloc_preserve'; arch/x86/realmode/rm/wakemain.o:/home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:102: first defined here ld: arch/x86/realmode/rm/video-vesa.o: in function `kho_unpreserve_free': /home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:104: multiple definition of `kho_unpreserve_free'; arch/x86/realmode/rm/wakemain.o:/home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:104: first defined here ld: arch/x86/realmode/rm/video-vesa.o: in function `kho_restore_free': 
/home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:105: multiple definition of `kho_restore_free'; arch/x86/realmode/rm/wakemain.o:/home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:105: first defined here ld: arch/x86/realmode/rm/video-bios.o: in function `kho_alloc_preserve': /home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:102: multiple definition of `kho_alloc_preserve'; arch/x86/realmode/rm/wakemain.o:/home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:102: first defined here ld: arch/x86/realmode/rm/video-bios.o: in function `kho_unpreserve_free': /home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:104: multiple definition of `kho_unpreserve_free'; arch/x86/realmode/rm/wakemain.o:/home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:104: first defined here ld: arch/x86/realmode/rm/video-bios.o: in function `kho_restore_free': /home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:105: multiple definition of `kho_restore_free'; arch/x86/realmode/rm/wakemain.o:/home/runner/work/mm-test-robot/mm-test-robot/linux/./include/linux/kexec_handover.h:105: first defined here make[5]: *** [arch/x86/realmode/rm/Makefile:49: arch/x86/realmode/rm/realmode.elf] Error 1 make[4]: *** [arch/x86/realmode/Makefile:22: arch/x86/realmode/rm/realmode.bin] Error 2 make[3]: *** [scripts/Makefile.build:556: arch/x86/realmode] Error 2 ``` Perhaps these stubs should be declared as static inline? 
That should make the compiler happy and resolve the linking errors :) ----8<---- diff --git a/include/linux/kexec_handover.h b/include/linux/kexec_handover.h index 6dd0dcdf0ec1..5f7b9de97e8d 100644 --- a/include/linux/kexec_handover.h +++ b/include/linux/kexec_handover.h @@ -96,13 +96,13 @@ static inline int kho_preserve_vmalloc(void *ptr, static inline void kho_unpreserve_vmalloc(struct kho_vmalloc *preservation) { } -void *kho_alloc_preserve(size_t size) +static inline void *kho_alloc_preserve(size_t size) { return ERR_PTR(-EOPNOTSUPP); } -void kho_unpreserve_free(void *mem) { } -void kho_restore_free(void *mem) { } +static inline void kho_unpreserve_free(void *mem) { } +static inline void kho_restore_free(void *mem) { } static inline struct folio *kho_restore_folio(phys_addr_t phys) { --- [...] Cheers, Lance From pasha.tatashin at soleen.com Sun Nov 16 06:57:05 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Sun, 16 Nov 2025 09:57:05 -0500 Subject: [PATCH v2 03/13] kho: Introduce high-level memory allocation API In-Reply-To: <20251116064918.35549-1-ioworker0@gmail.com> References: <20251114190002.3311679-4-pasha.tatashin@soleen.com> <20251116064918.35549-1-ioworker0@gmail.com> Message-ID: On Sun, Nov 16, 2025 at 1:49?AM Lance Yang wrote: > > From: Lance Yang > > > On Fri, 14 Nov 2025 13:59:52 -0500, Pasha Tatashin wrote: > > Currently, clients of KHO must manually allocate memory (e.g., via > > alloc_pages), calculate the page order, and explicitly call > > kho_preserve_folio(). Similarly, cleanup requires separate calls to > > unpreserve and free the memory. > > > > Introduce a high-level API to streamline this common pattern: > > > > - kho_alloc_preserve(size): Allocates physically contiguous, zeroed > > memory and immediately marks it for preservation. > > - kho_unpreserve_free(ptr): Unpreserves and frees the memory > > in the current kernel. 
> > - kho_restore_free(ptr): Restores the struct page state of > > preserved memory in the new kernel and immediately frees it to the > > page allocator. > > > > [...] > > The compile is unhappy here when CONFIG_KEXEC_HANDOVER is not set ...
Thank you, Andrew already applied a fix: https://lore.kernel.org/all/CA+CK2bBgXDhrHwTVgxrw7YTQ-0=LgW0t66CwPCgG=C85ftz4zw at mail.gmail.com/T/#u > > ``` > [...] > ``` > > Perhaps these stubs should be declared as static inline?
That should make > the compiler happy and resolve the linking errors :) > > ----8<---- > diff --git a/include/linux/kexec_handover.h b/include/linux/kexec_handover.h > index 6dd0dcdf0ec1..5f7b9de97e8d 100644 > --- a/include/linux/kexec_handover.h > +++ b/include/linux/kexec_handover.h > @@ -96,13 +96,13 @@ static inline int kho_preserve_vmalloc(void *ptr, > > static inline void kho_unpreserve_vmalloc(struct kho_vmalloc *preservation) { } > > -void *kho_alloc_preserve(size_t size) > +static inline void *kho_alloc_preserve(size_t size) > { > return ERR_PTR(-EOPNOTSUPP); > } > > -void kho_unpreserve_free(void *mem) { } > -void kho_restore_free(void *mem) { } > +static inline void kho_unpreserve_free(void *mem) { } > +static inline void kho_restore_free(void *mem) { } > > static inline struct folio *kho_restore_folio(phys_addr_t phys) > { > --- > > [...] > > Cheers, > Lance From sourabhjain at linux.ibm.com Sun Nov 16 19:51:53 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Mon, 17 Nov 2025 09:21:53 +0530 Subject: [PATCH v5] Documentation/ABI: add kexec and kdump sysfs interface Message-ID: <20251117035153.1199665-1-sourabhjain@linux.ibm.com> Add an ABI document for the following kexec and kdump sysfs interfaces: - /sys/kernel/kexec_loaded - /sys/kernel/kexec_crash_loaded - /sys/kernel/kexec_crash_size - /sys/kernel/crash_elfcorehdr_size Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- Changelog: v4 -> v5: https://lore.kernel.org/all/20251114152550.ac2dd5e23542f09c62defec7 at linux-foundation.org/ - Split patch from above patch series.
--- .../ABI/testing/sysfs-kernel-kexec-kdump | 43 +++++++++++++++++++ 1 file changed, 43 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-kernel-kexec-kdump diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump new file mode 100644 index 000000000000..96b24565b68e --- /dev/null +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump @@ -0,0 +1,43 @@ +What: /sys/kernel/kexec_loaded +Date: Jun 2006 +Contact: kexec at lists.infradead.org +Description: read only + Indicates whether a new kernel image has been loaded + into memory using the kexec system call. It shows 1 if + a kexec image is present and ready to boot, or 0 if none + is loaded. +User: kexec tools, kdump service + +What: /sys/kernel/kexec_crash_loaded +Date: Jun 2006 +Contact: kexec at lists.infradead.org +Description: read only + Indicates whether a crash (kdump) kernel is currently + loaded into memory. It shows 1 if a crash kernel has been + successfully loaded for panic handling, or 0 if no crash + kernel is present. +User: Kexec tools, Kdump service + +What: /sys/kernel/kexec_crash_size +Date: Dec 2009 +Contact: kexec at lists.infradead.org +Description: read/write + Shows the amount of memory reserved for loading the crash + (kdump) kernel. It reports the size, in bytes, of the + crash kernel area defined by the crashkernel= parameter. + This interface also allows reducing the crashkernel + reservation by writing a smaller value, and the reclaimed + space is added back to the system RAM. +User: Kdump service + +What: /sys/kernel/crash_elfcorehdr_size +Date: Aug 2023 +Contact: kexec at lists.infradead.org +Description: read only + Indicates the preferred size of the memory buffer for the + ELF core header used by the crash (kdump) kernel. It defines + how much space is needed to hold metadata about the crashed + system, including CPU and memory information. 
This information
+ is used by the user space utility kexec to support updating the
+ in-kernel kdump image during hotplug operations.
+User: Kexec tools
--
2.51.1

From sourabhjain at linux.ibm.com Sun Nov 16 20:19:05 2025
From: sourabhjain at linux.ibm.com (Sourabh Jain)
Date: Mon, 17 Nov 2025 09:49:05 +0530
Subject: [PATCH v5] crash: export crashkernel CMA reservation to userspace
Message-ID: <20251117041905.1277801-1-sourabhjain@linux.ibm.com>

Add a sysfs entry /sys/kernel/kexec/crash_cma_ranges to expose all CMA crashkernel ranges.

This allows userspace tools configuring kdump to determine how much memory is reserved for crashkernel. If CMA is used, tools can warn users when attempting to capture user pages with CMA reservation.

The new sysfs entry holds the CMA ranges in the format below:

cat /sys/kernel/kexec/crash_cma_ranges
100000000-10c7fffff

There are already four kexec and kdump sysfs entries under /sys/kernel. Adding more entries there would clutter the directory. To avoid this, the new crash_cma_ranges sysfs entry is placed in a new kexec node under /sys/kernel/.

The reason for not including Crash CMA Ranges in /proc/iomem is to avoid conflicts. It has been observed that contiguous memory ranges are sometimes shown as two separate System RAM entries in /proc/iomem. If a CMA range overlaps two System RAM ranges, adding crashk_res to /proc/iomem can create a conflict. Reference [1] describes one such instance on the PowerPC architecture.
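As a side note for kdump tooling: the file prints each range as a hex start-end pair without a 0x prefix (see the example `100000000-10c7fffff` above), so consuming it from userspace is a two-field `sscanf()`. A minimal sketch; the helper name `parse_cma_range` is invented here for illustration and is not part of the patch:

```c
#include <stdio.h>

/*
 * Parse one "start-end" line as printed by the proposed
 * crash_cma_ranges file (hex addresses, no 0x prefix, e.g.
 * "100000000-10c7fffff").  Returns 0 on success, -1 on
 * malformed input.  Illustrative userspace helper only.
 */
int parse_cma_range(const char *line, unsigned long long *start,
		    unsigned long long *end)
{
	if (sscanf(line, "%llx-%llx", start, end) != 2)
		return -1;
	if (*end < *start)
		return -1;
	return 0;
}
```

Feeding each line of the file to this helper from an `fgets()` loop is enough for a tool to total up the CMA reservation (`end - start + 1` bytes per range).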
Link: https://lore.kernel.org/all/20251016142831.144515-1-sourabhjain at linux.ibm.com/ [1] Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- Changelog: v4 -> v5: https://lore.kernel.org/all/20251114152550.ac2dd5e23542f09c62defec7 at linux-foundation.org/ - Split patch from the above patch series. - Code to create the kexec node under /sys/kernel is added; earlier it was done in [02/05] of the above patch series. Note: This patch depends on the patch below: https://lore.kernel.org/all/20251117035153.1199665-1-sourabhjain at linux.ibm.com/ --- .../ABI/testing/sysfs-kernel-kexec-kdump | 10 ++++ kernel/kexec_core.c | 50 +++++++++++++++++++ 2 files changed, 60 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump index 96b24565b68e..320ec75a4903 100644 --- a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump @@ -41,3 +41,13 @@ Description: read only is used by the user space utility kexec to support updating the in-kernel kdump image during hotplug operations. User: Kexec tools + +What: /sys/kernel/kexec/crash_cma_ranges +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read only + Provides information about the memory ranges reserved from + the Contiguous Memory Allocator (CMA) area that are allocated + to the crash (kdump) kernel. It lists the start and end physical + addresses of CMA regions assigned for crashkernel use.
+User: kdump service diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index fa00b239c5d9..51b1e0985eac 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -41,6 +41,7 @@ #include #include #include +#include #include #include @@ -1229,3 +1230,52 @@ int kernel_kexec(void) kexec_unlock(); return error; } + +#ifdef CONFIG_CRASH_RESERVE +static ssize_t crash_cma_ranges_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + + ssize_t len = 0; + int i; + + for (i = 0; i < crashk_cma_cnt; ++i) { + len += sysfs_emit_at(buf, len, "%08llx-%08llx\n", + crashk_cma_ranges[i].start, + crashk_cma_ranges[i].end); + } + return len; +} +static struct kobj_attribute crash_cma_ranges_attr = __ATTR_RO(crash_cma_ranges); + +static struct attribute *kexec_attrs[] = { + &crash_cma_ranges_attr.attr, + NULL +}; + +static struct kobject *kexec_kobj; +ATTRIBUTE_GROUPS(kexec); + +static int __init init_kexec_sysctl(void) +{ + int error; + + kexec_kobj = kobject_create_and_add("kexec", kernel_kobj); + if (!kexec_kobj) { + pr_err("failed to create kexec kobject\n"); + return -ENOMEM; + } + + error = sysfs_create_groups(kexec_kobj, kexec_groups); + if (error) + goto kset_exit; + + return 0; + +kset_exit: + kobject_put(kexec_kobj); + return error; +} + +subsys_initcall(init_kexec_sysctl); +#endif /* CONFIG_CRASH_RESERVE */ -- 2.51.1 From sourabhjain at linux.ibm.com Sun Nov 16 20:47:06 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Mon, 17 Nov 2025 10:17:06 +0530 Subject: [PATCH v5 1/3] kexec: move sysfs entries to /sys/kernel/kexec In-Reply-To: <20251117044708.1337558-1-sourabhjain@linux.ibm.com> References: <20251117044708.1337558-1-sourabhjain@linux.ibm.com> Message-ID: <20251117044708.1337558-2-sourabhjain@linux.ibm.com> Several kexec and kdump sysfs entries are currently placed directly under /sys/kernel/, which clutters the directory and makes it harder to identify unrelated entries. 
To improve organization and readability, these entries are now moved under a dedicated directory, /sys/kernel/kexec. For backward compatibility, symlinks are created at the old locations so that existing tools and scripts continue to work. These symlinks can be removed in the future once users have switched to the new path. While creating symlinks, entries are added in /sys/kernel/ that point to their new locations under /sys/kernel/kexec/. If an error occurs while adding a symlink, it is logged but does not stop initialization of the remaining kexec sysfs symlinks. The /sys/kernel/ entry is now controlled by CONFIG_CRASH_DUMP instead of CONFIG_VMCORE_INFO, as CONFIG_CRASH_DUMP also enables CONFIG_VMCORE_INFO. Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- kernel/kexec_core.c | 91 ++++++++++++++++++++++++++++++++++++++++++++- kernel/ksysfs.c | 68 +-------------------------------- 2 files changed, 91 insertions(+), 68 deletions(-) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 51b1e0985eac..b90d48f77dfb 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -1231,6 +1231,47 @@ int kernel_kexec(void) return error; } +static ssize_t loaded_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%d\n", !!kexec_image); +} +static struct kobj_attribute loaded_attr = __ATTR_RO(loaded); + +#ifdef CONFIG_CRASH_DUMP +static ssize_t crash_loaded_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%d\n", kexec_crash_loaded()); +} +static struct kobj_attribute crash_loaded_attr = __ATTR_RO(crash_loaded); + +static ssize_t crash_size_show(struct kobject *kobj, + struct kobj_attribute *attr, char 
*buf) +{ + ssize_t size = crash_get_memory_size(); + + if (size < 0) + return size; + + return sysfs_emit(buf, "%zd\n", size); +} + +static ssize_t crash_size_store(struct kobject *kobj, + struct kobj_attribute *attr, + const char *buf, size_t count) +{ + unsigned long cnt; + int ret; + + if (kstrtoul(buf, 0, &cnt)) + return -EINVAL; + + ret = crash_shrink_memory(cnt); + return ret < 0 ? ret : count; +} +static struct kobj_attribute crash_size_attr = __ATTR_RW(crash_size); + #ifdef CONFIG_CRASH_RESERVE static ssize_t crash_cma_ranges_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) @@ -1247,18 +1288,59 @@ static ssize_t crash_cma_ranges_show(struct kobject *kobj, return len; } static struct kobj_attribute crash_cma_ranges_attr = __ATTR_RO(crash_cma_ranges); +#endif /* CONFIG_CRASH_RESERVE */ + +#ifdef CONFIG_CRASH_HOTPLUG +static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + unsigned int sz = crash_get_elfcorehdr_size(); + + return sysfs_emit(buf, "%u\n", sz); +} +static struct kobj_attribute crash_elfcorehdr_size_attr = __ATTR_RO(crash_elfcorehdr_size); +#endif /* CONFIG_CRASH_HOTPLUG */ +#endif /* CONFIG_CRASH_DUMP */ static struct attribute *kexec_attrs[] = { + &loaded_attr.attr, +#ifdef CONFIG_CRASH_DUMP + &crash_loaded_attr.attr, + &crash_size_attr.attr, +#ifdef CONFIG_CRASH_RESERVE &crash_cma_ranges_attr.attr, +#endif +#ifdef CONFIG_CRASH_HOTPLUG + &crash_elfcorehdr_size_attr.attr, +#endif +#endif NULL }; +struct kexec_link_entry { + const char *target; + const char *name; +}; + +static struct kexec_link_entry kexec_links[] = { + { "loaded", "kexec_loaded" }, +#ifdef CONFIG_CRASH_DUMP + { "crash_loaded", "kexec_crash_loaded" }, + { "crash_size", "kexec_crash_size" }, +#ifdef CONFIG_CRASH_HOTPLUG + { "crash_elfcorehdr_size", "crash_elfcorehdr_size" }, +#endif +#endif + +}; + static struct kobject *kexec_kobj; ATTRIBUTE_GROUPS(kexec); static int __init init_kexec_sysctl(void) { int 
error; + int i; kexec_kobj = kobject_create_and_add("kexec", kernel_kobj); if (!kexec_kobj) { @@ -1270,6 +1352,14 @@ static int __init init_kexec_sysctl(void) if (error) goto kset_exit; + for (i = 0; i < ARRAY_SIZE(kexec_links); i++) { + error = compat_only_sysfs_link_entry_to_kobj(kernel_kobj, kexec_kobj, + kexec_links[i].target, + kexec_links[i].name); + if (error) + pr_err("Unable to create %s symlink (%d)", kexec_links[i].name, error); + } + return 0; kset_exit: @@ -1278,4 +1368,3 @@ static int __init init_kexec_sysctl(void) } subsys_initcall(init_kexec_sysctl); -#endif /* CONFIG_CRASH_RESERVE */ diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c index eefb67d9883c..a9e6354d9e25 100644 --- a/kernel/ksysfs.c +++ b/kernel/ksysfs.c @@ -12,7 +12,7 @@ #include #include #include -#include +#include #include #include #include @@ -119,50 +119,6 @@ static ssize_t profiling_store(struct kobject *kobj, KERNEL_ATTR_RW(profiling); #endif -#ifdef CONFIG_KEXEC_CORE -static ssize_t kexec_loaded_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) -{ - return sysfs_emit(buf, "%d\n", !!kexec_image); -} -KERNEL_ATTR_RO(kexec_loaded); - -#ifdef CONFIG_CRASH_DUMP -static ssize_t kexec_crash_loaded_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) -{ - return sysfs_emit(buf, "%d\n", kexec_crash_loaded()); -} -KERNEL_ATTR_RO(kexec_crash_loaded); - -static ssize_t kexec_crash_size_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) -{ - ssize_t size = crash_get_memory_size(); - - if (size < 0) - return size; - - return sysfs_emit(buf, "%zd\n", size); -} -static ssize_t kexec_crash_size_store(struct kobject *kobj, - struct kobj_attribute *attr, - const char *buf, size_t count) -{ - unsigned long cnt; - int ret; - - if (kstrtoul(buf, 0, &cnt)) - return -EINVAL; - - ret = crash_shrink_memory(cnt); - return ret < 0 ? 
ret : count; -} -KERNEL_ATTR_RW(kexec_crash_size); - -#endif /* CONFIG_CRASH_DUMP*/ -#endif /* CONFIG_KEXEC_CORE */ - #ifdef CONFIG_VMCORE_INFO static ssize_t vmcoreinfo_show(struct kobject *kobj, @@ -174,18 +130,6 @@ static ssize_t vmcoreinfo_show(struct kobject *kobj, } KERNEL_ATTR_RO(vmcoreinfo); -#ifdef CONFIG_CRASH_HOTPLUG -static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) -{ - unsigned int sz = crash_get_elfcorehdr_size(); - - return sysfs_emit(buf, "%u\n", sz); -} -KERNEL_ATTR_RO(crash_elfcorehdr_size); - -#endif - #endif /* CONFIG_VMCORE_INFO */ /* whether file capabilities are enabled */ @@ -255,18 +199,8 @@ static struct attribute * kernel_attrs[] = { #ifdef CONFIG_PROFILING &profiling_attr.attr, #endif -#ifdef CONFIG_KEXEC_CORE - &kexec_loaded_attr.attr, -#ifdef CONFIG_CRASH_DUMP - &kexec_crash_loaded_attr.attr, - &kexec_crash_size_attr.attr, -#endif -#endif #ifdef CONFIG_VMCORE_INFO &vmcoreinfo_attr.attr, -#ifdef CONFIG_CRASH_HOTPLUG - &crash_elfcorehdr_size_attr.attr, -#endif #endif #ifndef CONFIG_TINY_RCU &rcu_expedited_attr.attr, -- 2.51.1 From sourabhjain at linux.ibm.com Sun Nov 16 20:47:07 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Mon, 17 Nov 2025 10:17:07 +0530 Subject: [PATCH v5 2/3] Documentation/ABI: mark old kexec sysfs deprecated In-Reply-To: <20251117044708.1337558-1-sourabhjain@linux.ibm.com> References: <20251117044708.1337558-1-sourabhjain@linux.ibm.com> Message-ID: <20251117044708.1337558-3-sourabhjain@linux.ibm.com> The previous commit ("kexec: move sysfs entries to /sys/kernel/kexec") moved all existing kexec sysfs entries to a new location. The ABI document is updated to include a note about the deprecation of the old kexec sysfs entries. 
The following kexec sysfs entries are deprecated: - /sys/kernel/kexec_loaded - /sys/kernel/kexec_crash_loaded - /sys/kernel/kexec_crash_size - /sys/kernel/crash_elfcorehdr_size Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- .../ABI/obsolete/sysfs-kernel-kexec-kdump | 59 +++++++++++++++++++ .../ABI/testing/sysfs-kernel-kexec-kdump | 44 -------------- 2 files changed, 59 insertions(+), 44 deletions(-) create mode 100644 Documentation/ABI/obsolete/sysfs-kernel-kexec-kdump diff --git a/Documentation/ABI/obsolete/sysfs-kernel-kexec-kdump b/Documentation/ABI/obsolete/sysfs-kernel-kexec-kdump new file mode 100644 index 000000000000..96b4d41721cc --- /dev/null +++ b/Documentation/ABI/obsolete/sysfs-kernel-kexec-kdump @@ -0,0 +1,59 @@ +NOTE: all the ABIs listed in this file are deprecated and will be removed after 2028. 
+ +Here are the alternative ABIs: ++------------------------------------+-----------------------------------------+ +| Deprecated | Alternative | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/kexec_loaded | /sys/kernel/kexec/loaded | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/kexec_crash_loaded | /sys/kernel/kexec/crash_loaded | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/kexec_crash_size | /sys/kernel/kexec/crash_size | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/crash_elfcorehdr_size | /sys/kernel/kexec/crash_elfcorehdr_size | ++------------------------------------+-----------------------------------------+ + + +What: /sys/kernel/kexec_loaded +Date: Jun 2006 +Contact: kexec at lists.infradead.org +Description: read only + Indicates whether a new kernel image has been loaded + into memory using the kexec system call. It shows 1 if + a kexec image is present and ready to boot, or 0 if none + is loaded. +User: kexec tools, kdump service + +What: /sys/kernel/kexec_crash_loaded +Date: Jun 2006 +Contact: kexec at lists.infradead.org +Description: read only + Indicates whether a crash (kdump) kernel is currently + loaded into memory. It shows 1 if a crash kernel has been + successfully loaded for panic handling, or 0 if no crash + kernel is present. +User: Kexec tools, Kdump service + +What: /sys/kernel/kexec_crash_size +Date: Dec 2009 +Contact: kexec at lists.infradead.org +Description: read/write + Shows the amount of memory reserved for loading the crash + (kdump) kernel. It reports the size, in bytes, of the + crash kernel area defined by the crashkernel= parameter. + This interface also allows reducing the crashkernel + reservation by writing a smaller value, and the reclaimed + space is added back to the system RAM. 
+User: Kdump service + +What: /sys/kernel/crash_elfcorehdr_size +Date: Aug 2023 +Contact: kexec at lists.infradead.org +Description: read only + Indicates the preferred size of the memory buffer for the + ELF core header used by the crash (kdump) kernel. It defines + how much space is needed to hold metadata about the crashed + system, including CPU and memory information. This information + is used by the user space utility kexec to support updating the + in-kernel kdump image during hotplug operations. +User: Kexec tools diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump index 320ec75a4903..7e5e528665db 100644 --- a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump @@ -1,47 +1,3 @@ -What: /sys/kernel/kexec_loaded -Date: Jun 2006 -Contact: kexec at lists.infradead.org -Description: read only - Indicates whether a new kernel image has been loaded - into memory using the kexec system call. It shows 1 if - a kexec image is present and ready to boot, or 0 if none - is loaded. -User: kexec tools, kdump service - -What: /sys/kernel/kexec_crash_loaded -Date: Jun 2006 -Contact: kexec at lists.infradead.org -Description: read only - Indicates whether a crash (kdump) kernel is currently - loaded into memory. It shows 1 if a crash kernel has been - successfully loaded for panic handling, or 0 if no crash - kernel is present. -User: Kexec tools, Kdump service - -What: /sys/kernel/kexec_crash_size -Date: Dec 2009 -Contact: kexec at lists.infradead.org -Description: read/write - Shows the amount of memory reserved for loading the crash - (kdump) kernel. It reports the size, in bytes, of the - crash kernel area defined by the crashkernel= parameter. - This interface also allows reducing the crashkernel - reservation by writing a smaller value, and the reclaimed - space is added back to the system RAM. 
-User: Kdump service - -What: /sys/kernel/crash_elfcorehdr_size -Date: Aug 2023 -Contact: kexec at lists.infradead.org -Description: read only - Indicates the preferred size of the memory buffer for the - ELF core header used by the crash (kdump) kernel. It defines - how much space is needed to hold metadata about the crashed - system, including CPU and memory information. This information - is used by the user space utility kexec to support updating the - in-kernel kdump image during hotplug operations. -User: Kexec tools - What: /sys/kernel/kexec/crash_cma_ranges Date: Nov 2025 Contact: kexec at lists.infradead.org -- 2.51.1 From sourabhjain at linux.ibm.com Sun Nov 16 20:47:08 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Mon, 17 Nov 2025 10:17:08 +0530 Subject: [PATCH v5 3/3] kexec: document new kexec and kdump sysfs ABIs In-Reply-To: <20251117044708.1337558-1-sourabhjain@linux.ibm.com> References: <20251117044708.1337558-1-sourabhjain@linux.ibm.com> Message-ID: <20251117044708.1337558-4-sourabhjain@linux.ibm.com> Add an ABI document for following kexec and kdump sysfs interface: - /sys/kernel/kexec/loaded - /sys/kernel/kexec/crash_loaded - /sys/kernel/kexec/crash_size - /sys/kernel/kexec/crash_elfcorehdr_size Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- .../ABI/testing/sysfs-kernel-kexec-kdump | 52 +++++++++++++++++++ 1 file changed, 52 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump index 7e5e528665db..f59051b5d96d 100644 --- a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump @@ -1,3 +1,55 @@ +What: /sys/kernel/kexec/* +Date: Nov 
2025 +Contact: kexec at lists.infradead.org +Description: + The /sys/kernel/kexec/* directory contains sysfs files + that provide information about the configuration status + of kexec and kdump. + +What: /sys/kernel/kexec/loaded +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read only + Indicates whether a new kernel image has been loaded + into memory using the kexec system call. It shows 1 if + a kexec image is present and ready to boot, or 0 if none + is loaded. +User: kexec tools, kdump service + +What: /sys/kernel/kexec/crash_loaded +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read only + Indicates whether a crash (kdump) kernel is currently + loaded into memory. It shows 1 if a crash kernel has been + successfully loaded for panic handling, or 0 if no crash + kernel is present. +User: Kexec tools, Kdump service + +What: /sys/kernel/kexec/crash_size +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read/write + Shows the amount of memory reserved for loading the crash + (kdump) kernel. It reports the size, in bytes, of the + crash kernel area defined by the crashkernel= parameter. + This interface also allows reducing the crashkernel + reservation by writing a smaller value, and the reclaimed + space is added back to the system RAM. +User: Kdump service + +What: /sys/kernel/kexec/crash_elfcorehdr_size +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read only + Indicates the preferred size of the memory buffer for the + ELF core header used by the crash (kdump) kernel. It defines + how much space is needed to hold metadata about the crashed + system, including CPU and memory information. This information + is used by the user space utility kexec to support updating the + in-kernel kdump image during hotplug operations. 
+User: Kexec tools + What: /sys/kernel/kexec/crash_cma_ranges Date: Nov 2025 Contact: kexec at lists.infradead.org -- 2.51.1 From akpm at linux-foundation.org Mon Nov 17 09:42:11 2025 From: akpm at linux-foundation.org (Andrew Morton) Date: Mon, 17 Nov 2025 09:42:11 -0800 Subject: [PATCH v5] crash: export crashkernel CMA reservation to userspace In-Reply-To: <20251117041905.1277801-1-sourabhjain@linux.ibm.com> References: <20251117041905.1277801-1-sourabhjain@linux.ibm.com> Message-ID: <20251117094211.f8b4426ddda3bc0db5a62624@linux-foundation.org> On Mon, 17 Nov 2025 09:49:05 +0530 Sourabh Jain wrote: > Add a sysfs entry /sys/kernel/kexec/crash_cma_ranges to expose all > CMA crashkernel ranges. > > This allows userspace tools configuring kdump to determine how much > memory is reserved for crashkernel. If CMA is used, tools can warn > users when attempting to capture user pages with CMA reservation. > > The new sysfs hold the CMA ranges in below format: > > cat /sys/kernel/kexec/crash_cma_ranges > 100000000-10c7fffff > > There are already four kexec and kdump sysfs entries under /sys/kernel.
> Adding more entries there would clutter the directory. To avoid this, > the new crash_cma_ranges sysfs entry is placed in a new kexec node under > /sys/kernel/. I suggest not creating /sys/kernel/kexec in this patch. Moving everything into a new /sys/kernel/kexec is a separate patchset and a separate concept and it might never be merged - it changes ABI! So let's put crash_cma_ranges in /sys/kernel and move it to /sys/kernel/kexec within the other patchset. From sourabhjain at linux.ibm.com Mon Nov 17 10:33:54 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Tue, 18 Nov 2025 00:03:54 +0530 Subject: [PATCH v5] crash: export crashkernel CMA reservation to userspace In-Reply-To: <20251117094211.f8b4426ddda3bc0db5a62624@linux-foundation.org> References: <20251117041905.1277801-1-sourabhjain@linux.ibm.com> <20251117094211.f8b4426ddda3bc0db5a62624@linux-foundation.org> Message-ID: <469c97cb-5ea1-4c2b-a70f-b1a6febf70df@linux.ibm.com> On 17/11/25 23:12, Andrew Morton wrote: > On Mon, 17 Nov 2025 09:49:05 +0530 Sourabh Jain wrote: > >> Add a sysfs entry /sys/kernel/kexec/crash_cma_ranges to expose all >> CMA crashkernel ranges. >> >> This allows userspace tools configuring kdump to determine how much >> memory is reserved for crashkernel. If CMA is used, tools can warn >> users when attempting to capture user pages with CMA reservation. >> >> The new sysfs hold the CMA ranges in below format: >> >> cat /sys/kernel/kexec/crash_cma_ranges >> 100000000-10c7fffff >> >> There are already four kexec and kdump sysfs entries under /sys/kernel. >> Adding more entries there would clutter the directory. To avoid this, >> the new crash_cma_ranges sysfs entry is placed in a new kexec node under >> /sys/kernel/. > I suggest not creating /sys/kernel/kexec in this patch. > > Moving everything into a new /sys/kernel/kexec is a separate patchset > and a separate concept and it might never be merged - it changes ABI! 
> > So let's put crash_cma_ranges in /sys/kernel and move it to > /sys/kernel/kexec within the other patchset. Yeah sure. I will send the patches accordingly. Thanks, Sourabh Jain

From sourabhjain at linux.ibm.com Mon Nov 17 23:10:23 2025
From: sourabhjain at linux.ibm.com (Sourabh Jain)
Date: Tue, 18 Nov 2025 12:40:23 +0530
Subject: [PATCH v6] crash: export crashkernel CMA reservation to userspace
Message-ID: <20251118071023.1673329-1-sourabhjain@linux.ibm.com>

Add a sysfs entry /sys/kernel/kexec_crash_cma_ranges to expose all CMA crashkernel ranges.

This allows userspace tools configuring kdump to determine how much memory is reserved for crashkernel. If CMA is used, tools can warn users when attempting to capture user pages with CMA reservation.

The new sysfs entry holds the CMA ranges in the format below:

cat /sys/kernel/kexec_crash_cma_ranges
100000000-10c7fffff

The reason for not including Crash CMA Ranges in /proc/iomem is to avoid conflicts. It has been observed that contiguous memory ranges are sometimes shown as two separate System RAM entries in /proc/iomem. If a CMA range overlaps two System RAM ranges, adding crashk_res to /proc/iomem can create a conflict. Reference [1] describes one such instance on the PowerPC architecture.

Link: https://lore.kernel.org/all/20251016142831.144515-1-sourabhjain at linux.ibm.com/ [1] Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- Changelog: v4 -> v5: https://lore.kernel.org/all/20251114152550.ac2dd5e23542f09c62defec7 at linux-foundation.org/ - Split patch from the above patch series. - Code to create the kexec node under /sys/kernel is added; earlier it was done in [02/05] of the above patch series.
v5 -> v6: - Add Crash CMA Range sysfs interface under /sys/kernel Note: This patch is dependent on the below patch: https://lore.kernel.org/all/20251117035153.1199665-1-sourabhjain at linux.ibm.com/ --- .../ABI/testing/sysfs-kernel-kexec-kdump | 10 +++++++++ kernel/ksysfs.c | 21 +++++++++++++++++++ 2 files changed, 31 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump index 96b24565b68e..f6089e38de5f 100644 --- a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump @@ -41,3 +41,13 @@ Description: read only is used by the user space utility kexec to support updating the in-kernel kdump image during hotplug operations. User: Kexec tools + +What: /sys/kernel/kexec_crash_cma_ranges +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read only + Provides information about the memory ranges reserved from + the Contiguous Memory Allocator (CMA) area that are allocated + to the crash (kdump) kernel. It lists the start and end physical + addresses of CMA regions assigned for crashkernel use. 
+User: kdump service diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c index eefb67d9883c..0ff2179bc603 100644 --- a/kernel/ksysfs.c +++ b/kernel/ksysfs.c @@ -135,6 +135,24 @@ static ssize_t kexec_crash_loaded_show(struct kobject *kobj, } KERNEL_ATTR_RO(kexec_crash_loaded); +#ifdef CONFIG_CRASH_RESERVE +static ssize_t kexec_crash_cma_ranges_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + + ssize_t len = 0; + int i; + + for (i = 0; i < crashk_cma_cnt; ++i) { + len += sysfs_emit_at(buf, len, "%08llx-%08llx\n", + crashk_cma_ranges[i].start, + crashk_cma_ranges[i].end); + } + return len; +} +KERNEL_ATTR_RO(kexec_crash_cma_ranges); +#endif /* CONFIG_CRASH_RESERVE */ + static ssize_t kexec_crash_size_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) { @@ -260,6 +278,9 @@ static struct attribute * kernel_attrs[] = { #ifdef CONFIG_CRASH_DUMP &kexec_crash_loaded_attr.attr, &kexec_crash_size_attr.attr, +#ifdef CONFIG_CRASH_RESERVE + &kexec_crash_cma_ranges_attr.attr, +#endif #endif #endif #ifdef CONFIG_VMCORE_INFO -- 2.51.1 From sourabhjain at linux.ibm.com Tue Nov 18 03:45:04 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Tue, 18 Nov 2025 17:15:04 +0530 Subject: [PATCH v6 0/3] kexec: reorganize kexec and kdump sysfs Message-ID: <20251118114507.1769455-1-sourabhjain@linux.ibm.com> All existing kexec and kdump sysfs entries are moved to a new location, /sys/kernel/kexec, to keep /sys/kernel/ clean and better organized. Symlinks are created at the old locations for backward compatibility and can be removed in the future [01/03]. While doing this cleanup, the old kexec and kdump sysfs entries are marked as deprecated in the existing ABI documentation [02/03]. This makes it clear that these older interfaces should no longer be used. New ABI documentation is added to describe the reorganized interfaces [03/03], so users and tools can rely on the updated sysfs interfaces going forward. 
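During the transition window a tool that wants to run on both old and new kernels can probe the relocated path first and fall back to the legacy one; while the compatibility symlinks exist this is redundant, but it keeps working after they are removed. A small userspace C sketch; the helper name `kexec_sysfs_path` is made up for illustration:

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Return whichever of the two candidate sysfs paths exists, preferring
 * the new /sys/kernel/kexec/ location; NULL if neither is present
 * (e.g. a kernel built without kexec support).
 */
const char *kexec_sysfs_path(const char *newpath, const char *oldpath)
{
	if (access(newpath, F_OK) == 0)
		return newpath;
	if (access(oldpath, F_OK) == 0)
		return oldpath;
	return NULL;
}
```

A caller would use it as `kexec_sysfs_path("/sys/kernel/kexec/loaded", "/sys/kernel/kexec_loaded")` and then read whichever file is returned.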
Changelog: --------- v4 -> v5: https://lore.kernel.org/all/20251114152550.ac2dd5e23542f09c62defec7 at linux-foundation.org/ - Split patch series out of the above patch series v5 -> v6: - Move /sys/kernel/kexec_crash_cma_ranges also to new /sys/kernel/kexec node - Update commit messages Note: This patch series is dependent on the patches: https://lore.kernel.org/all/20251117035153.1199665-1-sourabhjain at linux.ibm.com/ https://lore.kernel.org/all/20251118071023.1673329-1-sourabhjain at linux.ibm.com/ Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Sourabh Jain Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Sourabh Jain (3): kexec: move sysfs entries to /sys/kernel/kexec Documentation/ABI: mark old kexec sysfs deprecated Documentation/ABI: new kexec and kdump sysfs interface .../ABI/obsolete/sysfs-kernel-kexec-kdump | 71 +++++++++ .../ABI/testing/sysfs-kernel-kexec-kdump | 26 ++-- kernel/kexec_core.c | 141 ++++++++++++++++++ kernel/ksysfs.c | 89 +---------- 4 files changed, 230 insertions(+), 97 deletions(-) create mode 100644 Documentation/ABI/obsolete/sysfs-kernel-kexec-kdump -- 2.51.1 From sourabhjain at linux.ibm.com Tue Nov 18 03:45:05 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Tue, 18 Nov 2025 17:15:05 +0530 Subject: [PATCH v6 1/3] kexec: move sysfs entries to /sys/kernel/kexec In-Reply-To: <20251118114507.1769455-1-sourabhjain@linux.ibm.com> References: <20251118114507.1769455-1-sourabhjain@linux.ibm.com> Message-ID: <20251118114507.1769455-2-sourabhjain@linux.ibm.com> Several kexec and kdump sysfs entries are currently placed directly under /sys/kernel/, which clutters the directory and makes it harder to identify unrelated entries.
To improve organization and readability, these entries are now moved under a dedicated directory, /sys/kernel/kexec. The following sysfs entries are moved under the new kexec sysfs node: +---------------------------+---------------------------+ | Old sysfs name | New sysfs name | | (under /sys/kernel) | (under /sys/kernel/kexec) | +---------------------------+---------------------------+ | kexec_loaded | loaded | +---------------------------+---------------------------+ | kexec_crash_loaded | crash_loaded | +---------------------------+---------------------------+ | kexec_crash_size | crash_size | +---------------------------+---------------------------+ | crash_elfcorehdr_size | crash_elfcorehdr_size | +---------------------------+---------------------------+ | kexec_crash_cma_ranges | crash_cma_ranges | +---------------------------+---------------------------+ For backward compatibility, symlinks are created at the old locations so that existing tools and scripts continue to work. These symlinks can be removed in the future once users have switched to the new path. While creating symlinks, entries are added in /sys/kernel/ that point to their new locations under /sys/kernel/kexec/. If an error occurs while adding a symlink, it is logged but does not stop initialization of the remaining kexec sysfs symlinks. The crash_elfcorehdr_size entry is now controlled by CONFIG_CRASH_DUMP instead of CONFIG_VMCORE_INFO, as CONFIG_CRASH_DUMP also enables CONFIG_VMCORE_INFO.
Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- kernel/kexec_core.c | 141 ++++++++++++++++++++++++++++++++++++++++++++ kernel/ksysfs.c | 89 +--------------------------- 2 files changed, 142 insertions(+), 88 deletions(-) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index fa00b239c5d9..02429499fb64 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -41,6 +41,7 @@ #include #include #include +#include #include #include @@ -1229,3 +1230,143 @@ int kernel_kexec(void) kexec_unlock(); return error; } + +static ssize_t loaded_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%d\n", !!kexec_image); +} +static struct kobj_attribute loaded_attr = __ATTR_RO(loaded); + +#ifdef CONFIG_CRASH_DUMP +static ssize_t crash_loaded_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%d\n", kexec_crash_loaded()); +} +static struct kobj_attribute crash_loaded_attr = __ATTR_RO(crash_loaded); + +#ifdef CONFIG_CRASH_RESERVE +static ssize_t crash_cma_ranges_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + + ssize_t len = 0; + int i; + + for (i = 0; i < crashk_cma_cnt; ++i) { + len += sysfs_emit_at(buf, len, "%08llx-%08llx\n", + crashk_cma_ranges[i].start, + crashk_cma_ranges[i].end); + } + return len; +} +static struct kobj_attribute crash_cma_ranges_attr = __ATTR_RO(crash_cma_ranges); +#endif + +static ssize_t crash_size_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + ssize_t size = crash_get_memory_size(); + + if (size < 0) + return size; + + return sysfs_emit(buf, "%zd\n", size); +} +static ssize_t crash_size_store(struct kobject *kobj, + struct 
kobj_attribute *attr, + const char *buf, size_t count) +{ + unsigned long cnt; + int ret; + + if (kstrtoul(buf, 0, &cnt)) + return -EINVAL; + + ret = crash_shrink_memory(cnt); + return ret < 0 ? ret : count; +} +static struct kobj_attribute crash_size_attr = __ATTR_RW(crash_size); + +#ifdef CONFIG_CRASH_HOTPLUG +static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + unsigned int sz = crash_get_elfcorehdr_size(); + + return sysfs_emit(buf, "%u\n", sz); +} +static struct kobj_attribute crash_elfcorehdr_size_attr = __ATTR_RO(crash_elfcorehdr_size); + +#endif /* CONFIG_CRASH_HOTPLUG */ +#endif /* CONFIG_CRASH_DUMP */ + +static struct attribute *kexec_attrs[] = { + &loaded_attr.attr, +#ifdef CONFIG_CRASH_DUMP + &crash_loaded_attr.attr, + &crash_size_attr.attr, +#ifdef CONFIG_CRASH_RESERVE + &crash_cma_ranges_attr.attr, +#endif +#ifdef CONFIG_CRASH_HOTPLUG + &crash_elfcorehdr_size_attr.attr, +#endif +#endif + NULL +}; + +struct kexec_link_entry { + const char *target; + const char *name; +}; + +static struct kexec_link_entry kexec_links[] = { + { "loaded", "kexec_loaded" }, +#ifdef CONFIG_CRASH_DUMP + { "crash_loaded", "kexec_crash_loaded" }, + { "crash_size", "kexec_crash_size" }, +#ifdef CONFIG_CRASH_RESERVE + {"crash_cma_ranges", "kexec_crash_cma_ranges"}, +#endif +#ifdef CONFIG_CRASH_HOTPLUG + { "crash_elfcorehdr_size", "crash_elfcorehdr_size" }, +#endif +#endif +}; + +static struct kobject *kexec_kobj; +ATTRIBUTE_GROUPS(kexec); + +static int __init init_kexec_sysctl(void) +{ + int error; + int i; + + kexec_kobj = kobject_create_and_add("kexec", kernel_kobj); + if (!kexec_kobj) { + pr_err("failed to create kexec kobject\n"); + return -ENOMEM; + } + + error = sysfs_create_groups(kexec_kobj, kexec_groups); + if (error) + goto kset_exit; + + for (i = 0; i < ARRAY_SIZE(kexec_links); i++) { + error = compat_only_sysfs_link_entry_to_kobj(kernel_kobj, kexec_kobj, + kexec_links[i].target, + kexec_links[i].name); + if 
(error) + pr_err("Unable to create %s symlink (%d)", kexec_links[i].name, error); + } + + return 0; + +kset_exit: + kobject_put(kexec_kobj); + return error; +} + +subsys_initcall(init_kexec_sysctl); diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c index 0ff2179bc603..a9e6354d9e25 100644 --- a/kernel/ksysfs.c +++ b/kernel/ksysfs.c @@ -12,7 +12,7 @@ #include #include #include -#include +#include #include #include #include @@ -119,68 +119,6 @@ static ssize_t profiling_store(struct kobject *kobj, KERNEL_ATTR_RW(profiling); #endif -#ifdef CONFIG_KEXEC_CORE -static ssize_t kexec_loaded_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) -{ - return sysfs_emit(buf, "%d\n", !!kexec_image); -} -KERNEL_ATTR_RO(kexec_loaded); - -#ifdef CONFIG_CRASH_DUMP -static ssize_t kexec_crash_loaded_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) -{ - return sysfs_emit(buf, "%d\n", kexec_crash_loaded()); -} -KERNEL_ATTR_RO(kexec_crash_loaded); - -#ifdef CONFIG_CRASH_RESERVE -static ssize_t kexec_crash_cma_ranges_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) -{ - - ssize_t len = 0; - int i; - - for (i = 0; i < crashk_cma_cnt; ++i) { - len += sysfs_emit_at(buf, len, "%08llx-%08llx\n", - crashk_cma_ranges[i].start, - crashk_cma_ranges[i].end); - } - return len; -} -KERNEL_ATTR_RO(kexec_crash_cma_ranges); -#endif /* CONFIG_CRASH_RESERVE */ - -static ssize_t kexec_crash_size_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) -{ - ssize_t size = crash_get_memory_size(); - - if (size < 0) - return size; - - return sysfs_emit(buf, "%zd\n", size); -} -static ssize_t kexec_crash_size_store(struct kobject *kobj, - struct kobj_attribute *attr, - const char *buf, size_t count) -{ - unsigned long cnt; - int ret; - - if (kstrtoul(buf, 0, &cnt)) - return -EINVAL; - - ret = crash_shrink_memory(cnt); - return ret < 0 ? 
ret : count; -} -KERNEL_ATTR_RW(kexec_crash_size); - -#endif /* CONFIG_CRASH_DUMP*/ -#endif /* CONFIG_KEXEC_CORE */ - #ifdef CONFIG_VMCORE_INFO static ssize_t vmcoreinfo_show(struct kobject *kobj, @@ -192,18 +130,6 @@ static ssize_t vmcoreinfo_show(struct kobject *kobj, } KERNEL_ATTR_RO(vmcoreinfo); -#ifdef CONFIG_CRASH_HOTPLUG -static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) -{ - unsigned int sz = crash_get_elfcorehdr_size(); - - return sysfs_emit(buf, "%u\n", sz); -} -KERNEL_ATTR_RO(crash_elfcorehdr_size); - -#endif - #endif /* CONFIG_VMCORE_INFO */ /* whether file capabilities are enabled */ @@ -273,21 +199,8 @@ static struct attribute * kernel_attrs[] = { #ifdef CONFIG_PROFILING &profiling_attr.attr, #endif -#ifdef CONFIG_KEXEC_CORE - &kexec_loaded_attr.attr, -#ifdef CONFIG_CRASH_DUMP - &kexec_crash_loaded_attr.attr, - &kexec_crash_size_attr.attr, -#ifdef CONFIG_CRASH_RESERVE - &kexec_crash_cma_ranges_attr.attr, -#endif -#endif -#endif #ifdef CONFIG_VMCORE_INFO &vmcoreinfo_attr.attr, -#ifdef CONFIG_CRASH_HOTPLUG - &crash_elfcorehdr_size_attr.attr, -#endif #endif #ifndef CONFIG_TINY_RCU &rcu_expedited_attr.attr, -- 2.51.1 From sourabhjain at linux.ibm.com Tue Nov 18 03:45:06 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Tue, 18 Nov 2025 17:15:06 +0530 Subject: [PATCH v6 2/3] Documentation/ABI: mark old kexec sysfs deprecated In-Reply-To: <20251118114507.1769455-1-sourabhjain@linux.ibm.com> References: <20251118114507.1769455-1-sourabhjain@linux.ibm.com> Message-ID: <20251118114507.1769455-3-sourabhjain@linux.ibm.com> The previous commit ("kexec: move sysfs entries to /sys/kernel/kexec") moved all existing kexec sysfs entries to a new location. The ABI document is updated to include a note about the deprecation of the old kexec sysfs entries. 
The following kexec sysfs entries are deprecated: - /sys/kernel/kexec_loaded - /sys/kernel/kexec_crash_loaded - /sys/kernel/kexec_crash_size - /sys/kernel/crash_elfcorehdr_size - /sys/kernel/kexec_crash_cma_ranges Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- .../sysfs-kernel-kexec-kdump | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) rename Documentation/ABI/{testing => obsolete}/sysfs-kernel-kexec-kdump (63%) diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/obsolete/sysfs-kernel-kexec-kdump similarity index 63% rename from Documentation/ABI/testing/sysfs-kernel-kexec-kdump rename to Documentation/ABI/obsolete/sysfs-kernel-kexec-kdump index f6089e38de5f..ba26a6a1d2be 100644 --- a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump +++ b/Documentation/ABI/obsolete/sysfs-kernel-kexec-kdump @@ -1,3 +1,21 @@ +NOTE: all the ABIs listed in this file are deprecated and will be removed after 2028. 
+ +Here are the alternative ABIs: ++------------------------------------+-----------------------------------------+ +| Deprecated | Alternative | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/kexec_loaded | /sys/kernel/kexec/loaded | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/kexec_crash_loaded | /sys/kernel/kexec/crash_loaded | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/kexec_crash_size | /sys/kernel/kexec/crash_size | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/crash_elfcorehdr_size | /sys/kernel/kexec/crash_elfcorehdr_size | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/kexec_crash_cma_ranges | /sys/kernel/kexec/crash_cma_ranges | ++------------------------------------+-----------------------------------------+ + + What: /sys/kernel/kexec_loaded Date: Jun 2006 Contact: kexec at lists.infradead.org -- 2.51.1 From sourabhjain at linux.ibm.com Tue Nov 18 03:45:07 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Tue, 18 Nov 2025 17:15:07 +0530 Subject: [PATCH v6 3/3] Documentation/ABI: new kexec and kdump sysfs interface In-Reply-To: <20251118114507.1769455-1-sourabhjain@linux.ibm.com> References: <20251118114507.1769455-1-sourabhjain@linux.ibm.com> Message-ID: <20251118114507.1769455-4-sourabhjain@linux.ibm.com> Add an ABI document for following kexec and kdump sysfs interface: - /sys/kernel/kexec/loaded - /sys/kernel/kexec/crash_loaded - /sys/kernel/kexec/crash_size - /sys/kernel/kexec/crash_elfcorehdr_size - /sys/kernel/kexec/crash_cma_ranges Cc: Aditya Gupta Cc: Andrew Morton Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Cc: linuxppc-dev at 
lists.ozlabs.org Cc: kexec at lists.infradead.org Signed-off-by: Sourabh Jain --- .../ABI/testing/sysfs-kernel-kexec-kdump | 61 +++++++++++++++++++ 1 file changed, 61 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-kernel-kexec-kdump diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump new file mode 100644 index 000000000000..f59051b5d96d --- /dev/null +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump @@ -0,0 +1,61 @@ +What: /sys/kernel/kexec/* +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: + The /sys/kernel/kexec/* directory contains sysfs files + that provide information about the configuration status + of kexec and kdump. + +What: /sys/kernel/kexec/loaded +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read only + Indicates whether a new kernel image has been loaded + into memory using the kexec system call. It shows 1 if + a kexec image is present and ready to boot, or 0 if none + is loaded. +User: kexec tools, kdump service + +What: /sys/kernel/kexec/crash_loaded +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read only + Indicates whether a crash (kdump) kernel is currently + loaded into memory. It shows 1 if a crash kernel has been + successfully loaded for panic handling, or 0 if no crash + kernel is present. +User: Kexec tools, Kdump service + +What: /sys/kernel/kexec/crash_size +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read/write + Shows the amount of memory reserved for loading the crash + (kdump) kernel. It reports the size, in bytes, of the + crash kernel area defined by the crashkernel= parameter. + This interface also allows reducing the crashkernel + reservation by writing a smaller value, and the reclaimed + space is added back to the system RAM. 
+User: Kdump service + +What: /sys/kernel/kexec/crash_elfcorehdr_size +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read only + Indicates the preferred size of the memory buffer for the + ELF core header used by the crash (kdump) kernel. It defines + how much space is needed to hold metadata about the crashed + system, including CPU and memory information. This information + is used by the user space utility kexec to support updating the + in-kernel kdump image during hotplug operations. +User: Kexec tools + +What: /sys/kernel/kexec/crash_cma_ranges +Date: Nov 2025 +Contact: kexec at lists.infradead.org +Description: read only + Provides information about the memory ranges reserved from + the Contiguous Memory Allocator (CMA) area that are allocated + to the crash (kdump) kernel. It lists the start and end physical + addresses of CMA regions assigned for crashkernel use. +User: kdump service -- 2.51.1 From pratyush at kernel.org Tue Nov 18 05:19:50 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Tue, 18 Nov 2025 14:19:50 +0100 Subject: [PATCH v1 04/13] kho: Verify deserialization status and fix FDT alignment access In-Reply-To: (Mike Rapoport's message of "Sat, 15 Nov 2025 11:36:19 +0200") References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-5-pasha.tatashin@soleen.com> Message-ID: On Sat, Nov 15 2025, Mike Rapoport wrote: > On Fri, Nov 14, 2025 at 05:52:37PM +0100, Pratyush Yadav wrote: >> On Fri, Nov 14 2025, Pasha Tatashin wrote: >> >> > @@ -1377,16 +1387,12 @@ static void __init kho_release_scratch(void) >> > >> > void __init kho_memory_init(void) >> > { >> > - struct folio *folio; >> > - >> > if (kho_in.scratch_phys) { >> > kho_scratch = phys_to_virt(kho_in.scratch_phys); >> > kho_release_scratch(); >> > >> > - kho_mem_deserialize(kho_get_fdt()); >> > - folio = kho_restore_folio(kho_in.fdt_phys); >> > - if (!folio) >> > - pr_warn("failed to restore folio for KHO fdt\n"); >> > + if 
(!kho_mem_deserialize(kho_get_fdt())) >> > + kho_in.fdt_phys = 0; >> >> The folio restore does serve a purpose: it accounts for that folio in >> the system's total memory. See the call to adjust_managed_page_count() >> in kho_restore_page(). In practice, I don't think it makes much of a >> difference, but I don't see why not. > > This page is never freed, so adding it to zone managed pages or keeping it > reserved does not change anything. In practice, sure. I still don't see a good reason to _not_ initialize the page properly. It's not like it costs us much in terms of performance or code complexity. Since kho_restore_folio() makes sure the folio was _actually_ preserved from KHO, you have a safety check against previous kernel having a bug and not preserving the FDT properly. And I get that the FDT has already been used by this point, but at least you would have some known point to catch this. [...] -- Regards, Pratyush Yadav From pasha.tatashin at soleen.com Tue Nov 18 07:25:37 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Tue, 18 Nov 2025 10:25:37 -0500 Subject: [PATCH v1 04/13] kho: Verify deserialization status and fix FDT alignment access In-Reply-To: References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-5-pasha.tatashin@soleen.com> Message-ID: > > This page is never freed, so adding it to zone managed pages or keeping it > > reserved does not change anything. > > In practice, sure. I still don't see a good reason to _not_ initialize > the page properly. It's not like it costs us much in terms of > performance or code complexity. > > Since kho_restore_folio() makes sure the folio was _actually_ preserved > from KHO, you have a safety check against previous kernel having a bug > and not preserving the FDT properly. And I get that the FDT has already > been used by this point, but at least you would have some known point to > catch this. The kho_alloc_preserve() API is different from kho_preserve_folio(). 
With kho_preserve_folio(), memory is allocated and only some time later preserved, so there is a window in which that memory exists and may be used while it is not preserved; therefore it is a crucial step for such memory to also go through kho_restore_folio() before it is used. With kho_alloc_preserve(), whenever the memory exists it is always preserved; that is a guarantee of this API. There is no reason to do kho_restore_folio() on such memory at all. It can be released back to the system via kho_free_restore()/kho_free_unpreserve(). Pasha > > [...] > > -- > Regards, > Pratyush Yadav From pratyush at kernel.org Tue Nov 18 09:11:24 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Tue, 18 Nov 2025 18:11:24 +0100 Subject: [PATCH v1 04/13] kho: Verify deserialization status and fix FDT alignment access In-Reply-To: (Pasha Tatashin's message of "Tue, 18 Nov 2025 10:25:37 -0500") References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-5-pasha.tatashin@soleen.com> Message-ID: On Tue, Nov 18 2025, Pasha Tatashin wrote: >> > This page is never freed, so adding it to zone managed pages or keeping it >> > reserved does not change anything. >> >> In practice, sure. I still don't see a good reason to _not_ initialize >> the page properly. It's not like it costs us much in terms of >> performance or code complexity. >> >> Since kho_restore_folio() makes sure the folio was _actually_ preserved >> from KHO, you have a safety check against previous kernel having a bug >> and not preserving the FDT properly. And I get that the FDT has already >> been used by this point, but at least you would have some known point to >> catch this. > > The kho_alloc_preserve() API is different from kho_preserve_folio().
> With kho_preserve_folio(), memory is allocated and some time later is > preserved, so there is a possibility for that memory to exist and be > used where it is not preserved, therefore it is a crucial step for > such memory to also do kho_restore_folio() before used. With > kho_alloc_preserve(), when the memory exists it is always preserved; > it is gurantee of this API. There is no reason to do > kho_restore_folio() on such memory at all. It can be released back to > the system via kho_free_restore()/kho_free_unpreserve(). Even for those I think there should be a kho_restore_mem() or something similar (naming things is hard :/), so they go through the restore, their struct page is properly initialized and accounted for, and make sure the pages were actually preserved. Using the memory without restoring it first should be the exception IMO. -- Regards, Pratyush Yadav From pratyush at kernel.org Tue Nov 18 10:10:45 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Tue, 18 Nov 2025 19:10:45 +0100 Subject: [PATCH] test_kho: always print restore status Message-ID: <20251118181046.23321-1-pratyush@kernel.org> Currently the KHO test only prints a message on success, and remains silent on failure. This makes it difficult to notice a failing test. A failing test is usually more interesting than a successful one. Always print the test status after attempting restore. 
Signed-off-by: Pratyush Yadav --- lib/test_kho.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/lib/test_kho.c b/lib/test_kho.c index 85b60d87a50ad..47de562807955 100644 --- a/lib/test_kho.c +++ b/lib/test_kho.c @@ -306,7 +306,6 @@ static int kho_test_restore(phys_addr_t fdt_phys) if (err) return err; - pr_info("KHO restore succeeded\n"); return 0; } @@ -319,8 +318,15 @@ static int __init kho_test_init(void) return 0; err = kho_retrieve_subtree(KHO_TEST_FDT, &fdt_phys); - if (!err) - return kho_test_restore(fdt_phys); + if (!err) { + err = kho_test_restore(fdt_phys); + if (err) + pr_err("KHO restore failed\n"); + else + pr_info("KHO restore succeeded\n"); + + return err; + } if (err != -ENOENT) { pr_warn("failed to retrieve %s FDT: %d\n", KHO_TEST_FDT, err); base-commit: f0bfdc2b69f5c600b88ee484c01b213712c63d94 -- 2.47.3 From pratyush at kernel.org Tue Nov 18 10:18:10 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Tue, 18 Nov 2025 19:18:10 +0100 Subject: [PATCH] kho: free already restored pages when kho_restore_vmalloc() fails Message-ID: <20251118181811.47336-1-pratyush@kernel.org> When kho_restore_vmalloc() fails, it frees up the pages array, but not the pages it contains. These are the pages that were successfully restored using kho_restore_pages(). If the failure happens when restoring the pages, the ones successfully restored are leaked. If the failure happens when allocating the vm_area or when mapping the pages, all the pages of the preserved vmalloc buffer are leaked. Free all of the successfully restored pages before returning error. 
Fixes: a667300bd53f2 ("kho: add support for preserving vmalloc allocations") Signed-off-by: Pratyush Yadav --- kernel/liveupdate/kexec_handover.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 224bdf5becb68..515339fa526e0 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -1088,11 +1088,11 @@ void *kho_restore_vmalloc(const struct kho_vmalloc *preservation) phys_addr_t phys = chunk->phys[i]; if (idx + contig_pages > total_pages) - goto err_free_pages_array; + goto err_free_pages; page = kho_restore_pages(phys, contig_pages); if (!page) - goto err_free_pages_array; + goto err_free_pages; for (int j = 0; j < contig_pages; j++) pages[idx++] = page; @@ -1102,20 +1102,20 @@ void *kho_restore_vmalloc(const struct kho_vmalloc *preservation) page = kho_restore_pages(virt_to_phys(chunk), 1); if (!page) - goto err_free_pages_array; + goto err_free_pages; chunk = KHOSER_LOAD_PTR(chunk->hdr.next); __free_page(page); } if (idx != total_pages) - goto err_free_pages_array; + goto err_free_pages; area = __get_vm_area_node(total_pages * PAGE_SIZE, align, shift, vm_flags, VMALLOC_START, VMALLOC_END, NUMA_NO_NODE, GFP_KERNEL, __builtin_return_address(0)); if (!area) - goto err_free_pages_array; + goto err_free_pages; addr = (unsigned long)area->addr; size = get_vm_area_size(area); @@ -1130,7 +1130,10 @@ void *kho_restore_vmalloc(const struct kho_vmalloc *preservation) err_free_vm_area: free_vm_area(area); -err_free_pages_array: +err_free_pages: + for (int i = 0; i < idx; i++) + __free_page(pages[i]); + kvfree(pages); return NULL; } base-commit: f0bfdc2b69f5c600b88ee484c01b213712c63d94 prerequisite-patch-id: f54df1de9bdcb4fe396940cdcc578f5adcc9397c -- 2.47.3 From pratyush at kernel.org Tue Nov 18 10:22:16 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Tue, 18 Nov 2025 19:22:16 +0100 Subject: [PATCH] kho: free chunks using 
free_page() instead of kfree() Message-ID: <20251118182218.63044-1-pratyush@kernel.org> Before commit fa759cd75bce5 ("kho: allocate metadata directly from the buddy allocator"), the chunks were allocated from the slab allocator using kzalloc(). Those were rightly freed using kfree(). When the commit switched to using the buddy allocator directly, it missed updating kho_mem_ser_free() to use free_page() instead of kfree(). Fixes: fa759cd75bce5 ("kho: allocate metadata directly from the buddy allocator") Signed-off-by: Pratyush Yadav --- Notes: Commit 73976b0f7cefe ("kho: remove abort functionality and support state refresh") made this bug easier to trigger by providing a deterministic method to trigger freeing of the chunks. kernel/liveupdate/kexec_handover.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 515339fa526e0..6497fe68c2d24 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -360,7 +360,7 @@ static void kho_mem_ser_free(struct khoser_mem_chunk *first_chunk) struct khoser_mem_chunk *tmp = chunk; chunk = KHOSER_LOAD_PTR(chunk->hdr.next); - kfree(tmp); + free_page((unsigned long)tmp); } } base-commit: f0bfdc2b69f5c600b88ee484c01b213712c63d94 prerequisite-patch-id: f54df1de9bdcb4fe396940cdcc578f5adcc9397c prerequisite-patch-id: 800ec910c37120fd77aff1fad8ec10daaeaeddb1 -- 2.47.3 From pratyush at kernel.org Tue Nov 18 10:24:15 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Tue, 18 Nov 2025 19:24:15 +0100 Subject: [PATCH] MAINTAINERS: add test_kho to KHO's entry Message-ID: <20251118182416.70660-1-pratyush@kernel.org> Commit b753522bed0b7 ("kho: add test for kexec handover") introduced the KHO test but missed adding it to KHO's MAINTAINERS entry. Add it so the KHO maintainers can get patches for its test. 
Cc: stable at vger.kernel.org Fixes: b753522bed0b7 ("kho: add test for kexec handover") Signed-off-by: Pratyush Yadav --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index 05e336174ede5..b0873f8ebcda6 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -13799,6 +13799,7 @@ F: Documentation/admin-guide/mm/kho.rst F: Documentation/core-api/kho/* F: include/linux/kexec_handover.h F: kernel/liveupdate/kexec_handover* +F: lib/test_kho.c F: tools/testing/selftests/kho/ KEYS-ENCRYPTED -- 2.47.3 From pasha.tatashin at soleen.com Tue Nov 18 10:35:15 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Tue, 18 Nov 2025 13:35:15 -0500 Subject: [PATCH] MAINTAINERS: add test_kho to KHO's entry In-Reply-To: <20251118182416.70660-1-pratyush@kernel.org> References: <20251118182416.70660-1-pratyush@kernel.org> Message-ID: On Tue, Nov 18, 2025 at 1:24?PM Pratyush Yadav wrote: > > Commit b753522bed0b7 ("kho: add test for kexec handover") introduced the > KHO test but missed adding it to KHO's MAINTAINERS entry. Add it so the > KHO maintainers can get patches for its test. > > Cc: stable at vger.kernel.org > Fixes: b753522bed0b7 ("kho: add test for kexec handover") > Signed-off-by: Pratyush Yadav Reviewed-by: Pasha Tatashin From pasha.tatashin at soleen.com Tue Nov 18 10:39:07 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Tue, 18 Nov 2025 13:39:07 -0500 Subject: [PATCH] kho: free chunks using free_page() instead of kfree() In-Reply-To: <20251118182218.63044-1-pratyush@kernel.org> References: <20251118182218.63044-1-pratyush@kernel.org> Message-ID: On Tue, Nov 18, 2025 at 1:22?PM Pratyush Yadav wrote: > > Before commit fa759cd75bce5 ("kho: allocate metadata directly from the > buddy allocator"), the chunks were allocated from the slab allocator > using kzalloc(). Those were rightly freed using kfree(). 
> > When the commit switched to using the buddy allocator directly, it > missed updating kho_mem_ser_free() to use free_page() instead of > kfree(). > > Fixes: fa759cd75bce5 ("kho: allocate metadata directly from the buddy allocator") > Signed-off-by: Pratyush Yadav Thank you for finding and fixing this issue. Reviewed-by: Pasha Tatashin From pasha.tatashin at soleen.com Tue Nov 18 10:43:32 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Tue, 18 Nov 2025 13:43:32 -0500 Subject: [PATCH] kho: free already restored pages when kho_restore_vmalloc() fails In-Reply-To: <20251118181811.47336-1-pratyush@kernel.org> References: <20251118181811.47336-1-pratyush@kernel.org> Message-ID: > When kho_restore_vmalloc() fails, it frees up the pages array, but not > the pages it contains. These are the pages that were successfully > restored using kho_restore_pages(). If the failure happens when > restoring the pages, the ones successfully restored are leaked. If the > failure happens when allocating the vm_area or when mapping the pages, > all the pages of the preserved vmalloc buffer are leaked. Hm, I am not sure if KHO should be responsible for freeing the restored pages. We don't know the content of those pages, and what they are used for. They could be used by a hypervisor or a device. Therefore, it may be better to keep them leaked, and let the caller decide what to do next: i.e., boot into a maintenance mode, crash the kernel, or allow the leak until the next reboot. Pasha From pasha.tatashin at soleen.com Tue Nov 18 12:31:34 2025 From: pasha.tatashin at soleen.com (Pasha Tatashin) Date: Tue, 18 Nov 2025 15:31:34 -0500 Subject: [PATCH] test_kho: always print restore status In-Reply-To: <20251118181046.23321-1-pratyush@kernel.org> References: <20251118181046.23321-1-pratyush@kernel.org> Message-ID: On Tue, Nov 18, 2025 at 1:10?PM Pratyush Yadav wrote: > > Currently the KHO test only prints a message on success, and remains > silent on failure. 
This makes it difficult to notice a failing test. A > failing test is usually more interesting than a successful one. > > Always print the test status after attempting restore. > > Signed-off-by: Pratyush Yadav Reviewed-by: Pasha Tatashin From rppt at kernel.org Tue Nov 18 23:13:05 2025 From: rppt at kernel.org (Mike Rapoport) Date: Wed, 19 Nov 2025 09:13:05 +0200 Subject: [PATCH] test_kho: always print restore status In-Reply-To: <20251118181046.23321-1-pratyush@kernel.org> References: <20251118181046.23321-1-pratyush@kernel.org> Message-ID: On Tue, Nov 18, 2025 at 07:10:45PM +0100, Pratyush Yadav wrote: > Currently the KHO test only prints a message on success, and remains > silent on failure. This makes it difficult to notice a failing test. A > failing test is usually more interesting than a successful one. > > Always print the test status after attempting restore. > > Signed-off-by: Pratyush Yadav Reviewed-by: Mike Rapoport (Microsoft) > --- > lib/test_kho.c | 12 +++++++++--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/lib/test_kho.c b/lib/test_kho.c > index 85b60d87a50ad..47de562807955 100644 > --- a/lib/test_kho.c > +++ b/lib/test_kho.c > @@ -306,7 +306,6 @@ static int kho_test_restore(phys_addr_t fdt_phys) > if (err) > return err; > > - pr_info("KHO restore succeeded\n"); > return 0; > } > > @@ -319,8 +318,15 @@ static int __init kho_test_init(void) > return 0; > > err = kho_retrieve_subtree(KHO_TEST_FDT, &fdt_phys); > - if (!err) > - return kho_test_restore(fdt_phys); > + if (!err) { > + err = kho_test_restore(fdt_phys); > + if (err) > + pr_err("KHO restore failed\n"); > + else > + pr_info("KHO restore succeeded\n"); > + > + return err; > + } > > if (err != -ENOENT) { > pr_warn("failed to retrieve %s FDT: %d\n", KHO_TEST_FDT, err); > > base-commit: f0bfdc2b69f5c600b88ee484c01b213712c63d94 > -- > 2.47.3 > -- Sincerely yours, Mike. 
From rppt at kernel.org Tue Nov 18 23:14:34 2025 From: rppt at kernel.org (Mike Rapoport) Date: Wed, 19 Nov 2025 09:14:34 +0200 Subject: [PATCH] kho: free chunks using free_page() instead of kfree() In-Reply-To: <20251118182218.63044-1-pratyush@kernel.org> References: <20251118182218.63044-1-pratyush@kernel.org> Message-ID: On Tue, Nov 18, 2025 at 07:22:16PM +0100, Pratyush Yadav wrote: > Before commit fa759cd75bce5 ("kho: allocate metadata directly from the > buddy allocator"), the chunks were allocated from the slab allocator > using kzalloc(). Those were rightly freed using kfree(). > > When the commit switched to using the buddy allocator directly, it > missed updating kho_mem_ser_free() to use free_page() instead of > kfree(). > > Fixes: fa759cd75bce5 ("kho: allocate metadata directly from the buddy allocator") > Signed-off-by: Pratyush Yadav Reviewed-by: Mike Rapoport (Microsoft) > --- > > Notes: > Commit 73976b0f7cefe ("kho: remove abort functionality and support state > refresh") made this bug easier to trigger by providing a deterministic > method to trigger freeing of the chunks. > > kernel/liveupdate/kexec_handover.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c > index 515339fa526e0..6497fe68c2d24 100644 > --- a/kernel/liveupdate/kexec_handover.c > +++ b/kernel/liveupdate/kexec_handover.c > @@ -360,7 +360,7 @@ static void kho_mem_ser_free(struct khoser_mem_chunk *first_chunk) > struct khoser_mem_chunk *tmp = chunk; > > chunk = KHOSER_LOAD_PTR(chunk->hdr.next); > - kfree(tmp); > + free_page((unsigned long)tmp); > } > } > > > base-commit: f0bfdc2b69f5c600b88ee484c01b213712c63d94 > prerequisite-patch-id: f54df1de9bdcb4fe396940cdcc578f5adcc9397c > prerequisite-patch-id: 800ec910c37120fd77aff1fad8ec10daaeaeddb1 > -- > 2.47.3 > -- Sincerely yours, Mike. 
From rppt at kernel.org Tue Nov 18 23:14:56 2025 From: rppt at kernel.org (Mike Rapoport) Date: Wed, 19 Nov 2025 09:14:56 +0200 Subject: [PATCH] MAINTAINERS: add test_kho to KHO's entry In-Reply-To: <20251118182416.70660-1-pratyush@kernel.org> References: <20251118182416.70660-1-pratyush@kernel.org> Message-ID: On Tue, Nov 18, 2025 at 07:24:15PM +0100, Pratyush Yadav wrote: > Commit b753522bed0b7 ("kho: add test for kexec handover") introduced the > KHO test but missed adding it to KHO's MAINTAINERS entry. Add it so the > KHO maintainers can get patches for its test. > > Cc: stable at vger.kernel.org > Fixes: b753522bed0b7 ("kho: add test for kexec handover") > Signed-off-by: Pratyush Yadav Reviewed-by: Mike Rapoport (Microsoft) > --- > MAINTAINERS | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/MAINTAINERS b/MAINTAINERS > index 05e336174ede5..b0873f8ebcda6 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -13799,6 +13799,7 @@ F: Documentation/admin-guide/mm/kho.rst > F: Documentation/core-api/kho/* > F: include/linux/kexec_handover.h > F: kernel/liveupdate/kexec_handover* > +F: lib/test_kho.c > F: tools/testing/selftests/kho/ > > KEYS-ENCRYPTED > -- > 2.47.3 > -- Sincerely yours, Mike. From gregkh at linuxfoundation.org Tue Nov 18 23:36:09 2025 From: gregkh at linuxfoundation.org (Greg KH) Date: Wed, 19 Nov 2025 02:36:09 -0500 Subject: [PATCH] MAINTAINERS: add test_kho to KHO's entry In-Reply-To: <20251118182416.70660-1-pratyush@kernel.org> References: <20251118182416.70660-1-pratyush@kernel.org> Message-ID: <2025111944-tracing-unwieldy-1769@gregkh> On Tue, Nov 18, 2025 at 07:24:15PM +0100, Pratyush Yadav wrote: > Commit b753522bed0b7 ("kho: add test for kexec handover") introduced the > KHO test but missed adding it to KHO's MAINTAINERS entry. Add it so the > KHO maintainers can get patches for its test. > > Cc: stable at vger.kernel.org Why is this a patch for stable trees? 
From pratyush at kernel.org Wed Nov 19 07:55:06 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Wed, 19 Nov 2025 16:55:06 +0100 Subject: [PATCH] MAINTAINERS: add test_kho to KHO's entry In-Reply-To: <2025111944-tracing-unwieldy-1769@gregkh> (Greg KH's message of "Wed, 19 Nov 2025 02:36:09 -0500") References: <20251118182416.70660-1-pratyush@kernel.org> <2025111944-tracing-unwieldy-1769@gregkh> Message-ID: On Wed, Nov 19 2025, Greg KH wrote: > On Tue, Nov 18, 2025 at 07:24:15PM +0100, Pratyush Yadav wrote: >> Commit b753522bed0b7 ("kho: add test for kexec handover") introduced the >> KHO test but missed adding it to KHO's MAINTAINERS entry. Add it so the >> KHO maintainers can get patches for its test. >> >> Cc: stable at vger.kernel.org > > Why is this a patch for stable trees? If someone finds a problem with this test in a stable kernel, they will know who to contact. -- Regards, Pratyush Yadav From gregkh at linuxfoundation.org Wed Nov 19 08:02:49 2025 From: gregkh at linuxfoundation.org (Greg KH) Date: Wed, 19 Nov 2025 17:02:49 +0100 Subject: [PATCH] MAINTAINERS: add test_kho to KHO's entry In-Reply-To: References: <20251118182416.70660-1-pratyush@kernel.org> <2025111944-tracing-unwieldy-1769@gregkh> Message-ID: <2025111944-bullpen-slinging-dcdc@gregkh> On Wed, Nov 19, 2025 at 04:55:06PM +0100, Pratyush Yadav wrote: > On Wed, Nov 19 2025, Greg KH wrote: > > > On Tue, Nov 18, 2025 at 07:24:15PM +0100, Pratyush Yadav wrote: > >> Commit b753522bed0b7 ("kho: add test for kexec handover") introduced the > >> KHO test but missed adding it to KHO's MAINTAINERS entry. Add it so the > >> KHO maintainers can get patches for its test. > >> > >> Cc: stable at vger.kernel.org > > > > Why is this a patch for stable trees? > > If someone finds a problem with this test in a stable kernel, they will > know who to contact. 
Contacting developers/maintainers should always be done on the latest kernel release, not on older stable kernels as fixes need to ALWAYS be done on Linus's tree first. Please don't force us to attempt to keep MAINTAINERS changes in sync in stable kernel trees, that way lies madness and even more patches that you would be forcing me to handle :) thanks, greg k-h From Markus.Elfring at web.de Wed Nov 19 12:06:49 2025 From: Markus.Elfring at web.de (Markus Elfring) Date: Wed, 19 Nov 2025 21:06:49 +0100 Subject: [PATCH] kho: free chunks using free_page() instead of kfree() In-Reply-To: <20251118182218.63044-1-pratyush@kernel.org> References: <20251118182218.63044-1-pratyush@kernel.org> Message-ID: … > When the commit switched to using the buddy allocator directly, it > missed updating kho_mem_ser_free() to use free_page() instead of > kfree(). Would another imperative wording become helpful for an improved change description? https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.18-rc6#n94 Regards, Markus From sj at kernel.org Wed Nov 19 17:18:48 2025 From: sj at kernel.org (SeongJae Park) Date: Wed, 19 Nov 2025 17:18:48 -0800 Subject: [PATCH] test_kho: always print restore status In-Reply-To: <20251118181046.23321-1-pratyush@kernel.org> Message-ID: <20251120011849.74672-1-sj@kernel.org> On Tue, 18 Nov 2025 19:10:45 +0100 Pratyush Yadav wrote: > Currently the KHO test only prints a message on success, and remains > silent on failure. This makes it difficult to notice a failing test. A > failing test is usually more interesting than a successful one. > > Always print the test status after attempting restore. > > Signed-off-by: Pratyush Yadav Acked-by: SeongJae Park Thanks, SJ [...]
From pratyush at kernel.org Thu Nov 20 01:21:30 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Thu, 20 Nov 2025 10:21:30 +0100 Subject: [PATCH] kho: free chunks using free_page() instead of kfree() In-Reply-To: (Markus Elfring's message of "Wed, 19 Nov 2025 21:06:49 +0100") References: <20251118182218.63044-1-pratyush@kernel.org> Message-ID: On Wed, Nov 19 2025, Markus Elfring wrote: > … >> When the commit switched to using the buddy allocator directly, it >> missed updating kho_mem_ser_free() to use free_page() instead of >> kfree(). > > Would another imperative wording become helpful for an improved change description? > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.18-rc6#n94 "the commit" here refers to the commit fa759cd75bce5 ("kho: allocate metadata directly from the buddy allocator"), not "this commit"/"this patch". I figured that can be understood from the context and I won't need to spell the whole thing out again. I don't understand the technicalities of the English grammar so well, but IIUC imperative mood is used in sentences that give a command. This paragraph talks about a past event. Anyway, if you have something better, happy to take suggestions.
-- Regards, Pratyush Yadav From pratyush at kernel.org Thu Nov 20 01:25:34 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Thu, 20 Nov 2025 10:25:34 +0100 Subject: [PATCH] MAINTAINERS: add test_kho to KHO's entry In-Reply-To: <2025111944-bullpen-slinging-dcdc@gregkh> (Greg KH's message of "Wed, 19 Nov 2025 17:02:49 +0100") References: <20251118182416.70660-1-pratyush@kernel.org> <2025111944-tracing-unwieldy-1769@gregkh> <2025111944-bullpen-slinging-dcdc@gregkh> Message-ID: On Wed, Nov 19 2025, Greg KH wrote: > On Wed, Nov 19, 2025 at 04:55:06PM +0100, Pratyush Yadav wrote: >> On Wed, Nov 19 2025, Greg KH wrote: >> >> > On Tue, Nov 18, 2025 at 07:24:15PM +0100, Pratyush Yadav wrote: >> >> Commit b753522bed0b7 ("kho: add test for kexec handover") introduced the >> >> KHO test but missed adding it to KHO's MAINTAINERS entry. Add it so the >> >> KHO maintainers can get patches for its test. >> >> >> >> Cc: stable at vger.kernel.org >> > >> > Why is this a patch for stable trees? >> >> If someone finds a problem with this test in a stable kernel, they will >> know who to contact. > > Contacting developers/maintainers should always be done on the latest > kernel release, not on older stable kernels as fixes need to ALWAYS be > done on Linus's tree first. > > Please don't force us to attempt to keep MAINTAINERS changes in sync in > stable kernel trees, that way lies madness and even more patches that > you would be forcing me to handle :) Okay, my bad. Feel free to ignore this patch then. And I will keep that in mind the next time around. Andrew, can you please drop the "Cc: stable at vger.kernel.org" when you apply? 
-- Regards, Pratyush Yadav From pratyush at kernel.org Thu Nov 20 01:26:28 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Thu, 20 Nov 2025 10:26:28 +0100 Subject: [PATCH] kho: free already restored pages when kho_restore_vmalloc() fails In-Reply-To: (Pasha Tatashin's message of "Tue, 18 Nov 2025 13:43:32 -0500") References: <20251118181811.47336-1-pratyush@kernel.org> Message-ID: On Tue, Nov 18 2025, Pasha Tatashin wrote: >> When kho_restore_vmalloc() fails, it frees up the pages array, but not >> the pages it contains. These are the pages that were successfully >> restored using kho_restore_pages(). If the failure happens when >> restoring the pages, the ones successfully restored are leaked. If the >> failure happens when allocating the vm_area or when mapping the pages, >> all the pages of the preserved vmalloc buffer are leaked. > > Hm, I am not sure if KHO should be responsible for freeing the > restored pages. We don't know the content of those pages, and what > they are used for. They could be used by a hypervisor or a device. > Therefore, it may be better to keep them leaked, and let the caller > decide what to do next: i.e., boot into a maintenance mode, crash the > kernel, or allow the leak until the next reboot. Hmm, fair point. This patch can be ignored then. -- Regards, Pratyush Yadav From Markus.Elfring at web.de Thu Nov 20 01:57:01 2025 From: Markus.Elfring at web.de (Markus Elfring) Date: Thu, 20 Nov 2025 10:57:01 +0100 Subject: kho: free chunks using free_page() instead of kfree() In-Reply-To: References: <20251118182218.63044-1-pratyush@kernel.org> Message-ID: <11c0819f-f0e4-42a3-9a0c-fc71de1e59cc@web.de> > Anyway, if you have something better, happy to take suggestions. You provided a reasonable change introduction (and justification). How do you think about adding a wording like "Thus use an appropriate macro call."? Would it be helpful to mention the affected function implementation also in the summary phrase?
Regards, Markus From bhe at redhat.com Thu Nov 20 01:58:56 2025 From: bhe at redhat.com (Baoquan he) Date: Thu, 20 Nov 2025 17:58:56 +0800 Subject: [PATCH v6] crash: export crashkernel CMA reservation to userspace In-Reply-To: <20251118071023.1673329-1-sourabhjain@linux.ibm.com> References: <20251118071023.1673329-1-sourabhjain@linux.ibm.com> Message-ID: On 11/18/25 at 12:40pm, Sourabh Jain wrote: > Add a sysfs entry /sys/kernel/kexec_crash_cma_ranges to expose all > CMA crashkernel ranges. > > This allows userspace tools configuring kdump to determine how much > memory is reserved for crashkernel. If CMA is used, tools can warn > users when attempting to capture user pages with CMA reservation. > > The new sysfs entry holds the CMA ranges in the below format: > > cat /sys/kernel/kexec_crash_cma_ranges > 100000000-10c7fffff > > The reason for not including Crash CMA Ranges in /proc/iomem is to avoid > conflicts. It has been observed that contiguous memory ranges are sometimes > shown as two separate System RAM entries in /proc/iomem. If a CMA range > overlaps two System RAM ranges, adding crashk_res to /proc/iomem can create > a conflict. Reference [1] describes one such instance on the PowerPC > architecture. > > Link: https://lore.kernel.org/all/20251016142831.144515-1-sourabhjain at linux.ibm.com/ [1] > > Cc: Aditya Gupta > Cc: Andrew Morton > Cc: Baoquan he > Cc: Dave Young > Cc: Hari Bathini > Cc: Jiri Bohac > Cc: Madhavan Srinivasan > Cc: Mahesh J Salgaonkar > Cc: Pingfan Liu > Cc: Ritesh Harjani (IBM) > Cc: Shivang Upadhyay > Cc: Vivek Goyal > Cc: linuxppc-dev at lists.ozlabs.org > Cc: kexec at lists.infradead.org > Signed-off-by: Sourabh Jain > --- > > Changelog: > > v4 -> v5: > https://lore.kernel.org/all/20251114152550.ac2dd5e23542f09c62defec7 at linux-foundation.org/ > - Split patch from the above patch series. > - Code to create kexec node under /sys/kernel is added, earlier it was > done in [02/05] of the above patch series.
> > v5 -> v6: > - Add Crash CMA Range sysfs interface under /sys/kernel > > Note: > This patch is dependent on the below patch: > https://lore.kernel.org/all/20251117035153.1199665-1-sourabhjain at linux.ibm.com/ > > --- > .../ABI/testing/sysfs-kernel-kexec-kdump | 10 +++++++++ > kernel/ksysfs.c | 21 +++++++++++++++++++ > 2 files changed, 31 insertions(+) Acked-by: Baoquan He > > diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump > index 96b24565b68e..f6089e38de5f 100644 > --- a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump > +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump > @@ -41,3 +41,13 @@ Description: read only > is used by the user space utility kexec to support updating the > in-kernel kdump image during hotplug operations. > User: Kexec tools > + > +What: /sys/kernel/kexec_crash_cma_ranges > +Date: Nov 2025 > +Contact: kexec at lists.infradead.org > +Description: read only > + Provides information about the memory ranges reserved from > + the Contiguous Memory Allocator (CMA) area that are allocated > + to the crash (kdump) kernel. It lists the start and end physical > + addresses of CMA regions assigned for crashkernel use. 
> +User: kdump service > diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c > index eefb67d9883c..0ff2179bc603 100644 > --- a/kernel/ksysfs.c > +++ b/kernel/ksysfs.c > @@ -135,6 +135,24 @@ static ssize_t kexec_crash_loaded_show(struct kobject *kobj, > } > KERNEL_ATTR_RO(kexec_crash_loaded); > > +#ifdef CONFIG_CRASH_RESERVE > +static ssize_t kexec_crash_cma_ranges_show(struct kobject *kobj, > + struct kobj_attribute *attr, char *buf) > +{ > + > + ssize_t len = 0; > + int i; > + > + for (i = 0; i < crashk_cma_cnt; ++i) { > + len += sysfs_emit_at(buf, len, "%08llx-%08llx\n", > + crashk_cma_ranges[i].start, > + crashk_cma_ranges[i].end); > + } > + return len; > +} > +KERNEL_ATTR_RO(kexec_crash_cma_ranges); > +#endif /* CONFIG_CRASH_RESERVE */ > + > static ssize_t kexec_crash_size_show(struct kobject *kobj, > struct kobj_attribute *attr, char *buf) > { > @@ -260,6 +278,9 @@ static struct attribute * kernel_attrs[] = { > #ifdef CONFIG_CRASH_DUMP > &kexec_crash_loaded_attr.attr, > &kexec_crash_size_attr.attr, > +#ifdef CONFIG_CRASH_RESERVE > + &kexec_crash_cma_ranges_attr.attr, > +#endif > #endif > #endif > #ifdef CONFIG_VMCORE_INFO > -- > 2.51.1 > From rppt at kernel.org Thu Nov 20 02:39:21 2025 From: rppt at kernel.org (Mike Rapoport) Date: Thu, 20 Nov 2025 12:39:21 +0200 Subject: [PATCH v1 04/13] kho: Verify deserialization status and fix FDT alignment access In-Reply-To: References: <20251114155358.2884014-1-pasha.tatashin@soleen.com> <20251114155358.2884014-5-pasha.tatashin@soleen.com> Message-ID: On Tue, Nov 18, 2025 at 06:11:24PM +0100, Pratyush Yadav wrote: > On Tue, Nov 18 2025, Pasha Tatashin wrote: > > >> > This page is never freed, so adding it to zone managed pages or keeping it > >> > reserved does not change anything. > >> > >> In practice, sure. I still don't see a good reason to _not_ initialize > >> the page properly. It's not like it costs us much in terms of > >> performance or code complexity. 
> >> > >> Since kho_restore_folio() makes sure the folio was _actually_ preserved > >> from KHO, you have a safety check against previous kernel having a bug > >> and not preserving the FDT properly. And I get that the FDT has already > >> been used by this point, but at least you would have some known point to > >> catch this. > > > > The kho_alloc_preserve() API is different from kho_preserve_folio(). > > With kho_preserve_folio(), memory is allocated and some time later is > > preserved, so there is a possibility for that memory to exist and be > > used where it is not preserved, therefore it is a crucial step for > > such memory to also do kho_restore_folio() before used. With > > kho_alloc_preserve(), when the memory exists it is always preserved; > > it is a guarantee of this API. There is no reason to do > > kho_restore_folio() on such memory at all. It can be released back to > > the system via kho_free_restore()/kho_free_unpreserve(). > > Even for those I think there should be a kho_restore_mem() or something > similar (naming things is hard :/), so they go through the restore, > their struct page is properly initialized and accounted for, and > make sure the pages were actually preserved. > > Using the memory without restoring it first should be the exception IMO. Base KHO and LUO FDTs are such exceptions for sure :) We have to use them way before we can even think about restoring. > -- > Regards, > Pratyush Yadav -- Sincerely yours, Mike.
From ranxiaokai627 at 163.com Thu Nov 20 06:41:47 2025 From: ranxiaokai627 at 163.com (ranxiaokai627 at 163.com) Date: Thu, 20 Nov 2025 14:41:47 +0000 Subject: [PATCH 2/2] liveupdate: Fix boot failure due to kmemleak access to unmapped pages In-Reply-To: <20251120144147.90508-1-ranxiaokai627@163.com> References: <20251120144147.90508-1-ranxiaokai627@163.com> Message-ID: <20251120144147.90508-3-ranxiaokai627@163.com> From: Ran Xiaokai When booting with debug_pagealloc=on while having: CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT=y CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=n the system fails to boot due to page faults during kmemleak scanning. This occurs because: With debug_pagealloc enabled, __free_pages() invokes debug_pagealloc_unmap_pages(), clearing the _PAGE_PRESENT bit for freed pages in the direct mapping. Commit 3dc92c311498 ("kexec: add Kexec HandOver (KHO) generation helpers") releases the KHO scratch region via init_cma_reserved_pageblock(), unmapping its physical pages. Subsequent kmemleak scanning accesses these unmapped pages, triggering fatal page faults. Call kmemleak_no_scan_phys() from kho_reserve_scratch() to exclude the reserved region from scanning before it is released to the buddy allocator. 
Fixes: 3dc92c311498 ("kexec: add Kexec HandOver (KHO) generation helpers") Signed-off-by: Ran Xiaokai --- kernel/liveupdate/kexec_handover.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 224bdf5becb6..dd4942d1d76c 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -11,6 +11,7 @@ #include #include +#include #include #include #include @@ -654,6 +655,7 @@ static void __init kho_reserve_scratch(void) if (!addr) goto err_free_scratch_desc; + kmemleak_no_scan_phys(addr); kho_scratch[i].addr = addr; kho_scratch[i].size = size; i++; @@ -664,6 +666,7 @@ static void __init kho_reserve_scratch(void) if (!addr) goto err_free_scratch_areas; + kmemleak_no_scan_phys(addr); kho_scratch[i].addr = addr; kho_scratch[i].size = size; i++; @@ -676,6 +679,7 @@ static void __init kho_reserve_scratch(void) if (!addr) goto err_free_scratch_areas; + kmemleak_no_scan_phys(addr); kho_scratch[i].addr = addr; kho_scratch[i].size = size; i++; -- 2.25.1 From ranxiaokai627 at 163.com Thu Nov 20 06:41:45 2025 From: ranxiaokai627 at 163.com (ranxiaokai627 at 163.com) Date: Thu, 20 Nov 2025 14:41:45 +0000 Subject: [PATCH 0/2] liveupdate: Fix boot failure due to kmemleak access to unmapped pages Message-ID: <20251120144147.90508-1-ranxiaokai627@163.com> From: Ran Xiaokai When booting with debug_pagealloc=on while having: CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT=y CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=n the system fails to boot due to page faults during kmemleak scanning. 
Crash logs: BUG: unable to handle page fault for address: ffff8880cd400000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 11de00067 P4D 11de00067 PUD 11af2b067 PMD 11aec1067 PTE 800fffff32bff020 Oops: Oops: 0000 [#1] SMP DEBUG_PAGEALLOC RIP: 0010:scan_block+0x43/0xb0 Call Trace: scan_gray_list+0x2b5/0x2f0 kmemleak_scan+0x3b1/0xcf0 kmemleak_scan_thread+0x7d/0xc0 kthread+0x11c/0x240 ret_from_fork+0x2d3/0x370 ret_from_fork_asm+0x11/0x20 This occurs because: With debug_pagealloc enabled, __free_pages() invokes debug_pagealloc_unmap_pages(), clearing the _PAGE_PRESENT bit for freed pages in the direct mapping. Commit 3dc92c311498 ("kexec: add Kexec HandOver (KHO) generation helpers") releases the KHO scratch region via init_cma_reserved_pageblock(), unmapping its physical pages. Subsequent kmemleak scanning accesses these unmapped pages, triggering fatal page faults. This patch introduces kmemleak_no_scan_phys(phys_addr_t), a physical-address variant of kmemleak_no_scan(), which marks memblock regions as OBJECT_NO_SCAN. We invoke this from kho_reserve_scratch() to exclude the reserved region from scanning before it is released to the buddy allocator. This is based on linux next-20251119.
Ran Xiaokai (2): mm: kmemleak: introduce kmemleak_no_scan_phys() helper liveupdate: Fix boot failure due to kmemleak access to unmapped pages include/linux/kmemleak.h | 4 ++++ kernel/liveupdate/kexec_handover.c | 4 ++++ mm/kmemleak.c | 15 ++++++++++++--- 3 files changed, 20 insertions(+), 3 deletions(-) -- 2.25.1 From ranxiaokai627 at 163.com Thu Nov 20 06:41:46 2025 From: ranxiaokai627 at 163.com (ranxiaokai627 at 163.com) Date: Thu, 20 Nov 2025 14:41:46 +0000 Subject: [PATCH 1/2] mm: kmemleak: introduce kmemleak_no_scan_phys() helper In-Reply-To: <20251120144147.90508-1-ranxiaokai627@163.com> References: <20251120144147.90508-1-ranxiaokai627@163.com> Message-ID: <20251120144147.90508-2-ranxiaokai627@163.com> From: Ran Xiaokai Introduce kmemleak_no_scan_phys(phys_addr_t), a physical-address variant of kmemleak_no_scan(). This helper marks memory regions as non-scannable using physical addresses directly. It is specifically designed to prevent kmemleak from accessing pages that have been unmapped by debug_pagealloc after being freed to the buddy allocator. The kexec handover (KHO) subsystem will call this helper to exclude the kho_scratch reservation region from scanning, thereby avoiding fatal page faults during boot when debug_pagealloc=on.
Signed-off-by: Ran Xiaokai --- include/linux/kmemleak.h | 4 ++++ mm/kmemleak.c | 15 ++++++++++++--- 2 files changed, 16 insertions(+), 3 deletions(-) diff --git a/include/linux/kmemleak.h b/include/linux/kmemleak.h index fbd424b2abb1..e955ad441b8a 100644 --- a/include/linux/kmemleak.h +++ b/include/linux/kmemleak.h @@ -31,6 +31,7 @@ extern void kmemleak_ignore(const void *ptr) __ref; extern void kmemleak_ignore_percpu(const void __percpu *ptr) __ref; extern void kmemleak_scan_area(const void *ptr, size_t size, gfp_t gfp) __ref; extern void kmemleak_no_scan(const void *ptr) __ref; +extern void kmemleak_no_scan_phys(phys_addr_t phys) __ref; extern void kmemleak_alloc_phys(phys_addr_t phys, size_t size, gfp_t gfp) __ref; extern void kmemleak_free_part_phys(phys_addr_t phys, size_t size) __ref; @@ -113,6 +114,9 @@ static inline void kmemleak_erase(void **ptr) static inline void kmemleak_no_scan(const void *ptr) { } +static inline void kmemleak_no_scan_phys(phys_addr_t phys) +{ +} static inline void kmemleak_alloc_phys(phys_addr_t phys, size_t size, gfp_t gfp) { diff --git a/mm/kmemleak.c b/mm/kmemleak.c index 1ac56ceb29b6..b2b8374e19c3 100644 --- a/mm/kmemleak.c +++ b/mm/kmemleak.c @@ -1058,12 +1058,12 @@ static void object_set_excess_ref(unsigned long ptr, unsigned long excess_ref) * pointer. Such object will not be scanned by kmemleak but references to it * are searched. 
*/ -static void object_no_scan(unsigned long ptr) +static void object_no_scan_flags(unsigned long ptr, unsigned long objflags) { unsigned long flags; struct kmemleak_object *object; - object = find_and_get_object(ptr, 0); + object = __find_and_get_object(ptr, 0, objflags); if (!object) { kmemleak_warn("Not scanning unknown object at 0x%08lx\n", ptr); return; @@ -1328,10 +1328,19 @@ void __ref kmemleak_no_scan(const void *ptr) pr_debug("%s(0x%px)\n", __func__, ptr); if (kmemleak_enabled && ptr && !IS_ERR(ptr)) - object_no_scan((unsigned long)ptr); + object_no_scan_flags((unsigned long)ptr, 0); } EXPORT_SYMBOL(kmemleak_no_scan); +void __ref kmemleak_no_scan_phys(phys_addr_t phys) +{ + pr_debug("%s(%pap)\n", __func__, &phys); + + if (kmemleak_enabled) + object_no_scan_flags((unsigned long)phys, OBJECT_PHYS); +} +EXPORT_SYMBOL(kmemleak_no_scan_phys); + /** * kmemleak_alloc_phys - similar to kmemleak_alloc but taking a physical * address argument -- 2.25.1 From pratyush at kernel.org Thu Nov 20 08:17:28 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Thu, 20 Nov 2025 17:17:28 +0100 Subject: [PATCH 2/2] liveupdate: Fix boot failure due to kmemleak access to unmapped pages In-Reply-To: <20251120144147.90508-3-ranxiaokai627@163.com> (ranxiaokai's message of "Thu, 20 Nov 2025 14:41:47 +0000") References: <20251120144147.90508-1-ranxiaokai627@163.com> <20251120144147.90508-3-ranxiaokai627@163.com> Message-ID: On Thu, Nov 20 2025, ranxiaokai627 at 163.com wrote: > From: Ran Xiaokai > > When booting with debug_pagealloc=on while having: > CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT=y > CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=n > the system fails to boot due to page faults during kmemleak scanning. > > This occurs because: > With debug_pagealloc enabled, __free_pages() invokes > debug_pagealloc_unmap_pages(), clearing the _PAGE_PRESENT bit for > freed pages in the direct mapping. 
> Commit 3dc92c311498 ("kexec: add Kexec HandOver (KHO) generation helpers") > releases the KHO scratch region via init_cma_reserved_pageblock(), > unmapping its physical pages. Subsequent kmemleak scanning accesses > these unmapped pages, triggering fatal page faults. I don't know how kmemleak works. Why does kmemleak access the unmapped pages? If pages are not mapped, it should learn to not access them, right? > > Call kmemleak_no_scan_phys() from kho_reserve_scratch() to > exclude the reserved region from scanning before > it is released to the buddy allocator. kho_reserve_scratch() is called on the first boot. It allocates the scratch areas for subsequent boots. On every KHO boot after this, kho_reserve_scratch() is not called and kho_release_scratch() is called instead since the scratch areas already exist from previous boot. Eventually both paths converge to kho_init() and call init_cma_reserved_pageblock(). So shouldn't you call kmemleak_no_scan_phys() from kho_init() instead? This would reduce code duplication and cover both paths. 
> > Fixes: 3dc92c311498 ("kexec: add Kexec HandOver (KHO) generation helpers") > Signed-off-by: Ran Xiaokai > --- > kernel/liveupdate/kexec_handover.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c > index 224bdf5becb6..dd4942d1d76c 100644 > --- a/kernel/liveupdate/kexec_handover.c > +++ b/kernel/liveupdate/kexec_handover.c > @@ -11,6 +11,7 @@ > > #include > #include > +#include > #include > #include > #include > @@ -654,6 +655,7 @@ static void __init kho_reserve_scratch(void) > if (!addr) > goto err_free_scratch_desc; > > + kmemleak_no_scan_phys(addr); > kho_scratch[i].addr = addr; > kho_scratch[i].size = size; > i++; > @@ -664,6 +666,7 @@ static void __init kho_reserve_scratch(void) > if (!addr) > goto err_free_scratch_areas; > > + kmemleak_no_scan_phys(addr); > kho_scratch[i].addr = addr; > kho_scratch[i].size = size; > i++; > @@ -676,6 +679,7 @@ static void __init kho_reserve_scratch(void) > if (!addr) > goto err_free_scratch_areas; > > + kmemleak_no_scan_phys(addr); > kho_scratch[i].addr = addr; > kho_scratch[i].size = size; > i++; -- Regards, Pratyush Yadav From maddy at linux.ibm.com Thu Nov 20 18:54:02 2025 From: maddy at linux.ibm.com (Madhavan Srinivasan) Date: Fri, 21 Nov 2025 08:24:02 +0530 Subject: [PATCH v7] powerpc/kdump: Add support for crashkernel CMA reservation In-Reply-To: <20251107080334.708028-1-sourabhjain@linux.ibm.com> References: <20251107080334.708028-1-sourabhjain@linux.ibm.com> Message-ID: <176369324781.72695.15722637983958584587.b4-ty@linux.ibm.com> On Fri, 07 Nov 2025 13:33:34 +0530, Sourabh Jain wrote: > Commit 35c18f2933c5 ("Add a new optional ",cma" suffix to the > crashkernel= command line option") and commit ab475510e042 ("kdump: > implement reserve_crashkernel_cma") added CMA support for kdump > crashkernel reservation. > > Extend crashkernel CMA reservation support to powerpc. > > [...] Applied to powerpc/next. 
[1/1] powerpc/kdump: Add support for crashkernel CMA reservation https://git.kernel.org/powerpc/c/b4a96ab50f368afc2360ff539a20254ca2c9a889 Thanks From bhe at redhat.com Thu Nov 20 19:23:07 2025 From: bhe at redhat.com (Baoquan he) Date: Fri, 21 Nov 2025 11:23:07 +0800 Subject: [PATCH v6 0/3] kexec: reorganize kexec and kdump sysfs In-Reply-To: <20251118114507.1769455-1-sourabhjain@linux.ibm.com> References: <20251118114507.1769455-1-sourabhjain@linux.ibm.com> Message-ID: On 11/18/25 at 05:15pm, Sourabh Jain wrote: > All existing kexec and kdump sysfs entries are moved to a new location, > /sys/kernel/kexec, to keep /sys/kernel/ clean and better organized. > Symlinks are created at the old locations for backward compatibility and > can be removed in the future [01/03]. > > While doing this cleanup, the old kexec and kdump sysfs entries are > marked as deprecated in the existing ABI documentation [02/03]. This > makes it clear that these older interfaces should no longer be used. > New ABI documentation is added to describe the reorganized interfaces > [03/03], so users and tools can rely on the updated sysfs interfaces > going forward. 
> > Changelog: > --------- > > v4 -> v5: > https://lore.kernel.org/all/20251114152550.ac2dd5e23542f09c62defec7 at linux-foundation.org/ > - Split this patch series out of the above patch series > > v5 -> v6: > - Move /sys/kernel/kexec_crash_cma_ranges also to new /sys/kernel/kexec node > - Update commit messages > > Note: > This patch series is dependent on the patches: > https://lore.kernel.org/all/20251117035153.1199665-1-sourabhjain at linux.ibm.com/ > https://lore.kernel.org/all/20251118071023.1673329-1-sourabhjain at linux.ibm.com/ To the series, Acked-by: Baoquan He > > Cc: Aditya Gupta > Cc: Andrew Morton > Cc: Baoquan he > Cc: Dave Young > Cc: Hari Bathini > Cc: Jiri Bohac > Cc: Madhavan Srinivasan > Cc: Mahesh J Salgaonkar > Cc: Pingfan Liu > Cc: Ritesh Harjani (IBM) > Cc: Shivang Upadhyay > Cc: Sourabh Jain > Cc: Vivek Goyal > Cc: linuxppc-dev at lists.ozlabs.org > Cc: kexec at lists.infradead.org > > Sourabh Jain (3): > kexec: move sysfs entries to /sys/kernel/kexec > Documentation/ABI: mark old kexec sysfs deprecated > Documentation/ABI: new kexec and kdump sysfs interface > > .../ABI/obsolete/sysfs-kernel-kexec-kdump | 71 +++++++++ > .../ABI/testing/sysfs-kernel-kexec-kdump | 26 ++-- > kernel/kexec_core.c | 141 ++++++++++++++++++ > kernel/ksysfs.c | 89 +---------- > 4 files changed, 230 insertions(+), 97 deletions(-) > create mode 100644 Documentation/ABI/obsolete/sysfs-kernel-kexec-kdump > > -- > 2.51.1 > From rppt at kernel.org Fri Nov 21 05:36:49 2025 From: rppt at kernel.org (Mike Rapoport) Date: Fri, 21 Nov 2025 15:36:49 +0200 Subject: [PATCH 2/2] liveupdate: Fix boot failure due to kmemleak access to unmapped pages In-Reply-To: <20251120144147.90508-3-ranxiaokai627@163.com> References: <20251120144147.90508-1-ranxiaokai627@163.com> <20251120144147.90508-3-ranxiaokai627@163.com> Message-ID: On Thu, Nov 20, 2025 at 02:41:47PM +0000, ranxiaokai627 at 163.com wrote: > Subject: liveupdate: Fix boot failure due to kmemleak access to unmapped pages Please
prefix kexec handover patches with kho: rather than liveupdate. > From: Ran Xiaokai > > When booting with debug_pagealloc=on while having: > CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT=y > CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=n > the system fails to boot due to page faults during kmemleak scanning. > > This occurs because: > With debug_pagealloc enabled, __free_pages() invokes > debug_pagealloc_unmap_pages(), clearing the _PAGE_PRESENT bit for > freed pages in the direct mapping. > Commit 3dc92c311498 ("kexec: add Kexec HandOver (KHO) generation helpers") > releases the KHO scratch region via init_cma_reserved_pageblock(), > unmapping its physical pages. Subsequent kmemleak scanning accesses > these unmapped pages, triggering fatal page faults. > > Call kmemleak_no_scan_phys() from kho_reserve_scratch() to > exclude the reserved region from scanning before > it is released to the buddy allocator. > > Fixes: 3dc92c311498 ("kexec: add Kexec HandOver (KHO) generation helpers") > Signed-off-by: Ran Xiaokai > --- > kernel/liveupdate/kexec_handover.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c > index 224bdf5becb6..dd4942d1d76c 100644 > --- a/kernel/liveupdate/kexec_handover.c > +++ b/kernel/liveupdate/kexec_handover.c > @@ -11,6 +11,7 @@ > > #include > #include > +#include > #include > #include > #include > @@ -654,6 +655,7 @@ static void __init kho_reserve_scratch(void) > if (!addr) > goto err_free_scratch_desc; > > + kmemleak_no_scan_phys(addr); There's kmemleak_ignore_phys() that can be called after the scratch areas allocated from memblock and with that kmemleak should not access them. Take a look at __cma_declare_contiguous_nid(). > kho_scratch[i].addr = addr; > kho_scratch[i].size = size; > i++; -- Sincerely yours, Mike. 
From glaubitz at physik.fu-berlin.de Sat Nov 22 03:11:47 2025 From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz) Date: Sat, 22 Nov 2025 12:11:47 +0100 Subject: [PATCH 1/2] kexec-tools: powerpc: Fix function signature of comparefunc() In-Reply-To: <20251022114413.4440-1-glaubitz@physik.fu-berlin.de> References: <20251022114413.4440-1-glaubitz@physik.fu-berlin.de> Message-ID: <501429ee083aa7fb07db910e167411fa7707a0f6.camel@physik.fu-berlin.de> On Wed, 2025-10-22 at 13:44 +0200, John Paul Adrian Glaubitz wrote: > Fixes the following build error on 32-bit PowerPC: > > kexec/arch/ppc/fs2dt.c: In function 'putnode': > kexec/arch/ppc/fs2dt.c:338:51: error: passing argument 4 of 'scandir' from incompatible pointer type [-Wincompatible-pointer-types] > 338 | numlist = scandir(pathname, &namelist, 0, comparefunc); > | ^~~~~~~~~~~ > | | > | int (*)(const void *, const void *) > > Signed-off-by: John Paul Adrian Glaubitz > --- > kexec/arch/ppc/fs2dt.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/kexec/arch/ppc/fs2dt.c b/kexec/arch/ppc/fs2dt.c > index fed499b..d03b995 100644 > --- a/kexec/arch/ppc/fs2dt.c > +++ b/kexec/arch/ppc/fs2dt.c > @@ -292,7 +292,8 @@ static void putprops(char *fn, struct dirent **nlist, int numlist) > * Compare function used to sort the device-tree directories > * This function will be passed to scandir. > */ > -static int comparefunc(const void *dentry1, const void *dentry2) > +static int comparefunc(const struct dirent **dentry1, > + const struct dirent **dentry2) > { > char *str1 = (*(struct dirent **)dentry1)->d_name; > char *str2 = (*(struct dirent **)dentry2)->d_name; Ping for both patches. Adrian -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer `. 
`' Physicist `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 From ranxiaokai627 at 163.com Sat Nov 22 09:57:35 2025 From: ranxiaokai627 at 163.com (ranxiaokai627 at 163.com) Date: Sat, 22 Nov 2025 17:57:35 +0000 Subject: [PATCH 2/2] liveupdate: Fix boot failure due to kmemleak access to unmapped pages In-Reply-To: References: Message-ID: <20251122175735.92578-1-ranxiaokai627@163.com> >On Thu, Nov 20 2025, ranxiaokai627 at 163.com wrote: > >> From: Ran Xiaokai >> >> When booting with debug_pagealloc=on while having: >> CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT=y >> CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=n >> the system fails to boot due to page faults during kmemleak scanning. >> >> This occurs because: >> With debug_pagealloc enabled, __free_pages() invokes >> debug_pagealloc_unmap_pages(), clearing the _PAGE_PRESENT bit for >> freed pages in the direct mapping. >> Commit 3dc92c311498 ("kexec: add Kexec HandOver (KHO) generation helpers") >> releases the KHO scratch region via init_cma_reserved_pageblock(), >> unmapping its physical pages. Subsequent kmemleak scanning accesses >> these unmapped pages, triggering fatal page faults. > >I don't know how kmemleak works. Why does kmemleak access the unmapped >pages? If pages are not mapped, it should learn to not access them, >right? > >> >> Call kmemleak_no_scan_phys() from kho_reserve_scratch() to >> exclude the reserved region from scanning before >> it is released to the buddy allocator. > >kho_reserve_scratch() is called on the first boot. It allocates the >scratch areas for subsequent boots. On every KHO boot after this, >kho_reserve_scratch() is not called and kho_release_scratch() is called >instead since the scratch areas already exist from previous boot. > >Eventually both paths converge to kho_init() and call >init_cma_reserved_pageblock(). > >So shouldn't you call kmemleak_no_scan_phys() from kho_init() instead? >This would reduce code duplication and cover both paths. Thanks for your review! 
Yes, both paths converge to kho_init(). On the first boot, kho_get_fdt() returns NULL and init_cma_reserved_pageblock() is called; on a KHO boot, kho_get_fdt() returns non-NULL and kho_init() returns before calling init_cma_reserved_pageblock(). However, in a KHO boot, calling kmemleak_no_scan_phys() is unnecessary anyway, because kmemleak objects are only created when memblock_phys_alloc() is called, and a KHO boot does not invoke memblock_phys_alloc(). Moving the kmemleak_no_scan_phys() call into kho_init() therefore both resolves the issue and reduces code duplication. From ranxiaokai627 at 163.com Sat Nov 22 10:07:20 2025 From: ranxiaokai627 at 163.com (ranxiaokai627 at 163.com) Date: Sat, 22 Nov 2025 18:07:20 +0000 Subject: [PATCH 2/2] liveupdate: Fix boot failure due to kmemleak access to unmapped pages In-Reply-To: References: Message-ID: <20251122180720.92605-1-ranxiaokai627@163.com> >On Thu, Nov 20, 2025 at 02:41:47PM +0000, ranxiaokai627 at 163.com wrote: >> Subject: liveupdate: Fix boot failure due to kmemleak access to unmapped pages > >Please prefix kexec handover patches with kho: rather than liveupdate. Thanks for your review, I will update the patch subject. >> From: Ran Xiaokai >> >> When booting with debug_pagealloc=on while having: >> CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT=y >> CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=n >> the system fails to boot due to page faults during kmemleak scanning.
>> >> Call kmemleak_no_scan_phys() from kho_reserve_scratch() to >> exclude the reserved region from scanning before >> it is released to the buddy allocator. >> >> Fixes: 3dc92c311498 ("kexec: add Kexec HandOver (KHO) generation helpers") >> Signed-off-by: Ran Xiaokai >> --- >> kernel/liveupdate/kexec_handover.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c >> index 224bdf5becb6..dd4942d1d76c 100644 >> --- a/kernel/liveupdate/kexec_handover.c >> +++ b/kernel/liveupdate/kexec_handover.c >> @@ -11,6 +11,7 @@ >> >> #include >> #include >> +#include >> #include >> #include >> #include >> @@ -654,6 +655,7 @@ static void __init kho_reserve_scratch(void) >> if (!addr) >> goto err_free_scratch_desc; >> >> + kmemleak_no_scan_phys(addr); > >There's kmemleak_ignore_phys() that can be called after the scratch areas >allocated from memblock and with that kmemleak should not access them. > >Take a look at __cma_declare_contiguous_nid(). Thanks for catching this. Since kmemleak_ignore_phys() perfectly handles this issue, introducing another helper is unnecessary. I'll post v2 shortly. >> kho_scratch[i].addr = addr; >> kho_scratch[i].size = size; >> i++; > >-- >Sincerely yours, >Mike. From ranxiaokai627 at 163.com Sat Nov 22 10:29:29 2025 From: ranxiaokai627 at 163.com (ranxiaokai627 at 163.com) Date: Sat, 22 Nov 2025 18:29:29 +0000 Subject: [PATCH v2] KHO: Fix boot failure due to kmemleak access to non-PRESENT pages Message-ID: <20251122182929.92634-1-ranxiaokai627@163.com> From: Ran Xiaokai When booting with debug_pagealloc=on while having: CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT=y CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=n the system fails to boot due to page faults during kmemleak scanning. This occurs because: With debug_pagealloc is enabled, __free_pages() invokes debug_pagealloc_unmap_pages(), clearing the _PAGE_PRESENT bit for freed pages in the kernel page table. 
Commit 3dc92c311498 ("kexec: add Kexec HandOver (KHO) generation helpers") triggers this when it releases the KHO scratch region by calling init_cma_reserved_pageblock(). Subsequent kmemleak scanning accesses these non-PRESENT pages, leading to fatal page faults. Call kmemleak_ignore_phys() from kho_init() to exclude the reserved region from kmemleak scanning before it is released to the buddy allocator to fix this. Signed-off-by: Ran Xiaokai --- kernel/liveupdate/kexec_handover.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 224bdf5becb6..c729d455ee7b 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -11,6 +11,7 @@ #include #include +#include #include #include #include @@ -1369,6 +1370,7 @@ static __init int kho_init(void) unsigned long count = kho_scratch[i].size >> PAGE_SHIFT; unsigned long pfn; + kmemleak_ignore_phys(kho_scratch[i].addr); for (pfn = base_pfn; pfn < base_pfn + count; pfn += pageblock_nr_pages) init_cma_reserved_pageblock(pfn_to_page(pfn)); -- 2.25.1 From rppt at kernel.org Sun Nov 23 01:27:19 2025 From: rppt at kernel.org (Mike Rapoport) Date: Sun, 23 Nov 2025 11:27:19 +0200 Subject: [PATCH v2] KHO: Fix boot failure due to kmemleak access to non-PRESENT pages In-Reply-To: <20251122182929.92634-1-ranxiaokai627@163.com> References: <20251122182929.92634-1-ranxiaokai627@163.com> Message-ID: Hi, On Sat, Nov 22, 2025 at 06:29:29PM +0000, ranxiaokai627 at 163.com wrote: > From: Ran Xiaokai > > When booting with debug_pagealloc=on while having: > CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT=y > CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=n > the system fails to boot due to page faults during kmemleak scanning. > > This occurs because: > With debug_pagealloc is enabled, __free_pages() invokes > debug_pagealloc_unmap_pages(), clearing the _PAGE_PRESENT bit for > freed pages in the kernel page table.
> Commit 3dc92c311498 ("kexec: add Kexec HandOver (KHO) generation helpers") > triggers this when releases the KHO scratch region calling > init_cma_reserved_pageblock(). Subsequent kmemleak scanning accesses > these non-PRESENT pages, leading to fatal page faults. I believe this is more clear: With debug_pagealloc enabled, __free_pages() invokes debug_pagealloc_unmap_pages(), clearing the _PAGE_PRESENT bit for freed pages in the kernel page table. KHO scratch areas are allocated from memblock and noted by kmemleak. But these areas don't remain reserved; they are released later to the page allocator using init_cma_reserved_pageblock(). This causes subsequent kmemleak scans to access non-PRESENT pages, leading to fatal page faults. > Call kmemleak_ignore_phys() from kho_init() to exclude > the reserved region from kmemleak scanning before > it is released to the buddy allocator to fix this. I'd suggest Mark scratch areas with kmemleak_ignore_phys() after they are allocated from memblock to exclude them from kmemleak scanning before they are released to the buddy allocator to fix this.
> Fixes: 3dc92c311498 ("kexec: add Kexec HandOver (KHO) generation helpers") > Signed-off-by: Ran Xiaokai With the changes above Reviewed-by: Mike Rapoport (Microsoft) > --- > kernel/liveupdate/kexec_handover.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c > index 224bdf5becb6..c729d455ee7b 100644 > --- a/kernel/liveupdate/kexec_handover.c > +++ b/kernel/liveupdate/kexec_handover.c > @@ -11,6 +11,7 @@ > > #include > #include > +#include > #include > #include > #include > @@ -1369,6 +1370,7 @@ static __init int kho_init(void) > unsigned long count = kho_scratch[i].size >> PAGE_SHIFT; > unsigned long pfn; > > + kmemleak_ignore_phys(kho_scratch[i].addr); > for (pfn = base_pfn; pfn < base_pfn + count; > pfn += pageblock_nr_pages) > init_cma_reserved_pageblock(pfn_to_page(pfn)); > -- > 2.25.1 > > -- Sincerely yours, Mike. From ranxiaokai627 at 163.com Sun Nov 23 18:59:43 2025 From: ranxiaokai627 at 163.com (ranxiaokai627 at 163.com) Date: Mon, 24 Nov 2025 02:59:43 +0000 Subject: [PATCH v3] KHO: Fix boot failure due to kmemleak access to non-PRESENT pages Message-ID: <20251124025943.94469-1-ranxiaokai627@163.com> From: Ran Xiaokai When booting with debug_pagealloc=on while having: CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT=y CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=n the system fails to boot due to page faults during kmemleak scanning. This occurs because: With debug_pagealloc enabled, __free_pages() invokes debug_pagealloc_unmap_pages(), clearing the _PAGE_PRESENT bit for freed pages in the kernel page table. KHO scratch areas are allocated from memblock and noted by kmemleak. But these areas don't remain reserved; they are released later to the page allocator using init_cma_reserved_pageblock(). This causes subsequent kmemleak scans to access non-PRESENT pages, leading to fatal page faults.
Mark scratch areas with kmemleak_ignore_phys() after they are allocated from memblock to exclude them from kmemleak scanning before they are released to buddy allocator to fix this. Fixes: 3dc92c311498 ("kexec: add Kexec HandOver (KHO) generation helpers") Signed-off-by: Ran Xiaokai Reviewed-by: Mike Rapoport (Microsoft) --- kernel/liveupdate/kexec_handover.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 224bdf5becb6..c729d455ee7b 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -11,6 +11,7 @@ #include #include +#include #include #include #include @@ -1369,6 +1370,7 @@ static __init int kho_init(void) unsigned long count = kho_scratch[i].size >> PAGE_SHIFT; unsigned long pfn; + kmemleak_ignore_phys(kho_scratch[i].addr); for (pfn = base_pfn; pfn < base_pfn + count; pfn += pageblock_nr_pages) init_cma_reserved_pageblock(pfn_to_page(pfn)); -- 2.25.1 From ltao at redhat.com Sun Nov 23 20:46:37 2025 From: ltao at redhat.com (Tao Liu) Date: Mon, 24 Nov 2025 17:46:37 +1300 Subject: [PATCH v2][makedumpfile 00/14] btf/kallsyms based eppic extension for mm page filtering In-Reply-To: <20251020222410.8235-1-ltao@redhat.com> References: <20251020222410.8235-1-ltao@redhat.com> Message-ID: Kindly ping... Any comments on this? Thanks, Tao Liu On Tue, Oct 21, 2025 at 11:24?AM Tao Liu wrote: > > A) This patchset will introduce the following features to makedumpfile: > > 1) Enable eppic script for memory pages filtering. > 2) Enable btf and kallsyms for symbol type and address resolving. > > B) The purpose of the features are: > > 1) Currently makedumpfile filters mm pages based on page flags, because flags > can help to determine one page's usage. But this page-flag-checking method > lacks of flexibility in certain cases, e.g. 
if we want to filter those mm > pages occupied by a GPU during vmcore dumping because: > > a) the GPU may be using a large amount of memory and contain sensitive data; > b) GPU mm pages have no relation to the kernel crash and are useless for vmcore > analysis. > > But there are no GPU-specific mm page flags, and apparently we don't need > to create one just for kdump use. A programmable filtering tool is more > suitable for such cases. In addition, different GPU vendors may use > different ways of allocating mm pages, so programmable filtering is better > than hard coding these GPU-specific logics into makedumpfile in this case. > > 2) Currently makedumpfile already contains a programmable filtering tool, aka > eppic script, which allows users to write customized code for data erasing. > However it has the following drawbacks: > > a) it cannot do mm page filtering. > b) it needs access to the debuginfo of both the kernel and modules, which is not > available in the 2nd kernel. > c) poor performance, making vmcore dumping time unacceptable (see > the following performance testing). > > makedumpfile needs to resolve the dwarf data from debuginfo to get symbol > types and addresses. In recent kernels there are dwarf alternatives such > as btf/kallsyms which can be used for this purpose. And the btf/kallsyms info > is already packed within the vmcore, so we can use it directly. > > With these, this patchset introduces an upgraded eppic, which is based on > btf/kallsyms symbol resolving, and is programmable for mm page filtering.
> The following info shows its usage and performance, please note the tests > are performed in 1st kernel: > > $ time ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore > /tmp/dwarf.out -x /lib/debug/lib/modules/6.11.8-300.fc41.x86_64/vmlinux > --eppic eppic_scripts/filter_amdgpu_mm_pages.c > real 14m6.894s > user 4m16.900s > sys 9m44.695s > > $ time ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore > /tmp/btf.out --eppic eppic_scripts/filter_amdgpu_mm_pages.c > real 0m10.672s > user 0m9.270s > sys 0m1.130s > > -rw------- 1 root root 367475074 Jun 10 18:06 btf.out > -rw------- 1 root root 367475074 Jun 10 21:05 dwarf.out > -rw-rw-rw- 1 root root 387181418 Jun 10 18:03 /var/crash/127.0.0.1-2025-06-10-18:03:12/vmcore > > C) Discussion: > > 1) GPU types: Currently only tested with amdgpu's mm page filtering, others > are not tested. > 2) OS: The code can work on rhel-10+/rhel9.5+ on x86_64/arm64/s390/ppc64. > Others are not tested. > > D) Testing: > > 1) If you don't want to create your vmcore, you can find a vmcore which I > created with amdgpu mm pages unfiltered [1], the amdgpu mm pages are > allocated by program [2]. You can use the vmcore in 1st kernel to filter > the amdgpu mm pages by the previous performance testing cmdline. 
To > verify the pages are filtered in crash: > > Unfiltered: > crash> search -c "!QAZXSW@#EDC" > ffff96b7fa800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > ffff96b87c800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > crash> rd ffff96b7fa800000 > ffff96b7fa800000: 405753585a415121 !QAZXSW@ > crash> rd ffff96b87c800000 > ffff96b87c800000: 405753585a415121 !QAZXSW@ > > Filtered: > crash> search -c "!QAZXSW@#EDC" > crash> rd ffff96b7fa800000 > rd: page excluded: kernel virtual address: ffff96b7fa800000 type: "64-bit KVADDR" > crash> rd ffff96b87c800000 > rd: page excluded: kernel virtual address: ffff96b87c800000 type: "64-bit KVADDR" > > 2) You can use eppic_scripts/print_all_vma.c against an ordinary vmcore to > test only the btf/kallsyms functions by outputting all VMAs if no amdgpu > vmcore/machine is available. > > [1]: https://people.redhat.com/~ltao/core/ > [2]: https://gist.github.com/liutgnu/a8cbce1c666452f1530e1410d1f352df > > v2 -> v1: > > 1) Moved maple tree related code (for VMA iteration) into the eppic script, so we > don't need to port maple tree code to makedumpfile.
> > 2) Reorganized the patchset as follows: > > --- --- > 1.Add page filtering function > 2.Supporting main() as the entry of eppic script > > --- --- > 3.dwarf_info: Support kernel address randomization > 4.dwarf_info: Fix an infinite recursion bug for rust > 5.eppic dwarf: support anonymous structs member resolving > 6.Enable page filtering for dwarf eppic > > --- --- > 7.Implement kernel kallsyms resolving > 8.Implement kernel btf resolving > 9.Implement kernel module's kallsyms resolving > 10.Implement kernel module's btf resolving > 11.Export necessary btf/kallsyms functions to eppic extension > 12.Enable page filtering for btf/kallsyms eppic > 13.Docs: Update eppic related entries > > --- --- > 14.Introducing 2 eppic scripts to test the dwarf/btf eppic extension > > The modifications on the dwarf side are primarily for comparison purposes: > for the same eppic program, mm page filtering should get exactly the same > outputs for the dwarf and kallsyms/btf based approaches. If the outputs don't match, > this indicates bugs. In fact, we will never use dwarf mm page filtering > in practice, due to its poor performance as well as the inaccessibility > of debuginfo during kdump in the 2nd kernel. So patches 3/4/5 won't affect > the function of btf/kallsyms eppic mm page filtering, but there are > functions shared in patch 6, so it is a must-have one. Patch 14 is > only for test purposes, to demonstrate how to write an eppic script for > mm page filtering, so it isn't a must-have patch. > > Please note, in patch 14, I have deliberately converted all array > operations into pointer operations, e.g. modified "node->slot[i]" into > "*((unsigned long *)&(node->slot) + i)". This is because there are > bugs in the array operation support in extension_eppic.c. I didn't have > the time to test and fix them all because, as I mentioned previously, > mm page filtering on the dwarf side is only for comparison and will > never be used in practice. There is no such issue for the kallsyms/btf > eppic side.
> > 3) Since we ported maple tree code to eppic script, several bugs found > both for eppic library & eppic btf support. Please use master branch > of eppic library to co-compile with this patchset. > > Tao Liu (14): > Add page filtering function > Supporting main() as the entry of eppic script > dwarf_info: Support kernel address randomization > dwarf_info: Fix a infinite recursion bug for rust > eppic dwarf: support anonymous structs member resolving > Enable page filtering for dwarf eppic > Implement kernel kallsyms resolving > Implement kernel btf resolving > Implement kernel module's kallsyms resolving > Implement kernel module's btf resolving > Export necessary btf/kallsyms functions to eppic extension > Enable page filtering for btf/kallsyms eppic > Docs: Update eppic related entries > Introducing 2 eppic scripts to test the dwarf/btf eppic extension > > Makefile | 6 +- > btf.c | 919 +++++++++++++++++++++++++ > btf.h | 177 +++++ > dwarf_info.c | 7 + > eppic_scripts/filter_amdgpu_mm_pages.c | 255 +++++++ > eppic_scripts/print_all_vma.c | 239 +++++++ > erase_info.c | 120 +++- > erase_info.h | 19 + > extension_btf.c | 258 +++++++ > extension_eppic.c | 106 ++- > extension_eppic.h | 6 +- > kallsyms.c | 392 +++++++++++ > kallsyms.h | 41 ++ > makedumpfile.8.in | 24 +- > makedumpfile.c | 21 +- > makedumpfile.h | 11 + > print_info.c | 11 +- > 17 files changed, 2550 insertions(+), 62 deletions(-) > create mode 100644 btf.c > create mode 100644 btf.h > create mode 100644 eppic_scripts/filter_amdgpu_mm_pages.c > create mode 100644 eppic_scripts/print_all_vma.c > create mode 100644 extension_btf.c > create mode 100644 kallsyms.c > create mode 100644 kallsyms.h > > -- > 2.47.0 > From pratyush at kernel.org Mon Nov 24 04:16:55 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Mon, 24 Nov 2025 13:16:55 +0100 Subject: [PATCH v3] KHO: Fix boot failure due to kmemleak access to non-PRESENT pages In-Reply-To: <20251124025943.94469-1-ranxiaokai627@163.com> (ranxiaokai's 
message of "Mon, 24 Nov 2025 02:59:43 +0000") References: <20251124025943.94469-1-ranxiaokai627@163.com> Message-ID: On Mon, Nov 24 2025, ranxiaokai627 at 163.com wrote: > From: Ran Xiaokai > > When booting with debug_pagealloc=on while having: > CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT=y > CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=n > the system fails to boot due to page faults during kmemleak scanning. > > This occurs because: > With debug_pagealloc is enabled, __free_pages() invokes > debug_pagealloc_unmap_pages(), clearing the _PAGE_PRESENT bit for > freed pages in the kernel page table. > KHO scratch areas are allocated from memblock and noted by kmemleak. But > these areas don't remain reserved but released later to the page allocator > using init_cma_reserved_pageblock(). This causes subsequent kmemleak scans > access non-PRESENT pages, leading to fatal page faults. > > Mark scratch areas with kmemleak_ignore_phys() after they are allocated > from memblock to exclude them from kmemleak scanning before they are > released to buddy allocator to fix this. > > Fixes: 3dc92c311498 ("kexec: add Kexec HandOver (KHO) generation helpers") > Signed-off-by: Ran Xiaokai > Reviewed-by: Mike Rapoport (Microsoft) > --- > kernel/liveupdate/kexec_handover.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c > index 224bdf5becb6..c729d455ee7b 100644 > --- a/kernel/liveupdate/kexec_handover.c > +++ b/kernel/liveupdate/kexec_handover.c > @@ -11,6 +11,7 @@ > > #include > #include > +#include > #include > #include > #include > @@ -1369,6 +1370,7 @@ static __init int kho_init(void) > unsigned long count = kho_scratch[i].size >> PAGE_SHIFT; > unsigned long pfn; > > + kmemleak_ignore_phys(kho_scratch[i].addr); Can you please put the explanation you gave in [0] for why this is not necessary in KHO boot as a comment here? 
After that, Reviewed-by: Pratyush Yadav [0] https://lore.kernel.org/all/20251122175735.92578-1-ranxiaokai627 at 163.com/ > for (pfn = base_pfn; pfn < base_pfn + count; > pfn += pageblock_nr_pages) > init_cma_reserved_pageblock(pfn_to_page(pfn)); -- Regards, Pratyush Yadav From pratyush at kernel.org Mon Nov 24 05:25:55 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Mon, 24 Nov 2025 14:25:55 +0100 Subject: [PATCH v3] KHO: Fix boot failure due to kmemleak access to non-PRESENT pages In-Reply-To: (Pratyush Yadav's message of "Mon, 24 Nov 2025 13:16:55 +0100") References: <20251124025943.94469-1-ranxiaokai627@163.com> Message-ID: On Mon, Nov 24 2025, Pratyush Yadav wrote: > On Mon, Nov 24 2025, ranxiaokai627 at 163.com wrote: > >> From: Ran Xiaokai >> >> When booting with debug_pagealloc=on while having: >> CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT=y >> CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=n >> the system fails to boot due to page faults during kmemleak scanning. >> >> This occurs because: >> With debug_pagealloc is enabled, __free_pages() invokes >> debug_pagealloc_unmap_pages(), clearing the _PAGE_PRESENT bit for >> freed pages in the kernel page table. >> KHO scratch areas are allocated from memblock and noted by kmemleak. But >> these areas don't remain reserved but released later to the page allocator >> using init_cma_reserved_pageblock(). This causes subsequent kmemleak scans >> access non-PRESENT pages, leading to fatal page faults. >> >> Mark scratch areas with kmemleak_ignore_phys() after they are allocated >> from memblock to exclude them from kmemleak scanning before they are >> released to buddy allocator to fix this. 
>> >> Fixes: 3dc92c311498 ("kexec: add Kexec HandOver (KHO) generation helpers") >> Signed-off-by: Ran Xiaokai >> Reviewed-by: Mike Rapoport (Microsoft) >> --- >> kernel/liveupdate/kexec_handover.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c >> index 224bdf5becb6..c729d455ee7b 100644 >> --- a/kernel/liveupdate/kexec_handover.c >> +++ b/kernel/liveupdate/kexec_handover.c >> @@ -11,6 +11,7 @@ >> >> #include >> #include >> +#include >> #include >> #include >> #include >> @@ -1369,6 +1370,7 @@ static __init int kho_init(void) >> unsigned long count = kho_scratch[i].size >> PAGE_SHIFT; >> unsigned long pfn; >> >> + kmemleak_ignore_phys(kho_scratch[i].addr); > > Can you please put the explanation you gave in [0] for why this is not > necessary in KHO boot as a comment here? And also an explanation of why this is necessary in the first place. You do explain that in the commit message, but a shorter version as a comment will make this a lot easier to understand instead of having to dig through git history. [...] -- Regards, Pratyush Yadav From usamaarif642 at gmail.com Mon Nov 24 11:24:58 2025 From: usamaarif642 at gmail.com (Usama Arif) Date: Mon, 24 Nov 2025 19:24:58 +0000 Subject: [PATCH v8 12/17] x86/e820: temporarily enable KHO scratch for memory below 1M In-Reply-To: <20250509074635.3187114-13-changyuanl@google.com> References: <20250509074635.3187114-1-changyuanl@google.com> <20250509074635.3187114-13-changyuanl@google.com> Message-ID: On 09/05/2025 08:46, Changyuan Lyu wrote: > From: Alexander Graf > > KHO kernels are special and use only scratch memory for memblock > allocations, but memory below 1M is ignored by kernel after early boot > and cannot be naturally marked as scratch. > > To allow allocation of the real-mode trampoline and a few (if any) other > very early allocations from below 1M forcibly mark the memory below 1M > as scratch. 
> > After real mode trampoline is allocated, clear that scratch marking. > > Signed-off-by: Alexander Graf > Co-developed-by: Mike Rapoport (Microsoft) > Signed-off-by: Mike Rapoport (Microsoft) > Co-developed-by: Changyuan Lyu > Signed-off-by: Changyuan Lyu > Acked-by: Dave Hansen > --- > arch/x86/kernel/e820.c | 18 ++++++++++++++++++ > arch/x86/realmode/init.c | 2 ++ > 2 files changed, 20 insertions(+) > > diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c > index 9920122018a0b..c3acbd26408ba 100644 > --- a/arch/x86/kernel/e820.c > +++ b/arch/x86/kernel/e820.c > @@ -1299,6 +1299,24 @@ void __init e820__memblock_setup(void) > memblock_add(entry->addr, entry->size); > } > > + /* > + * At this point memblock is only allowed to allocate from memory > + * below 1M (aka ISA_END_ADDRESS) up until direct map is completely set > + * up in init_mem_mapping(). > + * > + * KHO kernels are special and use only scratch memory for memblock > + * allocations, but memory below 1M is ignored by kernel after early > + * boot and cannot be naturally marked as scratch. > + * > + * To allow allocation of the real-mode trampoline and a few (if any) > + * other very early allocations from below 1M forcibly mark the memory > + * below 1M as scratch. > + * > + * After real mode trampoline is allocated, we clear that scratch > + * marking. > + */ > + memblock_mark_kho_scratch(0, SZ_1M); > + > /* > * 32-bit systems are limited to 4BG of memory even with HIGHMEM and > * to even less without it. > diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c > index f9bc444a3064d..9b9f4534086d2 100644 > --- a/arch/x86/realmode/init.c > +++ b/arch/x86/realmode/init.c > @@ -65,6 +65,8 @@ void __init reserve_real_mode(void) > * setup_arch(). > */ > memblock_reserve(0, SZ_1M); > + > + memblock_clear_kho_scratch(0, SZ_1M); > } > > static void __init sme_sev_setup_real_mode(struct trampoline_header *th) Hello! 
I am working with Breno who reported that we are seeing the below warning at boot when rolling out 6.16 in Meta fleet. It is difficult to reproduce on a single host manually but we are seeing this several times a day inside the fleet. 20:16:33 ------------[ cut here ]------------ 20:16:33 WARNING: CPU: 0 PID: 0 at mm/memblock.c:668 memblock_add_range+0x316/0x330 20:16:33 Modules linked in: 20:16:33 CPU: 0 UID: 0 PID: 0 Comm: swapper Tainted: G S 6.16.1-0_fbk0_0_gc0739ee5037a #1 NONE 20:16:33 Tainted: [S]=CPU_OUT_OF_SPEC 20:16:33 RIP: 0010:memblock_add_range+0x316/0x330 20:16:33 Code: ff ff ff 89 5c 24 08 41 ff c5 44 89 6c 24 10 48 63 74 24 08 48 63 54 24 10 e8 26 0c 00 00 e9 41 ff ff ff 0f 0b e9 af fd ff ff <0f> 0b e9 b7 fd ff ff 0f 0b 0f 0b cc cc cc cc cc cc cc cc cc cc cc 20:16:33 RSP: 0000:ffffffff83403dd8 EFLAGS: 00010083 ORIG_RAX: 0000000000000000 20:16:33 RAX: ffffffff8476ff90 RBX: 0000000000001c00 RCX: 0000000000000002 20:16:33 RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff83bad4d8 20:16:33 RBP: 000000000009f000 R08: 0000000000000020 R09: 8000000000097101 20:16:33 R10: ffffffffff2004b0 R11: 203a6d6f646e6172 R12: 000000000009ec00 20:16:33 R13: 0000000000000002 R14: 0000000000100000 R15: 000000000009d000 20:16:33 FS: 0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000 20:16:33 CR2: ffff888065413ff8 CR3: 00000000663b7000 CR4: 00000000000000b0 20:16:33 Call Trace: 20:16:33 20:16:33 ? __memblock_reserve+0x75/0x80 20:16:33 ? setup_arch+0x30f/0xb10 20:16:33 ? start_kernel+0x58/0x960 20:16:33 ? x86_64_start_reservations+0x20/0x20 20:16:33 ? x86_64_start_kernel+0x13d/0x140 20:16:33 ? common_startup_64+0x13e/0x140 20:16:33 20:16:33 ---[ end trace 0000000000000000 ]--- Rolling out with memblock=debug is not really an option in a large scale fleet due to the time added to boot. 
But I did try on one of the hosts (without reproducing the issue) and I see: [ 0.000616] memory.cnt = 0x6 [ 0.000617] memory[0x0] [0x0000000000001000-0x000000000009bfff], 0x000000000009b000 bytes flags: 0x40 [ 0.000620] memory[0x1] [0x000000000009f000-0x000000000009ffff], 0x0000000000001000 bytes flags: 0x40 [ 0.000621] memory[0x2] [0x0000000000100000-0x000000005ed09fff], 0x000000005ec0a000 bytes flags: 0x0 ... The 0x40 (MEMBLOCK_KHO_SCRATCH) is coming from memblock_mark_kho_scratch in e820__memblock_setup. I believe this should be under an ifdef, like the diff at the end? (Happy to send this as a patch for review if it makes sense). We have KEXEC_HANDOVER disabled in our defconfig, therefore MEMBLOCK_KHO_SCRATCH shouldn't be selected and we shouldn't have any MEMBLOCK_KHO_SCRATCH type regions in our memblock reservations. The other thing I did was insert a while(1) just before the warning and inspected the registers in qemu. R14 held the base address, and R15 held the size at that point. In the warning R14 is 0x100000, meaning that someone is reserving a region with a flag different from MEMBLOCK_NONE at the boundary of MEMBLOCK_KHO_SCRATCH. diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c index c3acbd26408ba..26e4062a0bd09 100644 --- a/arch/x86/kernel/e820.c +++ b/arch/x86/kernel/e820.c @@ -1299,6 +1299,7 @@ void __init e820__memblock_setup(void) memblock_add(entry->addr, entry->size); } +#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH /* * At this point memblock is only allowed to allocate from memory * below 1M (aka ISA_END_ADDRESS) up until direct map is completely set @@ -1316,7 +1317,7 @@ void __init e820__memblock_setup(void) * marking. */ memblock_mark_kho_scratch(0, SZ_1M); - +#endif /* * 32-bit systems are limited to 4BG of memory even with HIGHMEM and * to even less without it. 
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c index 88be32026768c..1cd80293a3e23 100644 --- a/arch/x86/realmode/init.c +++ b/arch/x86/realmode/init.c @@ -66,8 +66,9 @@ void __init reserve_real_mode(void) * setup_arch(). */ memblock_reserve(0, SZ_1M); - +#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH memblock_clear_kho_scratch(0, SZ_1M); +#endif } static void __init sme_sev_setup_real_mode(struct trampoline_header *th) From akpm at linux-foundation.org Mon Nov 24 14:16:20 2025 From: akpm at linux-foundation.org (Andrew Morton) Date: Mon, 24 Nov 2025 14:16:20 -0800 Subject: [PATCHv2 1/2] kernel/kexec: Change the prototype of kimage_map_segment() In-Reply-To: <20251106065904.10772-1-piliu@redhat.com> References: <20251106065904.10772-1-piliu@redhat.com> Message-ID: <20251124141620.eaef984836fe2edc7acf9179@linux-foundation.org> On Thu, 6 Nov 2025 14:59:03 +0800 Pingfan Liu wrote: > The kexec segment index will be required to extract the corresponding > information for that segment in kimage_map_segment(). Additionally, > kexec_segment already holds the kexec relocation destination address and > size. Therefore, the prototype of kimage_map_segment() can be changed. Could we please have some reviewer input on these two patches? Thanks. 
(Pingfan, please cc linux-kernel on patches - it's where people go to find emails on lists which they aren't suscribed to) (akpm goes off and subscribes to kexec@) > Fixes: 07d24902977e ("kexec: enable CMA based contiguous allocation") > Signed-off-by: Pingfan Liu > Cc: Andrew Morton > Cc: Baoquan He > Cc: Mimi Zohar > Cc: Roberto Sassu > Cc: Alexander Graf > Cc: Steven Chen > Cc: > To: kexec at lists.infradead.org > To: linux-integrity at vger.kernel.org > --- > include/linux/kexec.h | 4 ++-- > kernel/kexec_core.c | 9 ++++++--- > security/integrity/ima/ima_kexec.c | 4 +--- > 3 files changed, 9 insertions(+), 8 deletions(-) > > diff --git a/include/linux/kexec.h b/include/linux/kexec.h > index ff7e231b0485..8a22bc9b8c6c 100644 > --- a/include/linux/kexec.h > +++ b/include/linux/kexec.h > @@ -530,7 +530,7 @@ extern bool kexec_file_dbg_print; > #define kexec_dprintk(fmt, arg...) \ > do { if (kexec_file_dbg_print) pr_info(fmt, ##arg); } while (0) > > -extern void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size); > +extern void *kimage_map_segment(struct kimage *image, int idx); > extern void kimage_unmap_segment(void *buffer); > #else /* !CONFIG_KEXEC_CORE */ > struct pt_regs; > @@ -540,7 +540,7 @@ static inline void __crash_kexec(struct pt_regs *regs) { } > static inline void crash_kexec(struct pt_regs *regs) { } > static inline int kexec_should_crash(struct task_struct *p) { return 0; } > static inline int kexec_crash_loaded(void) { return 0; } > -static inline void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size) > +static inline void *kimage_map_segment(struct kimage *image, int idx) > { return NULL; } > static inline void kimage_unmap_segment(void *buffer) { } > #define kexec_in_progress false > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > index fa00b239c5d9..9a1966207041 100644 > --- a/kernel/kexec_core.c > +++ b/kernel/kexec_core.c > @@ -960,17 +960,20 @@ int 
kimage_load_segment(struct kimage *image, int idx) > return result; > } > > -void *kimage_map_segment(struct kimage *image, > - unsigned long addr, unsigned long size) > +void *kimage_map_segment(struct kimage *image, int idx) > { > + unsigned long addr, size, eaddr; > unsigned long src_page_addr, dest_page_addr = 0; > - unsigned long eaddr = addr + size; > kimage_entry_t *ptr, entry; > struct page **src_pages; > unsigned int npages; > void *vaddr = NULL; > int i; > > + addr = image->segment[idx].mem; > + size = image->segment[idx].memsz; > + eaddr = addr + size; > + > /* > * Collect the source pages and map them in a contiguous VA range. > */ > diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c > index 7362f68f2d8b..5beb69edd12f 100644 > --- a/security/integrity/ima/ima_kexec.c > +++ b/security/integrity/ima/ima_kexec.c > @@ -250,9 +250,7 @@ void ima_kexec_post_load(struct kimage *image) > if (!image->ima_buffer_addr) > return; > > - ima_kexec_buffer = kimage_map_segment(image, > - image->ima_buffer_addr, > - image->ima_buffer_size); > + ima_kexec_buffer = kimage_map_segment(image, image->ima_segment_index); > if (!ima_kexec_buffer) { > pr_err("Could not map measurements buffer.\n"); > return; > -- > 2.49.0 From hpa at zytor.com Mon Nov 24 16:56:34 2025 From: hpa at zytor.com (H. 
Peter Anvin) Date: Mon, 24 Nov 2025 16:56:34 -0800 Subject: =?US-ASCII?Q?Re=3A_=5BPATCH_v8_12/17=5D_x86/e820=3A_temporari?= =?US-ASCII?Q?ly_enable_KHO_scratch_for_memory_below_1M?= In-Reply-To: References: <20250509074635.3187114-1-changyuanl@google.com> <20250509074635.3187114-13-changyuanl@google.com> Message-ID: <22BDBF5C-C831-4BBC-A854-20CA77234084@zytor.com> On November 24, 2025 11:24:58 AM PST, Usama Arif wrote: > > >On 09/05/2025 08:46, Changyuan Lyu wrote: >> From: Alexander Graf >> >> KHO kernels are special and use only scratch memory for memblock >> allocations, but memory below 1M is ignored by kernel after early boot >> and cannot be naturally marked as scratch. >> >> To allow allocation of the real-mode trampoline and a few (if any) other >> very early allocations from below 1M forcibly mark the memory below 1M >> as scratch. >> >> After real mode trampoline is allocated, clear that scratch marking. >> >> Signed-off-by: Alexander Graf >> Co-developed-by: Mike Rapoport (Microsoft) >> Signed-off-by: Mike Rapoport (Microsoft) >> Co-developed-by: Changyuan Lyu >> Signed-off-by: Changyuan Lyu >> Acked-by: Dave Hansen >> --- >> arch/x86/kernel/e820.c | 18 ++++++++++++++++++ >> arch/x86/realmode/init.c | 2 ++ >> 2 files changed, 20 insertions(+) >> >> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c >> index 9920122018a0b..c3acbd26408ba 100644 >> --- a/arch/x86/kernel/e820.c >> +++ b/arch/x86/kernel/e820.c >> @@ -1299,6 +1299,24 @@ void __init e820__memblock_setup(void) >> memblock_add(entry->addr, entry->size); >> } >> >> + /* >> + * At this point memblock is only allowed to allocate from memory >> + * below 1M (aka ISA_END_ADDRESS) up until direct map is completely set >> + * up in init_mem_mapping(). >> + * >> + * KHO kernels are special and use only scratch memory for memblock >> + * allocations, but memory below 1M is ignored by kernel after early >> + * boot and cannot be naturally marked as scratch. 
>> + * >> + * To allow allocation of the real-mode trampoline and a few (if any) >> + * other very early allocations from below 1M forcibly mark the memory >> + * below 1M as scratch. >> + * >> + * After real mode trampoline is allocated, we clear that scratch >> + * marking. >> + */ >> + memblock_mark_kho_scratch(0, SZ_1M); >> + >> /* >> * 32-bit systems are limited to 4BG of memory even with HIGHMEM and >> * to even less without it. >> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c >> index f9bc444a3064d..9b9f4534086d2 100644 >> --- a/arch/x86/realmode/init.c >> +++ b/arch/x86/realmode/init.c >> @@ -65,6 +65,8 @@ void __init reserve_real_mode(void) >> * setup_arch(). >> */ >> memblock_reserve(0, SZ_1M); >> + >> + memblock_clear_kho_scratch(0, SZ_1M); >> } >> >> static void __init sme_sev_setup_real_mode(struct trampoline_header *th) > >Hello! > >I am working with Breno who reported that we are seeing the below warning at boot >when rolling out 6.16 in Meta fleet. It is difficult to reproduce on a single host >manually but we are seeing this several times a day inside the fleet. 
> > 20:16:33 ------------[ cut here ]------------ > 20:16:33 WARNING: CPU: 0 PID: 0 at mm/memblock.c:668 memblock_add_range+0x316/0x330 > 20:16:33 Modules linked in: > 20:16:33 CPU: 0 UID: 0 PID: 0 Comm: swapper Tainted: G S 6.16.1-0_fbk0_0_gc0739ee5037a #1 NONE > 20:16:33 Tainted: [S]=CPU_OUT_OF_SPEC > 20:16:33 RIP: 0010:memblock_add_range+0x316/0x330 > 20:16:33 Code: ff ff ff 89 5c 24 08 41 ff c5 44 89 6c 24 10 48 63 74 24 08 48 63 54 24 10 e8 26 0c 00 00 e9 41 ff ff ff 0f 0b e9 af fd ff ff <0f> 0b e9 b7 fd ff ff 0f 0b 0f 0b cc cc cc cc cc cc cc cc cc cc cc > 20:16:33 RSP: 0000:ffffffff83403dd8 EFLAGS: 00010083 ORIG_RAX: 0000000000000000 > 20:16:33 RAX: ffffffff8476ff90 RBX: 0000000000001c00 RCX: 0000000000000002 > 20:16:33 RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff83bad4d8 > 20:16:33 RBP: 000000000009f000 R08: 0000000000000020 R09: 8000000000097101 > 20:16:33 R10: ffffffffff2004b0 R11: 203a6d6f646e6172 R12: 000000000009ec00 > 20:16:33 R13: 0000000000000002 R14: 0000000000100000 R15: 000000000009d000 > 20:16:33 FS: 0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000 > 20:16:33 CR2: ffff888065413ff8 CR3: 00000000663b7000 CR4: 00000000000000b0 > 20:16:33 Call Trace: > 20:16:33 > 20:16:33 ? __memblock_reserve+0x75/0x80 > 20:16:33 ? setup_arch+0x30f/0xb10 > 20:16:33 ? start_kernel+0x58/0x960 > 20:16:33 ? x86_64_start_reservations+0x20/0x20 > 20:16:33 ? x86_64_start_kernel+0x13d/0x140 > 20:16:33 ? common_startup_64+0x13e/0x140 > 20:16:33 > 20:16:33 ---[ end trace 0000000000000000 ]--- > > >Rolling out with memblock=debug is not really an option in a large scale fleet due to the >time added to boot. 
But I did try on one of the hosts (without reproducing the issue) and I see: > >[ 0.000616] memory.cnt = 0x6 >[ 0.000617] memory[0x0] [0x0000000000001000-0x000000000009bfff], 0x000000000009b000 bytes flags: 0x40 >[ 0.000620] memory[0x1] [0x000000000009f000-0x000000000009ffff], 0x0000000000001000 bytes flags: 0x40 >[ 0.000621] memory[0x2] [0x0000000000100000-0x000000005ed09fff], 0x000000005ec0a000 bytes flags: 0x0 >... > >The 0x40 (MEMBLOCK_KHO_SCRATCH) is coming from memblock_mark_kho_scratch in e820__memblock_setup. I believe this >should be under ifdef like the diff at the end? (Happy to send this as a patch for review if it makes sense). >We have KEXEC_HANDOVER disabled in our defconfig, therefore MEMBLOCK_KHO_SCRATCH shouldnt be selected and >we shouldnt have any MEMBLOCK_KHO_SCRATCH type regions in our memblock reservations. > >The other thing I did was insert a while(1) just before the warning and inspected the registers in qemu. >R14 held the base register, and R15 held the size at that point. >In the warning R14 is 0x100000 meaning that someone is reserving a region with a different flag to MEMBLOCK_NONE >at the boundary of MEMBLOCK_KHO_SCRATCH. > >diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c >index c3acbd26408ba..26e4062a0bd09 100644 >--- a/arch/x86/kernel/e820.c >+++ b/arch/x86/kernel/e820.c >@@ -1299,6 +1299,7 @@ void __init e820__memblock_setup(void) > memblock_add(entry->addr, entry->size); > } > >+#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH > /* > * At this point memblock is only allowed to allocate from memory > * below 1M (aka ISA_END_ADDRESS) up until direct map is completely set >@@ -1316,7 +1317,7 @@ void __init e820__memblock_setup(void) > * marking. > */ > memblock_mark_kho_scratch(0, SZ_1M); >- >+#endif > /* > * 32-bit systems are limited to 4BG of memory even with HIGHMEM and > * to even less without it. 
>diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c >index 88be32026768c..1cd80293a3e23 100644 >--- a/arch/x86/realmode/init.c >+++ b/arch/x86/realmode/init.c >@@ -66,8 +66,9 @@ void __init reserve_real_mode(void) > * setup_arch(). > */ > memblock_reserve(0, SZ_1M); >- >+#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH > memblock_clear_kho_scratch(0, SZ_1M); >+#endif > } > > static void __init sme_sev_setup_real_mode(struct trampoline_header *th) What does "scratch" mean in this exact context? (Sorry, don't have the code in front of me.) From piliu at redhat.com Mon Nov 24 20:10:18 2025 From: piliu at redhat.com (Pingfan Liu) Date: Tue, 25 Nov 2025 12:10:18 +0800 Subject: [PATCHv2 1/2] kernel/kexec: Change the prototype of kimage_map_segment() In-Reply-To: <20251124141620.eaef984836fe2edc7acf9179@linux-foundation.org> References: <20251106065904.10772-1-piliu@redhat.com> <20251124141620.eaef984836fe2edc7acf9179@linux-foundation.org> Message-ID: On Tue, Nov 25, 2025 at 6:16?AM Andrew Morton wrote: > > On Thu, 6 Nov 2025 14:59:03 +0800 Pingfan Liu wrote: > > > The kexec segment index will be required to extract the corresponding > > information for that segment in kimage_map_segment(). Additionally, > > kexec_segment already holds the kexec relocation destination address and > > size. Therefore, the prototype of kimage_map_segment() can be changed. > > Could we please have some reviewer input on thee two patches? > > Thanks. 
> > (Pingfan, please cc linux-kernel on patches - it's where people go to > find emails on lists which they aren't suscribed to) > OK, I will cc linux-kernel for the future kexec patches For this series, it can also be found on https://lore.kernel.org/linux-integrity/20251106065904.10772-1-piliu at redhat.com/ Thanks, Pingfan > (akpm goes off and subscribes to kexec@) > > > Fixes: 07d24902977e ("kexec: enable CMA based contiguous allocation") > > Signed-off-by: Pingfan Liu > > Cc: Andrew Morton > > Cc: Baoquan He > > Cc: Mimi Zohar > > Cc: Roberto Sassu > > Cc: Alexander Graf > > Cc: Steven Chen > > Cc: > > To: kexec at lists.infradead.org > > To: linux-integrity at vger.kernel.org > > --- > > include/linux/kexec.h | 4 ++-- > > kernel/kexec_core.c | 9 ++++++--- > > security/integrity/ima/ima_kexec.c | 4 +--- > > 3 files changed, 9 insertions(+), 8 deletions(-) > > > > diff --git a/include/linux/kexec.h b/include/linux/kexec.h > > index ff7e231b0485..8a22bc9b8c6c 100644 > > --- a/include/linux/kexec.h > > +++ b/include/linux/kexec.h > > @@ -530,7 +530,7 @@ extern bool kexec_file_dbg_print; > > #define kexec_dprintk(fmt, arg...) 
\ > > do { if (kexec_file_dbg_print) pr_info(fmt, ##arg); } while (0) > > > > -extern void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size); > > +extern void *kimage_map_segment(struct kimage *image, int idx); > > extern void kimage_unmap_segment(void *buffer); > > #else /* !CONFIG_KEXEC_CORE */ > > struct pt_regs; > > @@ -540,7 +540,7 @@ static inline void __crash_kexec(struct pt_regs *regs) { } > > static inline void crash_kexec(struct pt_regs *regs) { } > > static inline int kexec_should_crash(struct task_struct *p) { return 0; } > > static inline int kexec_crash_loaded(void) { return 0; } > > -static inline void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size) > > +static inline void *kimage_map_segment(struct kimage *image, int idx) > > { return NULL; } > > static inline void kimage_unmap_segment(void *buffer) { } > > #define kexec_in_progress false > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > > index fa00b239c5d9..9a1966207041 100644 > > --- a/kernel/kexec_core.c > > +++ b/kernel/kexec_core.c > > @@ -960,17 +960,20 @@ int kimage_load_segment(struct kimage *image, int idx) > > return result; > > } > > > > -void *kimage_map_segment(struct kimage *image, > > - unsigned long addr, unsigned long size) > > +void *kimage_map_segment(struct kimage *image, int idx) > > { > > + unsigned long addr, size, eaddr; > > unsigned long src_page_addr, dest_page_addr = 0; > > - unsigned long eaddr = addr + size; > > kimage_entry_t *ptr, entry; > > struct page **src_pages; > > unsigned int npages; > > void *vaddr = NULL; > > int i; > > > > + addr = image->segment[idx].mem; > > + size = image->segment[idx].memsz; > > + eaddr = addr + size; > > + > > /* > > * Collect the source pages and map them in a contiguous VA range. 
> > */ > > diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c > > index 7362f68f2d8b..5beb69edd12f 100644 > > --- a/security/integrity/ima/ima_kexec.c > > +++ b/security/integrity/ima/ima_kexec.c > > @@ -250,9 +250,7 @@ void ima_kexec_post_load(struct kimage *image) > > if (!image->ima_buffer_addr) > > return; > > > > - ima_kexec_buffer = kimage_map_segment(image, > > - image->ima_buffer_addr, > > - image->ima_buffer_size); > > + ima_kexec_buffer = kimage_map_segment(image, image->ima_segment_index); > > if (!ima_kexec_buffer) { > > pr_err("Could not map measurements buffer.\n"); > > return; > > -- > > 2.49.0 > From bhe at redhat.com Mon Nov 24 20:54:39 2025 From: bhe at redhat.com (Baoquan He) Date: Tue, 25 Nov 2025 12:54:39 +0800 Subject: [PATCHv2 1/2] kernel/kexec: Change the prototype of kimage_map_segment() In-Reply-To: <20251124141620.eaef984836fe2edc7acf9179@linux-foundation.org> References: <20251106065904.10772-1-piliu@redhat.com> <20251124141620.eaef984836fe2edc7acf9179@linux-foundation.org> Message-ID: On 11/24/25 at 02:16pm, Andrew Morton wrote: > On Thu, 6 Nov 2025 14:59:03 +0800 Pingfan Liu wrote: > > > The kexec segment index will be required to extract the corresponding > > information for that segment in kimage_map_segment(). Additionally, > > kexec_segment already holds the kexec relocation destination address and > > size. Therefore, the prototype of kimage_map_segment() can be changed. > > Could we please have some reviewer input on these two patches? I have some concerns about one of the small code changes, and the root cause is missing from the log. And Mimi sent mail to me asking why this bug can't be seen on her laptop; I told her this bug can only be triggered on systems where a CMA area exists. I think these need to be addressed in v3. 
> > (Pingfan, please cc linux-kernel on patches - it's where people go to > find emails on lists which they aren't suscribed to) > > (akpm goes off and subscribes to kexec@) > > > Fixes: 07d24902977e ("kexec: enable CMA based contiguous allocation") > > Signed-off-by: Pingfan Liu > > Cc: Andrew Morton > > Cc: Baoquan He > > Cc: Mimi Zohar > > Cc: Roberto Sassu > > Cc: Alexander Graf > > Cc: Steven Chen > > Cc: > > To: kexec at lists.infradead.org > > To: linux-integrity at vger.kernel.org > > --- > > include/linux/kexec.h | 4 ++-- > > kernel/kexec_core.c | 9 ++++++--- > > security/integrity/ima/ima_kexec.c | 4 +--- > > 3 files changed, 9 insertions(+), 8 deletions(-) > > > > diff --git a/include/linux/kexec.h b/include/linux/kexec.h > > index ff7e231b0485..8a22bc9b8c6c 100644 > > --- a/include/linux/kexec.h > > +++ b/include/linux/kexec.h > > @@ -530,7 +530,7 @@ extern bool kexec_file_dbg_print; > > #define kexec_dprintk(fmt, arg...) \ > > do { if (kexec_file_dbg_print) pr_info(fmt, ##arg); } while (0) > > > > -extern void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size); > > +extern void *kimage_map_segment(struct kimage *image, int idx); > > extern void kimage_unmap_segment(void *buffer); > > #else /* !CONFIG_KEXEC_CORE */ > > struct pt_regs; > > @@ -540,7 +540,7 @@ static inline void __crash_kexec(struct pt_regs *regs) { } > > static inline void crash_kexec(struct pt_regs *regs) { } > > static inline int kexec_should_crash(struct task_struct *p) { return 0; } > > static inline int kexec_crash_loaded(void) { return 0; } > > -static inline void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size) > > +static inline void *kimage_map_segment(struct kimage *image, int idx) > > { return NULL; } > > static inline void kimage_unmap_segment(void *buffer) { } > > #define kexec_in_progress false > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > > index fa00b239c5d9..9a1966207041 100644 > > --- 
a/kernel/kexec_core.c > > +++ b/kernel/kexec_core.c > > @@ -960,17 +960,20 @@ int kimage_load_segment(struct kimage *image, int idx) > > return result; > > } > > > > -void *kimage_map_segment(struct kimage *image, > > - unsigned long addr, unsigned long size) > > +void *kimage_map_segment(struct kimage *image, int idx) > > { > > + unsigned long addr, size, eaddr; > > unsigned long src_page_addr, dest_page_addr = 0; > > - unsigned long eaddr = addr + size; > > kimage_entry_t *ptr, entry; > > struct page **src_pages; > > unsigned int npages; > > void *vaddr = NULL; > > int i; > > > > + addr = image->segment[idx].mem; > > + size = image->segment[idx].memsz; > > + eaddr = addr + size; > > + > > /* > > * Collect the source pages and map them in a contiguous VA range. > > */ > > diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c > > index 7362f68f2d8b..5beb69edd12f 100644 > > --- a/security/integrity/ima/ima_kexec.c > > +++ b/security/integrity/ima/ima_kexec.c > > @@ -250,9 +250,7 @@ void ima_kexec_post_load(struct kimage *image) > > if (!image->ima_buffer_addr) > > return; > > > > - ima_kexec_buffer = kimage_map_segment(image, > > - image->ima_buffer_addr, > > - image->ima_buffer_size); > > + ima_kexec_buffer = kimage_map_segment(image, image->ima_segment_index); > > if (!ima_kexec_buffer) { > > pr_err("Could not map measurements buffer.\n"); > > return; > > -- > > 2.49.0 > From rppt at kernel.org Tue Nov 25 03:09:15 2025 From: rppt at kernel.org (Mike Rapoport) Date: Tue, 25 Nov 2025 13:09:15 +0200 Subject: [PATCH 0/2] kho: fixes for vmalloc restoration Message-ID: <20251125110917.843744-1-rppt@kernel.org> From: "Mike Rapoport (Microsoft)" Hi, Pratyush reported off-list that when kho_restore_vmalloc() is used to restore a vmalloc_huge() allocation it hits VM_BUG_ON() when we reconstruct the struct pages in kho_restore_pages(). These patches fix the issue. 
Mike Rapoport (Microsoft) (2): kho: kho_restore_vmalloc: fix initialization of pages array kho: fix restoring of contiguous ranges of order-0 pages kernel/liveupdate/kexec_handover.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-) -- 2.50.1 From rppt at kernel.org Tue Nov 25 03:09:16 2025 From: rppt at kernel.org (Mike Rapoport) Date: Tue, 25 Nov 2025 13:09:16 +0200 Subject: [PATCH 1/2] kho: kho_restore_vmalloc: fix initialization of pages array In-Reply-To: <20251125110917.843744-1-rppt@kernel.org> References: <20251125110917.843744-1-rppt@kernel.org> Message-ID: <20251125110917.843744-2-rppt@kernel.org> From: "Mike Rapoport (Microsoft)" In case a preserved vmalloc allocation was using huge pages, all pages in the array of pages added to vm_struct during kho_restore_vmalloc() are wrongly set to the same page. Fix the indexing when assigning pages to that array. Fixes: a667300bd53f ("kho: add support for preserving vmalloc allocations") Signed-off-by: Mike Rapoport (Microsoft) --- kernel/liveupdate/kexec_handover.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index 5809c6fe331c..e64ee87fa62a 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -1096,7 +1096,7 @@ void *kho_restore_vmalloc(const struct kho_vmalloc *preservation) goto err_free_pages_array; for (int j = 0; j < contig_pages; j++) - pages[idx++] = page; + pages[idx++] = page + j; phys += contig_pages * PAGE_SIZE; } -- 2.50.1 From rppt at kernel.org Tue Nov 25 03:09:17 2025 From: rppt at kernel.org (Mike Rapoport) Date: Tue, 25 Nov 2025 13:09:17 +0200 Subject: [PATCH 2/2] kho: fix restoring of contiguous ranges of order-0 pages In-Reply-To: <20251125110917.843744-1-rppt@kernel.org> References: <20251125110917.843744-1-rppt@kernel.org> Message-ID: <20251125110917.843744-3-rppt@kernel.org> From: "Mike Rapoport (Microsoft)" When contiguous ranges of 
order-0 pages are restored, kho_restore_page() calls prep_compound_page() with the first page in the range and the order as parameters, and then kho_restore_pages() calls split_page() to make sure all pages in the range are order-0. However, split_page() is not intended to split compound pages, and with VM_DEBUG enabled it will trigger a VM_BUG_ON_PAGE(). Update kho_restore_page() so that it uses prep_compound_page() only when it restores a folio, and make sure it properly sets the page count for both large folios and ranges of order-0 pages. Reported-by: Pratyush Yadav Fixes: a667300bd53f ("kho: add support for preserving vmalloc allocations") Signed-off-by: Mike Rapoport (Microsoft) --- kernel/liveupdate/kexec_handover.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c index e64ee87fa62a..61d17ed1f423 100644 --- a/kernel/liveupdate/kexec_handover.c +++ b/kernel/liveupdate/kexec_handover.c @@ -219,11 +219,11 @@ static int __kho_preserve_order(struct kho_mem_track *track, unsigned long pfn, return 0; } -static struct page *kho_restore_page(phys_addr_t phys) +static struct page *kho_restore_page(phys_addr_t phys, bool is_folio) { struct page *page = pfn_to_online_page(PHYS_PFN(phys)); + unsigned int nr_pages, ref_cnt; union kho_page_info info; - unsigned int nr_pages; if (!page) return NULL; @@ -243,11 +243,16 @@ static struct page *kho_restore_page(phys_addr_t phys) /* Head page gets refcount of 1. */ set_page_count(page, 1); - /* For higher order folios, tail pages get a page count of zero. */ + /* + * For higher order folios, tail pages get a page count of zero. + * For physically contiguous order-0 pages every page gets a page + * count of 1 + */ + ref_cnt = is_folio ? 
0 : 1; for (unsigned int i = 1; i < nr_pages; i++) - set_page_count(page + i, 0); + set_page_count(page + i, ref_cnt); - if (info.order > 0) + if (is_folio && info.order) prep_compound_page(page, info.order); adjust_managed_page_count(page, nr_pages); @@ -262,7 +267,7 @@ static struct page *kho_restore_page(phys_addr_t phys) */ struct folio *kho_restore_folio(phys_addr_t phys) { - struct page *page = kho_restore_page(phys); + struct page *page = kho_restore_page(phys, true); return page ? page_folio(page) : NULL; } @@ -287,11 +292,10 @@ struct page *kho_restore_pages(phys_addr_t phys, unsigned int nr_pages) while (pfn < end_pfn) { const unsigned int order = min(count_trailing_zeros(pfn), ilog2(end_pfn - pfn)); - struct page *page = kho_restore_page(PFN_PHYS(pfn)); + struct page *page = kho_restore_page(PFN_PHYS(pfn), false); if (!page) return NULL; - split_page(page, order); pfn += 1 << order; } -- 2.50.1 From pratyush at kernel.org Tue Nov 25 04:23:05 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Tue, 25 Nov 2025 13:23:05 +0100 Subject: [PATCH v8 12/17] x86/e820: temporarily enable KHO scratch for memory below 1M In-Reply-To: <22BDBF5C-C831-4BBC-A854-20CA77234084@zytor.com> (H. Peter Anvin's message of "Mon, 24 Nov 2025 16:56:34 -0800") References: <20250509074635.3187114-1-changyuanl@google.com> <20250509074635.3187114-13-changyuanl@google.com> <22BDBF5C-C831-4BBC-A854-20CA77234084@zytor.com> Message-ID: On Mon, Nov 24 2025, H. Peter Anvin wrote: > On November 24, 2025 11:24:58 AM PST, Usama Arif wrote: >> >> >>On 09/05/2025 08:46, Changyuan Lyu wrote: >>> From: Alexander Graf >>> >>> KHO kernels are special and use only scratch memory for memblock >>> allocations, but memory below 1M is ignored by kernel after early boot >>> and cannot be naturally marked as scratch. >>> >>> To allow allocation of the real-mode trampoline and a few (if any) other >>> very early allocations from below 1M forcibly mark the memory below 1M >>> as scratch. 
>>> >>> After real mode trampoline is allocated, clear that scratch marking. >>> >>> Signed-off-by: Alexander Graf >>> Co-developed-by: Mike Rapoport (Microsoft) >>> Signed-off-by: Mike Rapoport (Microsoft) >>> Co-developed-by: Changyuan Lyu >>> Signed-off-by: Changyuan Lyu >>> Acked-by: Dave Hansen >>> --- >>> arch/x86/kernel/e820.c | 18 ++++++++++++++++++ >>> arch/x86/realmode/init.c | 2 ++ >>> 2 files changed, 20 insertions(+) >>> >>> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c >>> index 9920122018a0b..c3acbd26408ba 100644 >>> --- a/arch/x86/kernel/e820.c >>> +++ b/arch/x86/kernel/e820.c >>> @@ -1299,6 +1299,24 @@ void __init e820__memblock_setup(void) >>> memblock_add(entry->addr, entry->size); >>> } >>> >>> + /* >>> + * At this point memblock is only allowed to allocate from memory >>> + * below 1M (aka ISA_END_ADDRESS) up until direct map is completely set >>> + * up in init_mem_mapping(). >>> + * >>> + * KHO kernels are special and use only scratch memory for memblock >>> + * allocations, but memory below 1M is ignored by kernel after early >>> + * boot and cannot be naturally marked as scratch. >>> + * >>> + * To allow allocation of the real-mode trampoline and a few (if any) >>> + * other very early allocations from below 1M forcibly mark the memory >>> + * below 1M as scratch. >>> + * >>> + * After real mode trampoline is allocated, we clear that scratch >>> + * marking. >>> + */ >>> + memblock_mark_kho_scratch(0, SZ_1M); >>> + >>> /* >>> * 32-bit systems are limited to 4BG of memory even with HIGHMEM and >>> * to even less without it. >>> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c >>> index f9bc444a3064d..9b9f4534086d2 100644 >>> --- a/arch/x86/realmode/init.c >>> +++ b/arch/x86/realmode/init.c >>> @@ -65,6 +65,8 @@ void __init reserve_real_mode(void) >>> * setup_arch(). 
>>> */ >>> memblock_reserve(0, SZ_1M); >>> + >>> + memblock_clear_kho_scratch(0, SZ_1M); >>> } >>> >>> static void __init sme_sev_setup_real_mode(struct trampoline_header *th) >> >>Hello! >> >>I am working with Breno who reported that we are seeing the below warning at boot >>when rolling out 6.16 in Meta fleet. It is difficult to reproduce on a single host >>manually but we are seeing this several times a day inside the fleet. >> >> 20:16:33 ------------[ cut here ]------------ >> 20:16:33 WARNING: CPU: 0 PID: 0 at mm/memblock.c:668 memblock_add_range+0x316/0x330 >> 20:16:33 Modules linked in: >> 20:16:33 CPU: 0 UID: 0 PID: 0 Comm: swapper Tainted: G S 6.16.1-0_fbk0_0_gc0739ee5037a #1 NONE >> 20:16:33 Tainted: [S]=CPU_OUT_OF_SPEC >> 20:16:33 RIP: 0010:memblock_add_range+0x316/0x330 >> 20:16:33 Code: ff ff ff 89 5c 24 08 41 ff c5 44 89 6c 24 10 48 63 74 24 08 48 63 54 24 10 e8 26 0c 00 00 e9 41 ff ff ff 0f 0b e9 af fd ff ff <0f> 0b e9 b7 fd ff ff 0f 0b 0f 0b cc cc cc cc cc cc cc cc cc cc cc >> 20:16:33 RSP: 0000:ffffffff83403dd8 EFLAGS: 00010083 ORIG_RAX: 0000000000000000 >> 20:16:33 RAX: ffffffff8476ff90 RBX: 0000000000001c00 RCX: 0000000000000002 >> 20:16:33 RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff83bad4d8 >> 20:16:33 RBP: 000000000009f000 R08: 0000000000000020 R09: 8000000000097101 >> 20:16:33 R10: ffffffffff2004b0 R11: 203a6d6f646e6172 R12: 000000000009ec00 >> 20:16:33 R13: 0000000000000002 R14: 0000000000100000 R15: 000000000009d000 >> 20:16:33 FS: 0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000 >> 20:16:33 CR2: ffff888065413ff8 CR3: 00000000663b7000 CR4: 00000000000000b0 >> 20:16:33 Call Trace: >> 20:16:33 >> 20:16:33 ? __memblock_reserve+0x75/0x80 >> 20:16:33 ? setup_arch+0x30f/0xb10 >> 20:16:33 ? start_kernel+0x58/0x960 >> 20:16:33 ? x86_64_start_reservations+0x20/0x20 >> 20:16:33 ? x86_64_start_kernel+0x13d/0x140 >> 20:16:33 ? 
common_startup_64+0x13e/0x140 >> 20:16:33 >> 20:16:33 ---[ end trace 0000000000000000 ]--- >> >> >>Rolling out with memblock=debug is not really an option in a large scale fleet due to the >>time added to boot. But I did try on one of the hosts (without reproducing the issue) and I see: >> >>[ 0.000616] memory.cnt = 0x6 >>[ 0.000617] memory[0x0] [0x0000000000001000-0x000000000009bfff], 0x000000000009b000 bytes flags: 0x40 >>[ 0.000620] memory[0x1] [0x000000000009f000-0x000000000009ffff], 0x0000000000001000 bytes flags: 0x40 >>[ 0.000621] memory[0x2] [0x0000000000100000-0x000000005ed09fff], 0x000000005ec0a000 bytes flags: 0x0 >>... >> >>The 0x40 (MEMBLOCK_KHO_SCRATCH) is coming from memblock_mark_kho_scratch in e820__memblock_setup. I believe this >>should be under ifdef like the diff at the end? (Happy to send this as a patch for review if it makes sense). >>We have KEXEC_HANDOVER disabled in our defconfig, therefore MEMBLOCK_KHO_SCRATCH shouldnt be selected and >>we shouldnt have any MEMBLOCK_KHO_SCRATCH type regions in our memblock reservations. >> >>The other thing I did was insert a while(1) just before the warning and inspected the registers in qemu. >>R14 held the base register, and R15 held the size at that point. >>In the warning R14 is 0x100000 meaning that someone is reserving a region with a different flag to MEMBLOCK_NONE >>at the boundary of MEMBLOCK_KHO_SCRATCH. >> >>diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c >>index c3acbd26408ba..26e4062a0bd09 100644 >>--- a/arch/x86/kernel/e820.c >>+++ b/arch/x86/kernel/e820.c >>@@ -1299,6 +1299,7 @@ void __init e820__memblock_setup(void) >> memblock_add(entry->addr, entry->size); >> } >> >>+#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH >> /* >> * At this point memblock is only allowed to allocate from memory >> * below 1M (aka ISA_END_ADDRESS) up until direct map is completely set >>@@ -1316,7 +1317,7 @@ void __init e820__memblock_setup(void) >> * marking. 
>> */ >> memblock_mark_kho_scratch(0, SZ_1M); >>- >>+#endif >> /* >> * 32-bit systems are limited to 4BG of memory even with HIGHMEM and >> * to even less without it. >>diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c >>index 88be32026768c..1cd80293a3e23 100644 >>--- a/arch/x86/realmode/init.c >>+++ b/arch/x86/realmode/init.c >>@@ -66,8 +66,9 @@ void __init reserve_real_mode(void) >> * setup_arch(). >> */ >> memblock_reserve(0, SZ_1M); >>- >>+#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH >> memblock_clear_kho_scratch(0, SZ_1M); >>+#endif >> } >> >> static void __init sme_sev_setup_real_mode(struct trampoline_header *th) > > What does "scratch" mean in this exact context? (Sorry, don't have the code in front of me.) See https://docs.kernel.org/core-api/kho/concepts.html#scratch-regions -- Regards, Pratyush Yadav From pratyush at kernel.org Tue Nov 25 05:15:34 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Tue, 25 Nov 2025 14:15:34 +0100 Subject: [PATCH v8 12/17] x86/e820: temporarily enable KHO scratch for memory below 1M In-Reply-To: (Usama Arif's message of "Mon, 24 Nov 2025 19:24:58 +0000") References: <20250509074635.3187114-1-changyuanl@google.com> <20250509074635.3187114-13-changyuanl@google.com> Message-ID: On Mon, Nov 24 2025, Usama Arif wrote: > On 09/05/2025 08:46, Changyuan Lyu wrote: >> From: Alexander Graf >> >> KHO kernels are special and use only scratch memory for memblock >> allocations, but memory below 1M is ignored by kernel after early boot >> and cannot be naturally marked as scratch. >> >> To allow allocation of the real-mode trampoline and a few (if any) other >> very early allocations from below 1M forcibly mark the memory below 1M >> as scratch. >> >> After real mode trampoline is allocated, clear that scratch marking. 
>> >> Signed-off-by: Alexander Graf >> Co-developed-by: Mike Rapoport (Microsoft) >> Signed-off-by: Mike Rapoport (Microsoft) >> Co-developed-by: Changyuan Lyu >> Signed-off-by: Changyuan Lyu >> Acked-by: Dave Hansen >> --- >> arch/x86/kernel/e820.c | 18 ++++++++++++++++++ >> arch/x86/realmode/init.c | 2 ++ >> 2 files changed, 20 insertions(+) >> >> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c >> index 9920122018a0b..c3acbd26408ba 100644 >> --- a/arch/x86/kernel/e820.c >> +++ b/arch/x86/kernel/e820.c >> @@ -1299,6 +1299,24 @@ void __init e820__memblock_setup(void) >> memblock_add(entry->addr, entry->size); >> } >> >> + /* >> + * At this point memblock is only allowed to allocate from memory >> + * below 1M (aka ISA_END_ADDRESS) up until direct map is completely set >> + * up in init_mem_mapping(). >> + * >> + * KHO kernels are special and use only scratch memory for memblock >> + * allocations, but memory below 1M is ignored by kernel after early >> + * boot and cannot be naturally marked as scratch. >> + * >> + * To allow allocation of the real-mode trampoline and a few (if any) >> + * other very early allocations from below 1M forcibly mark the memory >> + * below 1M as scratch. >> + * >> + * After real mode trampoline is allocated, we clear that scratch >> + * marking. >> + */ >> + memblock_mark_kho_scratch(0, SZ_1M); >> + >> /* >> * 32-bit systems are limited to 4BG of memory even with HIGHMEM and >> * to even less without it. >> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c >> index f9bc444a3064d..9b9f4534086d2 100644 >> --- a/arch/x86/realmode/init.c >> +++ b/arch/x86/realmode/init.c >> @@ -65,6 +65,8 @@ void __init reserve_real_mode(void) >> * setup_arch(). >> */ >> memblock_reserve(0, SZ_1M); >> + >> + memblock_clear_kho_scratch(0, SZ_1M); >> } >> >> static void __init sme_sev_setup_real_mode(struct trampoline_header *th) > > Hello! 
> > I am working with Breno who reported that we are seeing the below warning at boot > when rolling out 6.16 in Meta fleet. It is difficult to reproduce on a single host > manually but we are seeing this several times a day inside the fleet. > > 20:16:33 ------------[ cut here ]------------ > 20:16:33 WARNING: CPU: 0 PID: 0 at mm/memblock.c:668 memblock_add_range+0x316/0x330 > 20:16:33 Modules linked in: > 20:16:33 CPU: 0 UID: 0 PID: 0 Comm: swapper Tainted: G S 6.16.1-0_fbk0_0_gc0739ee5037a #1 NONE > 20:16:33 Tainted: [S]=CPU_OUT_OF_SPEC > 20:16:33 RIP: 0010:memblock_add_range+0x316/0x330 > 20:16:33 Code: ff ff ff 89 5c 24 08 41 ff c5 44 89 6c 24 10 48 63 74 24 08 48 63 54 24 10 e8 26 0c 00 00 e9 41 ff ff ff 0f 0b e9 af fd ff ff <0f> 0b e9 b7 fd ff ff 0f 0b 0f 0b cc cc cc cc cc cc cc cc cc cc cc > 20:16:33 RSP: 0000:ffffffff83403dd8 EFLAGS: 00010083 ORIG_RAX: 0000000000000000 > 20:16:33 RAX: ffffffff8476ff90 RBX: 0000000000001c00 RCX: 0000000000000002 > 20:16:33 RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff83bad4d8 > 20:16:33 RBP: 000000000009f000 R08: 0000000000000020 R09: 8000000000097101 > 20:16:33 R10: ffffffffff2004b0 R11: 203a6d6f646e6172 R12: 000000000009ec00 > 20:16:33 R13: 0000000000000002 R14: 0000000000100000 R15: 000000000009d000 > 20:16:33 FS: 0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000 > 20:16:33 CR2: ffff888065413ff8 CR3: 00000000663b7000 CR4: 00000000000000b0 > 20:16:33 Call Trace: > 20:16:33 > 20:16:33 ? __memblock_reserve+0x75/0x80 > 20:16:33 ? setup_arch+0x30f/0xb10 > 20:16:33 ? start_kernel+0x58/0x960 > 20:16:33 ? x86_64_start_reservations+0x20/0x20 > 20:16:33 ? x86_64_start_kernel+0x13d/0x140 > 20:16:33 ? common_startup_64+0x13e/0x140 > 20:16:33 > 20:16:33 ---[ end trace 0000000000000000 ]--- > > > Rolling out with memblock=debug is not really an option in a large scale fleet due to the > time added to boot. 
But I did try on one of the hosts (without reproducing the issue) and I see: > > [ 0.000616] memory.cnt = 0x6 > [ 0.000617] memory[0x0] [0x0000000000001000-0x000000000009bfff], 0x000000000009b000 bytes flags: 0x40 > [ 0.000620] memory[0x1] [0x000000000009f000-0x000000000009ffff], 0x0000000000001000 bytes flags: 0x40 > [ 0.000621] memory[0x2] [0x0000000000100000-0x000000005ed09fff], 0x000000005ec0a000 bytes flags: 0x0 > ... > > The 0x40 (MEMBLOCK_KHO_SCRATCH) is coming from memblock_mark_kho_scratch in e820__memblock_setup. I believe this > should be under ifdef like the diff at the end? (Happy to send this as a patch for review if it makes sense). > We have KEXEC_HANDOVER disabled in our defconfig, therefore MEMBLOCK_KHO_SCRATCH shouldnt be selected and > we shouldnt have any MEMBLOCK_KHO_SCRATCH type regions in our memblock reservations. > > The other thing I did was insert a while(1) just before the warning and inspected the registers in qemu. > R14 held the base register, and R15 held the size at that point. > In the warning R14 is 0x100000 meaning that someone is reserving a region with a different flag to MEMBLOCK_NONE > at the boundary of MEMBLOCK_KHO_SCRATCH. I don't get this... The WARN_ON() is only triggered when the regions overlap. Here, there should be no overlap, since the scratch region should end at 0x100000 (SZ_1M) and the new region starts at 0x100000 (SZ_1M). Anyway, you do indeed point at a bug. memblock_mark_kho_scratch() should only be called on a KHO boot, not unconditionally. So even with CONFIG_MEMBLOCK_KHO_SCRATCH enabled, this should only be called on a KHO boot, not every time. I think the below diff should fix the warning for you by making sure the scratch areas are not present on non-KHO boot. I still don't know why you hit the warning in the first place though. If you'd be willing to dig deeper into that, it would be great. Can you give the below a try and if it fixes the problem for you I can send it on the list. 
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c index c3acbd26408ba..0a34dc011bf91 100644 --- a/arch/x86/kernel/e820.c +++ b/arch/x86/kernel/e820.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include @@ -1315,7 +1316,8 @@ void __init e820__memblock_setup(void) * After real mode trampoline is allocated, we clear that scratch * marking. */ - memblock_mark_kho_scratch(0, SZ_1M); + if (is_kho_boot()) + memblock_mark_kho_scratch(0, SZ_1M); /* * 32-bit systems are limited to 4BG of memory even with HIGHMEM and diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c index 88be32026768c..4e9b4dff17216 100644 --- a/arch/x86/realmode/init.c +++ b/arch/x86/realmode/init.c @@ -4,6 +4,7 @@ #include #include #include +#include #include #include @@ -67,7 +68,8 @@ void __init reserve_real_mode(void) */ memblock_reserve(0, SZ_1M); - memblock_clear_kho_scratch(0, SZ_1M); + if (is_kho_boot()) + memblock_clear_kho_scratch(0, SZ_1M); } static void __init sme_sev_setup_real_mode(struct trampoline_header *th) -- Regards, Pratyush Yadav From pratyush at kernel.org Tue Nov 25 05:18:20 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Tue, 25 Nov 2025 14:18:20 +0100 Subject: [PATCH 1/2] kho: kho_restore_vmalloc: fix initialization of pages array In-Reply-To: <20251125110917.843744-2-rppt@kernel.org> (Mike Rapoport's message of "Tue, 25 Nov 2025 13:09:16 +0200") References: <20251125110917.843744-1-rppt@kernel.org> <20251125110917.843744-2-rppt@kernel.org> Message-ID: On Tue, Nov 25 2025, Mike Rapoport wrote: > From: "Mike Rapoport (Microsoft)" > > In case a preserved vmalloc allocation was using huge pages, all pages in > the array of pages added to vm_struct during kho_restore_vmalloc() are > wrongly set to the same page. > > Fix the indexing when assigning pages to that array. > > Fixes: a667300bd53f ("kho: add support for preserving vmalloc allocations") > Signed-off-by: Mike Rapoport (Microsoft) Reviewed-by: Pratyush Yadav [...] 
-- Regards, Pratyush Yadav From pratyush at kernel.org Tue Nov 25 05:45:59 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Tue, 25 Nov 2025 14:45:59 +0100 Subject: [PATCH 2/2] kho: fix restoring of contiguous ranges of order-0 pages In-Reply-To: <20251125110917.843744-3-rppt@kernel.org> (Mike Rapoport's message of "Tue, 25 Nov 2025 13:09:17 +0200") References: <20251125110917.843744-1-rppt@kernel.org> <20251125110917.843744-3-rppt@kernel.org> Message-ID: On Tue, Nov 25 2025, Mike Rapoport wrote: > From: "Mike Rapoport (Microsoft)" > > When contiguous ranges of order-0 pages are restored, kho_restore_page() > calls prep_compound_page() with the first page in the range and order as > parameters and then kho_restore_pages() calls split_page() to make sure all > pages in the range are order-0. > > However, since split_page() is not intended to split compound pages and > with VM_DEBUG enabled it will trigger a VM_BUG_ON_PAGE(). > > Update kho_restore_page() so that it will use prep_compound_page() when it > restores a folio and make sure it properly sets page count for both large > folios and ranges of order-0 pages. 
> > Reported-by: Pratyush Yadav > Fixes: a667300bd53f ("kho: add support for preserving vmalloc allocations") > Signed-off-by: Mike Rapoport (Microsoft) > --- > kernel/liveupdate/kexec_handover.c | 20 ++++++++++++-------- > 1 file changed, 12 insertions(+), 8 deletions(-) > > diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c > index e64ee87fa62a..61d17ed1f423 100644 > --- a/kernel/liveupdate/kexec_handover.c > +++ b/kernel/liveupdate/kexec_handover.c > @@ -219,11 +219,11 @@ static int __kho_preserve_order(struct kho_mem_track *track, unsigned long pfn, > return 0; > } > > -static struct page *kho_restore_page(phys_addr_t phys) > +static struct page *kho_restore_page(phys_addr_t phys, bool is_folio) > { > struct page *page = pfn_to_online_page(PHYS_PFN(phys)); > + unsigned int nr_pages, ref_cnt; > union kho_page_info info; > - unsigned int nr_pages; > > if (!page) > return NULL; > @@ -243,11 +243,16 @@ static struct page *kho_restore_page(phys_addr_t phys) > /* Head page gets refcount of 1. */ > set_page_count(page, 1); > > - /* For higher order folios, tail pages get a page count of zero. */ > + /* > + * For higher order folios, tail pages get a page count of zero. > + * For physically contiguous order-0 pages every pages gets a page > + * count of 1 > + */ > + ref_cnt = is_folio ? 0 : 1; > for (unsigned int i = 1; i < nr_pages; i++) > - set_page_count(page + i, 0); > + set_page_count(page + i, ref_cnt); > > - if (info.order > 0) > + if (is_folio && info.order) This is getting a bit difficult to parse. Let's separate out folio and page initialization into separate helpers: /* Initialize 0-order KHO pages */ static void kho_init_page(struct page *page, unsigned int nr_pages) { for (unsigned int i = 0; i < nr_pages; i++) set_page_count(page + i, 1); } static void kho_init_folio(struct page *page, unsigned int order) { unsigned int nr_pages = (1 << order); /* Head page gets refcount of 1.
*/ set_page_count(page, 1); /* For higher order folios, tail pages get a page count of zero. */ for (unsigned int i = 1; i < nr_pages; i++) set_page_count(page + i, 0); if (order > 0) prep_compound_page(page, order); } > prep_compound_page(page, info.order); > > adjust_managed_page_count(page, nr_pages); > @@ -262,7 +267,7 @@ static struct page *kho_restore_page(phys_addr_t phys) > */ > struct folio *kho_restore_folio(phys_addr_t phys) > { > - struct page *page = kho_restore_page(phys); > + struct page *page = kho_restore_page(phys, true); > > return page ? page_folio(page) : NULL; > } > @@ -287,11 +292,10 @@ struct page *kho_restore_pages(phys_addr_t phys, unsigned int nr_pages) > while (pfn < end_pfn) { > const unsigned int order = > min(count_trailing_zeros(pfn), ilog2(end_pfn - pfn)); > - struct page *page = kho_restore_page(PFN_PHYS(pfn)); > + struct page *page = kho_restore_page(PFN_PHYS(pfn), false); > > if (!page) > return NULL; > - split_page(page, order); > pfn += 1 << order; > } -- Regards, Pratyush Yadav From rppt at kernel.org Tue Nov 25 05:50:40 2025 From: rppt at kernel.org (Mike Rapoport) Date: Tue, 25 Nov 2025 15:50:40 +0200 Subject: [PATCH v8 12/17] x86/e820: temporarily enable KHO scratch for memory below 1M In-Reply-To: References: <20250509074635.3187114-1-changyuanl@google.com> <20250509074635.3187114-13-changyuanl@google.com> Message-ID: Hi, On Tue, Nov 25, 2025 at 02:15:34PM +0100, Pratyush Yadav wrote: > On Mon, Nov 24 2025, Usama Arif wrote: > >> --- a/arch/x86/realmode/init.c > >> +++ b/arch/x86/realmode/init.c > >> @@ -65,6 +65,8 @@ void __init reserve_real_mode(void) > >> * setup_arch(). > >> */ > >> memblock_reserve(0, SZ_1M); > >> + > >> + memblock_clear_kho_scratch(0, SZ_1M); > >> } > >> > >> static void __init sme_sev_setup_real_mode(struct trampoline_header *th) > > > > Hello! > > > > I am working with Breno who reported that we are seeing the below warning at boot > > when rolling out 6.16 in Meta fleet. 
It is difficult to reproduce on a single host > > manually but we are seeing this several times a day inside the fleet. > > > > 20:16:33 ------------[ cut here ]------------ > > 20:16:33 WARNING: CPU: 0 PID: 0 at mm/memblock.c:668 memblock_add_range+0x316/0x330 > > 20:16:33 Modules linked in: > > 20:16:33 CPU: 0 UID: 0 PID: 0 Comm: swapper Tainted: G S 6.16.1-0_fbk0_0_gc0739ee5037a #1 NONE > > 20:16:33 Tainted: [S]=CPU_OUT_OF_SPEC > > 20:16:33 RIP: 0010:memblock_add_range+0x316/0x330 > > 20:16:33 Code: ff ff ff 89 5c 24 08 41 ff c5 44 89 6c 24 10 48 63 74 24 08 48 63 54 24 10 e8 26 0c 00 00 e9 41 ff ff ff 0f 0b e9 af fd ff ff <0f> 0b e9 b7 fd ff ff 0f 0b 0f 0b cc cc cc cc cc cc cc cc cc cc cc > > 20:16:33 RSP: 0000:ffffffff83403dd8 EFLAGS: 00010083 ORIG_RAX: 0000000000000000 > > 20:16:33 RAX: ffffffff8476ff90 RBX: 0000000000001c00 RCX: 0000000000000002 > > 20:16:33 RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff83bad4d8 > > 20:16:33 RBP: 000000000009f000 R08: 0000000000000020 R09: 8000000000097101 > > 20:16:33 R10: ffffffffff2004b0 R11: 203a6d6f646e6172 R12: 000000000009ec00 > > 20:16:33 R13: 0000000000000002 R14: 0000000000100000 R15: 000000000009d000 > > 20:16:33 FS: 0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000 > > 20:16:33 CR2: ffff888065413ff8 CR3: 00000000663b7000 CR4: 00000000000000b0 > > 20:16:33 Call Trace: > > 20:16:33 > > 20:16:33 ? __memblock_reserve+0x75/0x80 Do you have faddr2line for this? > > 20:16:33 ? setup_arch+0x30f/0xb10 And this? > > 20:16:33 ? start_kernel+0x58/0x960 > > 20:16:33 ? x86_64_start_reservations+0x20/0x20 > > 20:16:33 ? x86_64_start_kernel+0x13d/0x140 > > 20:16:33 ? common_startup_64+0x13e/0x140 > > 20:16:33 > > 20:16:33 ---[ end trace 0000000000000000 ]--- > > > > > > Rolling out with memblock=debug is not really an option in a large scale fleet due to the > > time added to boot. 
But I did try on one of the hosts (without reproducing the issue) and I see: Is it a problem to roll out a kernel that has additional debug printouts as Breno suggested earlier? I.e. if (flags != MEMBLOCK_NONE && flags != rgn->flags) { pr_warn("memblock: Flag mismatch at region [%pa-%pa]\n", &rgn->base, &rend); pr_warn(" Existing region flags: %#x\n", rgn->flags); pr_warn(" New range flags: %#x\n", flags); pr_warn(" New range: [%pa-%pa]\n", &base, &end); WARN_ON_ONCE(1); } > > [ 0.000616] memory.cnt = 0x6 > > [ 0.000617] memory[0x0] [0x0000000000001000-0x000000000009bfff], 0x000000000009b000 bytes flags: 0x40 > > [ 0.000620] memory[0x1] [0x000000000009f000-0x000000000009ffff], 0x0000000000001000 bytes flags: 0x40 > > [ 0.000621] memory[0x2] [0x0000000000100000-0x000000005ed09fff], 0x000000005ec0a000 bytes flags: 0x0 > > ... > > > > The 0x40 (MEMBLOCK_KHO_SCRATCH) is coming from memblock_mark_kho_scratch in e820__memblock_setup. I believe this > > should be under ifdef like the diff at the end? (Happy to send this as a patch for review if it makes sense). > > We have KEXEC_HANDOVER disabled in our defconfig, therefore MEMBLOCK_KHO_SCRATCH shouldnt be selected and > > we shouldnt have any MEMBLOCK_KHO_SCRATCH type regions in our memblock reservations. > > > > The other thing I did was insert a while(1) just before the warning and inspected the registers in qemu. > > R14 held the base register, and R15 held the size at that point. > > In the warning R14 is 0x100000 meaning that someone is reserving a region with a different flag to MEMBLOCK_NONE > > at the boundary of MEMBLOCK_KHO_SCRATCH. Judging by the register values, flags could be in %rcx or %r13 (0x2 - MEMBLOCK_MIRROR) or in %r8 (0x20 - MEMBLOCK_RSRV_KERN). Since WARN_ON() is triggered in __memblock_reserve() I'd bet on MEMBLOCK_RSRV_KERN. And apparently the warning triggers for some memory that was initially reserved with memblock_reserve() and then some of it was reserved with memblock_reserve_kern().
If you have the logs from failing boots up to the point where SLUB reports about its initialization, e.g. [ 0.134377] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1 something there may hint about what the issue is. > I don't get this... The WARN_ON() is only triggered when the regions > overlap. Here, there should be no overlap, since the scratch region > should end at 0x100000 (SZ_1M) and the new region starts at 0x100000 > (SZ_1M). Not only that, the warning is from __memblock_reserve() that works with memblock.reserved and the dump is for memblock.memory. > Anyway, you do indeed point at a bug. memblock_mark_kho_scratch() should > only be called on a KHO boot, not unconditionally. So even with > CONFIG_MEMBLOCK_KHO_SCRATCH enabled, this should only be called on a KHO > boot, not every time. > > I think the below diff should fix the warning for you by making sure the > scratch areas are not present on non-KHO boot. I still don't know why > you hit the warning in the first place though. If you'd be willing to > dig deeper into that, it would be great. > > Can you give the below a try and if it fixes the problem for you I can > send it on the list.
BTW, this makes sense even if it does not help with the issue Breno and Usama are working on. > > /* > * 32-bit systems are limited to 4BG of memory even with HIGHMEM and > diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c > index 88be32026768c..4e9b4dff17216 100644 > --- a/arch/x86/realmode/init.c > +++ b/arch/x86/realmode/init.c > @@ -4,6 +4,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -67,7 +68,8 @@ void __init reserve_real_mode(void) > */ > memblock_reserve(0, SZ_1M); > > - memblock_clear_kho_scratch(0, SZ_1M); > + if (is_kho_boot()) > + memblock_clear_kho_scratch(0, SZ_1M); > } > > static void __init sme_sev_setup_real_mode(struct trampoline_header *th) > > > -- > Regards, > Pratyush Yadav -- Sincerely yours, Mike. From rppt at kernel.org Tue Nov 25 05:53:07 2025 From: rppt at kernel.org (Mike Rapoport) Date: Tue, 25 Nov 2025 15:53:07 +0200 Subject: [PATCH v8 12/17] x86/e820: temporarily enable KHO scratch for memory below 1M In-Reply-To: <22BDBF5C-C831-4BBC-A854-20CA77234084@zytor.com> References: <20250509074635.3187114-1-changyuanl@google.com> <20250509074635.3187114-13-changyuanl@google.com> <22BDBF5C-C831-4BBC-A854-20CA77234084@zytor.com> Message-ID: On Mon, Nov 24, 2025 at 04:56:34PM -0800, H. Peter Anvin wrote: > On November 24, 2025 11:24:58 AM PST, Usama Arif wrote: > > >diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c > >index 88be32026768c..1cd80293a3e23 100644 > >--- a/arch/x86/realmode/init.c > >+++ b/arch/x86/realmode/init.c > >@@ -66,8 +66,9 @@ void __init reserve_real_mode(void) > > * setup_arch(). > > */ > > memblock_reserve(0, SZ_1M); > >- > >+#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH > > memblock_clear_kho_scratch(0, SZ_1M); > >+#endif > > } > > > > static void __init sme_sev_setup_real_mode(struct trampoline_header *th) > > What does "scratch" mean in this exact context? (Sorry, don't have the code in front of me.) 
In this context it's the memory kexec handover used to bootstrap the kexec'ed kernel. Everything beyond these scratch areas could contain preserved data and kexec handover limits all early memory allocations to these scratch areas. -- Sincerely yours, Mike. From usamaarif642 at gmail.com Tue Nov 25 06:31:54 2025 From: usamaarif642 at gmail.com (Usama Arif) Date: Tue, 25 Nov 2025 14:31:54 +0000 Subject: [PATCH v8 12/17] x86/e820: temporarily enable KHO scratch for memory below 1M In-Reply-To: References: <20250509074635.3187114-1-changyuanl@google.com> <20250509074635.3187114-13-changyuanl@google.com> Message-ID: <80622f99-0ef4-491b-87f6-c9790dfecef6@gmail.com> On 25/11/2025 13:15, Pratyush Yadav wrote: > On Mon, Nov 24 2025, Usama Arif wrote: > >> On 09/05/2025 08:46, Changyuan Lyu wrote: >>> From: Alexander Graf >>> >>> KHO kernels are special and use only scratch memory for memblock >>> allocations, but memory below 1M is ignored by kernel after early boot >>> and cannot be naturally marked as scratch. >>> >>> To allow allocation of the real-mode trampoline and a few (if any) other >>> very early allocations from below 1M forcibly mark the memory below 1M >>> as scratch. >>> >>> After real mode trampoline is allocated, clear that scratch marking. 
>>> >>> Signed-off-by: Alexander Graf >>> Co-developed-by: Mike Rapoport (Microsoft) >>> Signed-off-by: Mike Rapoport (Microsoft) >>> Co-developed-by: Changyuan Lyu >>> Signed-off-by: Changyuan Lyu >>> Acked-by: Dave Hansen >>> --- >>> arch/x86/kernel/e820.c | 18 ++++++++++++++++++ >>> arch/x86/realmode/init.c | 2 ++ >>> 2 files changed, 20 insertions(+) >>> >>> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c >>> index 9920122018a0b..c3acbd26408ba 100644 >>> --- a/arch/x86/kernel/e820.c >>> +++ b/arch/x86/kernel/e820.c >>> @@ -1299,6 +1299,24 @@ void __init e820__memblock_setup(void) >>> memblock_add(entry->addr, entry->size); >>> } >>> >>> + /* >>> + * At this point memblock is only allowed to allocate from memory >>> + * below 1M (aka ISA_END_ADDRESS) up until direct map is completely set >>> + * up in init_mem_mapping(). >>> + * >>> + * KHO kernels are special and use only scratch memory for memblock >>> + * allocations, but memory below 1M is ignored by kernel after early >>> + * boot and cannot be naturally marked as scratch. >>> + * >>> + * To allow allocation of the real-mode trampoline and a few (if any) >>> + * other very early allocations from below 1M forcibly mark the memory >>> + * below 1M as scratch. >>> + * >>> + * After real mode trampoline is allocated, we clear that scratch >>> + * marking. >>> + */ >>> + memblock_mark_kho_scratch(0, SZ_1M); >>> + >>> /* >>> * 32-bit systems are limited to 4BG of memory even with HIGHMEM and >>> * to even less without it. >>> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c >>> index f9bc444a3064d..9b9f4534086d2 100644 >>> --- a/arch/x86/realmode/init.c >>> +++ b/arch/x86/realmode/init.c >>> @@ -65,6 +65,8 @@ void __init reserve_real_mode(void) >>> * setup_arch(). >>> */ >>> memblock_reserve(0, SZ_1M); >>> + >>> + memblock_clear_kho_scratch(0, SZ_1M); >>> } >>> >>> static void __init sme_sev_setup_real_mode(struct trampoline_header *th) >> >> Hello! 
>> >> I am working with Breno who reported that we are seeing the below warning at boot >> when rolling out 6.16 in Meta fleet. It is difficult to reproduce on a single host >> manually but we are seeing this several times a day inside the fleet. >> >> 20:16:33 ------------[ cut here ]------------ >> 20:16:33 WARNING: CPU: 0 PID: 0 at mm/memblock.c:668 memblock_add_range+0x316/0x330 >> 20:16:33 Modules linked in: >> 20:16:33 CPU: 0 UID: 0 PID: 0 Comm: swapper Tainted: G S 6.16.1-0_fbk0_0_gc0739ee5037a #1 NONE >> 20:16:33 Tainted: [S]=CPU_OUT_OF_SPEC >> 20:16:33 RIP: 0010:memblock_add_range+0x316/0x330 >> 20:16:33 Code: ff ff ff 89 5c 24 08 41 ff c5 44 89 6c 24 10 48 63 74 24 08 48 63 54 24 10 e8 26 0c 00 00 e9 41 ff ff ff 0f 0b e9 af fd ff ff <0f> 0b e9 b7 fd ff ff 0f 0b 0f 0b cc cc cc cc cc cc cc cc cc cc cc >> 20:16:33 RSP: 0000:ffffffff83403dd8 EFLAGS: 00010083 ORIG_RAX: 0000000000000000 >> 20:16:33 RAX: ffffffff8476ff90 RBX: 0000000000001c00 RCX: 0000000000000002 >> 20:16:33 RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff83bad4d8 >> 20:16:33 RBP: 000000000009f000 R08: 0000000000000020 R09: 8000000000097101 >> 20:16:33 R10: ffffffffff2004b0 R11: 203a6d6f646e6172 R12: 000000000009ec00 >> 20:16:33 R13: 0000000000000002 R14: 0000000000100000 R15: 000000000009d000 >> 20:16:33 FS: 0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000 >> 20:16:33 CR2: ffff888065413ff8 CR3: 00000000663b7000 CR4: 00000000000000b0 >> 20:16:33 Call Trace: >> 20:16:33 >> 20:16:33 ? __memblock_reserve+0x75/0x80 >> 20:16:33 ? setup_arch+0x30f/0xb10 >> 20:16:33 ? start_kernel+0x58/0x960 >> 20:16:33 ? x86_64_start_reservations+0x20/0x20 >> 20:16:33 ? x86_64_start_kernel+0x13d/0x140 >> 20:16:33 ? common_startup_64+0x13e/0x140 >> 20:16:33 >> 20:16:33 ---[ end trace 0000000000000000 ]--- >> >> >> Rolling out with memblock=debug is not really an option in a large scale fleet due to the >> time added to boot. 
But I did try on one of the hosts (without reproducing the issue) and I see: >> >> [ 0.000616] memory.cnt = 0x6 >> [ 0.000617] memory[0x0] [0x0000000000001000-0x000000000009bfff], 0x000000000009b000 bytes flags: 0x40 >> [ 0.000620] memory[0x1] [0x000000000009f000-0x000000000009ffff], 0x0000000000001000 bytes flags: 0x40 >> [ 0.000621] memory[0x2] [0x0000000000100000-0x000000005ed09fff], 0x000000005ec0a000 bytes flags: 0x0 >> ... >> >> The 0x40 (MEMBLOCK_KHO_SCRATCH) is coming from memblock_mark_kho_scratch in e820__memblock_setup. I believe this >> should be under ifdef like the diff at the end? (Happy to send this as a patch for review if it makes sense). >> We have KEXEC_HANDOVER disabled in our defconfig, therefore MEMBLOCK_KHO_SCRATCH shouldn't be selected and >> we shouldn't have any MEMBLOCK_KHO_SCRATCH type regions in our memblock reservations. >> >> The other thing I did was insert a while(1) just before the warning and inspected the registers in qemu. >> R14 held the base register, and R15 held the size at that point. >> In the warning R14 is 0x100000 meaning that someone is reserving a region with a different flag to MEMBLOCK_NONE >> at the boundary of MEMBLOCK_KHO_SCRATCH. > > I don't get this... The WARN_ON() is only triggered when the regions > overlap. Here, there should be no overlap, since the scratch region > should end at 0x100000 (SZ_1M) and the new region starts at 0x100000 > (SZ_1M). > Yes, this is likely a separate problem. I just discovered flags = 0x40 while trying to debug it with KEXEC_HANDOVER disabled. > Anyway, you do indeed point at a bug. memblock_mark_kho_scratch() should > only be called on a KHO boot, not unconditionally. So even with > CONFIG_MEMBLOCK_KHO_SCRATCH enabled, this should only be called on a KHO > boot, not every time. > > I think the below diff should fix the warning for you by making sure the > scratch areas are not present on non-KHO boot. I still don't know why > you hit the warning in the first place though.
If you'd be willing to > dig deeper into that, it would be great. > > Can you give the below a try and if it fixes the problem for you I can > send it on the list. Is there a reason for compiling this code with is_kho_boot, when we have disabled KEXEC_HANDOVER and don't want this in? I.e., why not just ifdef it with MEMBLOCK_KHO_SCRATCH when that defconfig is designed for it? > > diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c > index c3acbd26408ba..0a34dc011bf91 100644 > --- a/arch/x86/kernel/e820.c > +++ b/arch/x86/kernel/e820.c > @@ -16,6 +16,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -1315,7 +1316,8 @@ void __init e820__memblock_setup(void) > * After real mode trampoline is allocated, we clear that scratch > * marking. > */ > - memblock_mark_kho_scratch(0, SZ_1M); > + if (is_kho_boot()) > + memblock_mark_kho_scratch(0, SZ_1M); > > /* > * 32-bit systems are limited to 4BG of memory even with HIGHMEM and > diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c > index 88be32026768c..4e9b4dff17216 100644 > --- a/arch/x86/realmode/init.c > +++ b/arch/x86/realmode/init.c > @@ -4,6 +4,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -67,7 +68,8 @@ void __init reserve_real_mode(void) > */ > memblock_reserve(0, SZ_1M); > > - memblock_clear_kho_scratch(0, SZ_1M); > + if (is_kho_boot()) > + memblock_clear_kho_scratch(0, SZ_1M); > } > > static void __init sme_sev_setup_real_mode(struct trampoline_header *th) > > From pratyush at kernel.org Tue Nov 25 06:39:34 2025 From: pratyush at kernel.org (Pratyush Yadav) Date: Tue, 25 Nov 2025 15:39:34 +0100 Subject: [PATCH v8 12/17] x86/e820: temporarily enable KHO scratch for memory below 1M In-Reply-To: <80622f99-0ef4-491b-87f6-c9790dfecef6@gmail.com> (Usama Arif's message of "Tue, 25 Nov 2025 14:31:54 +0000") References: <20250509074635.3187114-1-changyuanl@google.com> <20250509074635.3187114-13-changyuanl@google.com>
<80622f99-0ef4-491b-87f6-c9790dfecef6@gmail.com> Message-ID: On Tue, Nov 25 2025, Usama Arif wrote: > On 25/11/2025 13:15, Pratyush Yadav wrote: >> On Mon, Nov 24 2025, Usama Arif wrote: >> >>> On 09/05/2025 08:46, Changyuan Lyu wrote: >>>> From: Alexander Graf >>>> >>>> KHO kernels are special and use only scratch memory for memblock >>>> allocations, but memory below 1M is ignored by kernel after early boot >>>> and cannot be naturally marked as scratch. >>>> >>>> To allow allocation of the real-mode trampoline and a few (if any) other >>>> very early allocations from below 1M forcibly mark the memory below 1M >>>> as scratch. >>>> >>>> After real mode trampoline is allocated, clear that scratch marking. >>>> >>>> Signed-off-by: Alexander Graf [...] >> Anyway, you do indeed point at a bug. memblock_mark_kho_scratch() should >> only be called on a KHO boot, not unconditionally. So even with >> CONFIG_MEMBLOCK_KHO_SCRATCH enabled, this should only be called on a KHO >> boot, not every time. >> >> I think the below diff should fix the warning for you by making sure the >> scratch areas are not present on non-KHO boot. I still don't know why >> you hit the warning in the first place though. If you'd be willing to >> dig deeper into that, it would be great. >> >> Can you give the below a try and if it fixes the problem for you I can >> send it on the list. > > Is there a reason for compiling this code with is_kho_boot, when we have disabled > KEXEC_HANDOVER and don't want this in? I.e., why not just ifdef it with MEMBLOCK_KHO_SCRATCH > when that defconfig is designed for it? is_kho_boot() will always be false when CONFIG_KEXEC_HANDOVER is not enabled. So the compiler should optimize this out. Only using the ifdef is not enough. Just because the config is enabled doesn't mean every boot will be a KHO boot. You can do regular reboots or even regular kexec, without ever having KHO involved. We only want to call this for a KHO boot. So a runtime check is needed anyway.
> >> >> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c >> index c3acbd26408ba..0a34dc011bf91 100644 >> --- a/arch/x86/kernel/e820.c >> +++ b/arch/x86/kernel/e820.c >> @@ -16,6 +16,7 @@ >> #include >> #include >> #include >> +#include >> >> #include >> #include >> @@ -1315,7 +1316,8 @@ void __init e820__memblock_setup(void) >> * After real mode trampoline is allocated, we clear that scratch >> * marking. >> */ >> - memblock_mark_kho_scratch(0, SZ_1M); >> + if (is_kho_boot()) >> + memblock_mark_kho_scratch(0, SZ_1M); >> >> /* >> * 32-bit systems are limited to 4BG of memory even with HIGHMEM and >> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c >> index 88be32026768c..4e9b4dff17216 100644 >> --- a/arch/x86/realmode/init.c >> +++ b/arch/x86/realmode/init.c >> @@ -4,6 +4,7 @@ >> #include >> #include >> #include >> +#include >> >> #include >> #include >> @@ -67,7 +68,8 @@ void __init reserve_real_mode(void) >> */ >> memblock_reserve(0, SZ_1M); >> >> - memblock_clear_kho_scratch(0, SZ_1M); >> + if (is_kho_boot()) >> + memblock_clear_kho_scratch(0, SZ_1M); >> } >> >> static void __init sme_sev_setup_real_mode(struct trampoline_header *th) >> >> > -- Regards, Pratyush Yadav From akpm at linux-foundation.org Tue Nov 25 09:55:13 2025 From: akpm at linux-foundation.org (Andrew Morton) Date: Tue, 25 Nov 2025 09:55:13 -0800 Subject: [PATCHv2 1/2] kernel/kexec: Change the prototype of kimage_map_segment() In-Reply-To: References: <20251106065904.10772-1-piliu@redhat.com> <20251124141620.eaef984836fe2edc7acf9179@linux-foundation.org> Message-ID: <20251125095513.d71dcf5aca95db49008cbc25@linux-foundation.org> On Tue, 25 Nov 2025 12:54:39 +0800 Baoquan He wrote: > On 11/24/25 at 02:16pm, Andrew Morton wrote: > > On Thu, 6 Nov 2025 14:59:03 +0800 Pingfan Liu wrote: > > > > > The kexec segment index will be required to extract the corresponding > > > information for that segment in kimage_map_segment(). 
Additionally, > > > kexec_segment already holds the kexec relocation destination address and > > > size. Therefore, the prototype of kimage_map_segment() can be changed. > > > > Could we please have some reviewer input on these two patches? > > I have some concerns about the one place of tiny code change, and the > root cause missing in the log. And Mimi sent mail to me asking why this bug > can't be seen on her laptop, I told her this bug can only be triggered > on systems where a CMA area exists. I think these need to be addressed in v3. Great, thanks, I'll drop this version. From usamaarif642 at gmail.com Tue Nov 25 10:47:15 2025 From: usamaarif642 at gmail.com (Usama Arif) Date: Tue, 25 Nov 2025 18:47:15 +0000 Subject: [PATCH v8 12/17] x86/e820: temporarily enable KHO scratch for memory below 1M In-Reply-To: References: <20250509074635.3187114-1-changyuanl@google.com> <20250509074635.3187114-13-changyuanl@google.com> Message-ID: On 25/11/2025 13:50, Mike Rapoport wrote: > Hi, > > On Tue, Nov 25, 2025 at 02:15:34PM +0100, Pratyush Yadav wrote: >> On Mon, Nov 24 2025, Usama Arif wrote:
>>> >>> 20:16:33 ------------[ cut here ]------------ >>> 20:16:33 WARNING: CPU: 0 PID: 0 at mm/memblock.c:668 memblock_add_range+0x316/0x330 >>> 20:16:33 Modules linked in: >>> 20:16:33 CPU: 0 UID: 0 PID: 0 Comm: swapper Tainted: G S 6.16.1-0_fbk0_0_gc0739ee5037a #1 NONE >>> 20:16:33 Tainted: [S]=CPU_OUT_OF_SPEC >>> 20:16:33 RIP: 0010:memblock_add_range+0x316/0x330 >>> 20:16:33 Code: ff ff ff 89 5c 24 08 41 ff c5 44 89 6c 24 10 48 63 74 24 08 48 63 54 24 10 e8 26 0c 00 00 e9 41 ff ff ff 0f 0b e9 af fd ff ff <0f> 0b e9 b7 fd ff ff 0f 0b 0f 0b cc cc cc cc cc cc cc cc cc cc cc >>> 20:16:33 RSP: 0000:ffffffff83403dd8 EFLAGS: 00010083 ORIG_RAX: 0000000000000000 >>> 20:16:33 RAX: ffffffff8476ff90 RBX: 0000000000001c00 RCX: 0000000000000002 >>> 20:16:33 RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff83bad4d8 >>> 20:16:33 RBP: 000000000009f000 R08: 0000000000000020 R09: 8000000000097101 >>> 20:16:33 R10: ffffffffff2004b0 R11: 203a6d6f646e6172 R12: 000000000009ec00 >>> 20:16:33 R13: 0000000000000002 R14: 0000000000100000 R15: 000000000009d000 >>> 20:16:33 FS: 0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000 >>> 20:16:33 CR2: ffff888065413ff8 CR3: 00000000663b7000 CR4: 00000000000000b0 >>> 20:16:33 Call Trace: >>> 20:16:33 >>> 20:16:33 ? __memblock_reserve+0x75/0x80 > > Do you have faddr2line for this? > >>> 20:16:33 ? setup_arch+0x30f/0xb10 > > And this? > Thanks for this! I think it helped narrow down the problem. The stack is: 20:16:33 ? __memblock_reserve (mm/memblock.c:936) 20:16:33 ? setup_arch (arch/x86/kernel/setup.c:413 arch/x86/kernel/setup.c:499 arch/x86/kernel/setup.c:956) 20:16:33 ? start_kernel (init/main.c:922) 20:16:33 ? x86_64_start_reservations (arch/x86/kernel/ebda.c:57) 20:16:33 ? x86_64_start_kernel (arch/x86/kernel/head64.c:231) 20:16:33 ? common_startup_64 (arch/x86/kernel/head_64.S:419) This is a 6.16 kernel. 20:16:33 ? __memblock_reserve (mm/memblock.c:936) That's the memblock_add_range call in memblock_reserve 20:16:33 ?
setup_arch (arch/x86/kernel/setup.c:413 arch/x86/kernel/setup.c:499 arch/x86/kernel/setup.c:956) That is parse_setup_data -> add_early_ima_buffer -> memblock_reserve_kern I put a simple print like below: diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 680d1b6dfea41..cc97ffc0083c7 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -409,6 +409,7 @@ static void __init add_early_ima_buffer(u64 phys_addr) } if (data->size) { + pr_err("PPP %s %s %d data->addr %llx, data->size %llx \n", __FILE__, __func__, __LINE__, data->addr, data->size); memblock_reserve_kern(data->addr, data->size); ima_kexec_buffer_phys = data->addr; ima_kexec_buffer_size = data->size; and I see (without replicating the warning): [ 0.000000] PPP arch/x86/kernel/setup.c add_early_ima_buffer 412 data->addr 9e000, data->size 1000 .... [ 0.000348] MEMBLOCK configuration: [ 0.000348] memory size = 0x0000003fea329ff0 reserved size = 0x00000000050c969b [ 0.000350] memory.cnt = 0x5 [ 0.000351] memory[0x0] [0x0000000000001000-0x000000000009ffff], 0x000000000009f000 bytes flags: 0x40 [ 0.000353] memory[0x1] [0x0000000000100000-0x0000000067c65fff], 0x0000000067b66000 bytes flags: 0x0 [ 0.000355] memory[0x2] [0x000000006d8db000-0x000000006fffffff], 0x0000000002725000 bytes flags: 0x0 [ 0.000356] memory[0x3] [0x0000000100000000-0x000000407fff8fff], 0x0000003f7fff9000 bytes flags: 0x0 [ 0.000358] memory[0x4] [0x000000407fffa000-0x000000407fffffff], 0x0000000000006000 bytes flags: 0x0 [ 0.000359] reserved.cnt = 0x7 So MEMBLOCK_RSRV_KERN and MEMBLOCK_KHO_SCRATCH seem to overlap. >>> 20:16:33 ? start_kernel+0x58/0x960 >>> 20:16:33 ? x86_64_start_reservations+0x20/0x20 >>> 20:16:33 ? x86_64_start_kernel+0x13d/0x140 >>> 20:16:33 ? common_startup_64+0x13e/0x140 >>> 20:16:33 >>> 20:16:33 ---[ end trace 0000000000000000 ]--- >>> >>> >>> Rolling out with memblock=debug is not really an option in a large scale fleet due to the >>> time added to boot.
But I did try on one of the hosts (without reproducing the issue) and I see: > > Is it a problem to roll out a kernel that has additional debug printouts as > Breno suggested earlier? I.e. > > if (flags != MEMBLOCK_NONE && flags != rgn->flags) { > pr_warn("memblock: Flag mismatch at region [%pa-%pa]\n", > &rgn->base, &rend); > pr_warn(" Existing region flags: %#x\n", rgn->flags); > pr_warn(" New range flags: %#x\n", flags); > pr_warn(" New range: [%pa-%pa]\n", &base, &end); > WARN_ON_ONCE(1); > } > I can add this, but the only thing is that it might be several weeks between me putting this in the kernel and that kernel being deployed to enough machines that it starts to show up. I think the IMA coinciding with memblock_mark_kho_scratch in e820__memblock_setup could be the reason for the warning. It might be better to fix that case and deploy it to see if the warnings still show up? I can add these prints as well in case it doesn't fix the problem. >>> [ 0.000616] memory.cnt = 0x6 >>> [ 0.000617] memory[0x0] [0x0000000000001000-0x000000000009bfff], 0x000000000009b000 bytes flags: 0x40 >>> [ 0.000620] memory[0x1] [0x000000000009f000-0x000000000009ffff], 0x0000000000001000 bytes flags: 0x40 >>> [ 0.000621] memory[0x2] [0x0000000000100000-0x000000005ed09fff], 0x000000005ec0a000 bytes flags: 0x0 >>> ... >>> >>> The 0x40 (MEMBLOCK_KHO_SCRATCH) is coming from memblock_mark_kho_scratch in e820__memblock_setup. I believe this >>> should be under ifdef like the diff at the end? (Happy to send this as a patch for review if it makes sense). >>> We have KEXEC_HANDOVER disabled in our defconfig, therefore MEMBLOCK_KHO_SCRATCH shouldn't be selected and >>> we shouldn't have any MEMBLOCK_KHO_SCRATCH type regions in our memblock reservations. >>> >>> The other thing I did was insert a while(1) just before the warning and inspected the registers in qemu. >>> R14 held the base register, and R15 held the size at that point.
>>> In the warning R14 is 0x100000 meaning that someone is reserving a region with a different flag to MEMBLOCK_NONE >>> at the boundary of MEMBLOCK_KHO_SCRATCH. > > Judging by the register values, flags could be in %rcx or %r13 (0x2 - MEMBLOCK_MIRROR) or in > %r8 (0x20 - MEMBLOCK_RSRV_KERN) I feel like it might be r8 (MEMBLOCK_RSRV_KERN) from IMA. > > Since WARN_ON() is triggered in __memblock_reserve() I'd bet on > MEMBLOCK_RSRV_KERN. > > And apparently the warning triggers for some memory that was initially > reserved with memblock_reserve() and then some of it was reserved with > memblock_reserve_kern(). > > If you have the logs from failing boots up to the point where SLUB reports > about its initialization, e.g. > > [ 0.134377] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1 > > something there may hint about what's the issue. So the boot doesn't fail, it's just giving warnings in the fleet. I have added the dmesg to the end of the mail. > >> I don't get this... The WARN_ON() is only triggered when the regions >> overlap. Here, there should be no overlap, since the scratch region >> should end at 0x100000 (SZ_1M) and the new region starts at 0x100000 >> (SZ_1M). > > Not only that, the warning is from __memblock_reserve() that works with > memblock.reserved and the dump is for memblock.memory. > >> Anyway, you do indeed point at a bug. memblock_mark_kho_scratch() should >> only be called on a KHO boot, not unconditionally. So even with >> CONFIG_MEMBLOCK_KHO_SCRATCH enabled, this should only be called on a KHO >> boot, not every time. >> >> I think the below diff should fix the warning for you by making sure the >> scratch areas are not present on non-KHO boot. I still don't know why >> you hit the warning in the first place though. If you'd be willing to >> dig deeper into that, it would be great. >> >> Can you give the below a try and if it fixes the problem for you I can >> send it on the list.
>> >> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c >> index c3acbd26408ba..0a34dc011bf91 100644 >> --- a/arch/x86/kernel/e820.c >> +++ b/arch/x86/kernel/e820.c >> @@ -16,6 +16,7 @@ >> #include >> #include >> #include >> +#include >> >> #include >> #include >> @@ -1315,7 +1316,8 @@ void __init e820__memblock_setup(void) >> * After real mode trampoline is allocated, we clear that scratch >> * marking. >> */ >> - memblock_mark_kho_scratch(0, SZ_1M); >> + if (is_kho_boot()) >> + memblock_mark_kho_scratch(0, SZ_1M); > > We'd better add an inline stub to memblock.h for > !CONFIG_MEMBLOCK_KHO_SCRATCH > > and move is_kho_boot() inside memblock_{mark,clear}_kho_scratch. This might > require moving them out of line, but it's not that they are on the hot > paths. > > BTW, this makes sense even if it does not help with the issue Breno and > Usama are working on. > Does something like this look good? I can try deploying this (although it will take some time to find out). We can get it upstream as well, as that makes backports easier.
diff --git a/mm/memblock.c b/mm/memblock.c index 154f1d73b61f2..257c6f0eee03d 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -1119,8 +1119,12 @@ int __init_memblock memblock_reserved_mark_noinit(phys_addr_t base, phys_addr_t */ __init int memblock_mark_kho_scratch(phys_addr_t base, phys_addr_t size) { - return memblock_setclr_flag(&memblock.memory, base, size, 1, - MEMBLOCK_KHO_SCRATCH); +#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH + if (is_kho_boot()) + return memblock_setclr_flag(&memblock.memory, base, size, 1, + MEMBLOCK_KHO_SCRATCH); +#endif + return 0; } /** @@ -1133,8 +1137,12 @@ __init int memblock_mark_kho_scratch(phys_addr_t base, phys_addr_t size) */ __init int memblock_clear_kho_scratch(phys_addr_t base, phys_addr_t size) { - return memblock_setclr_flag(&memblock.memory, base, size, 0, - MEMBLOCK_KHO_SCRATCH); +#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH + if (is_kho_boot()) + return memblock_setclr_flag(&memblock.memory, base, size, 0, + MEMBLOCK_KHO_SCRATCH); +#endif + return 0; } static bool should_skip_region(struct memblock_type *type, >> >> /* >> * 32-bit systems are limited to 4BG of memory even with HIGHMEM and >> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c >> index 88be32026768c..4e9b4dff17216 100644 >> --- a/arch/x86/realmode/init.c >> +++ b/arch/x86/realmode/init.c >> @@ -4,6 +4,7 @@ >> #include >> #include >> #include >> +#include >> >> #include >> #include >> @@ -67,7 +68,8 @@ void __init reserve_real_mode(void) >> */ >> memblock_reserve(0, SZ_1M); >> >> - memblock_clear_kho_scratch(0, SZ_1M); >> + if (is_kho_boot()) >> + memblock_clear_kho_scratch(0, SZ_1M); >> } >> >> static void __init sme_sev_setup_real_mode(struct trampoline_header *th) The dmesg from one of the hosts with the warning is: 20:16:33 BIOS-provided physical RAM map: 20:16:33 BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable 20:16:33 BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved 20:16:33 BIOS-e820: [mem
0x0000000000100000-0x0000000069ca3fff] usable 20:16:33 BIOS-e820: [mem 0x0000000069ca4000-0x000000006bda3fff] reserved 20:16:33 BIOS-e820: [mem 0x000000006bda4000-0x000000006be5efff] ACPI data 20:16:33 BIOS-e820: [mem 0x000000006be5f000-0x000000006c9b8fff] ACPI NVS 20:16:33 BIOS-e820: [mem 0x000000006c9b9000-0x000000006ebedfff] reserved 20:16:33 BIOS-e820: [mem 0x000000006ebee000-0x000000006fffffff] usable 20:16:33 BIOS-e820: [mem 0x0000000070000000-0x000000008fffffff] reserved 20:16:33 BIOS-e820: [mem 0x00000000fd000000-0x00000000fe7fffff] reserved 20:16:33 BIOS-e820: [mem 0x00000000fed20000-0x00000000fed44fff] reserved 20:16:33 BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved 20:16:33 BIOS-e820: [mem 0x0000000100000000-0x000000107fff847f] usable 20:16:33 BIOS-e820: [mem 0x000000107fff8480-0x000000107fff848f] type 128 20:16:33 BIOS-e820: [mem 0x000000107fff8490-0x000000107fffffff] usable 20:16:33 random: crng init done 20:16:33 ------------[ cut here ]------------ 20:16:33 WARNING: CPU: 0 PID: 0 at mm/memblock.c:668 memblock_add_range+0x316/0x330 20:16:33 Modules linked in: 20:16:33 CPU: 0 UID: 0 PID: 0 Comm: swapper Tainted: G S 6.16.1-0_fbk0_0_gc0739ee5037a #1 NONE 20:16:33 Tainted: [S]=CPU_OUT_OF_SPEC 20:16:33 RIP: 0010:memblock_add_range+0x316/0x330 20:16:33 Code: ff ff ff 89 5c 24 08 41 ff c5 44 89 6c 24 10 48 63 74 24 08 48 63 54 24 10 e8 26 0c 00 00 e9 41 ff ff ff 0f 0b e9 af fd ff ff <0f> 0b e9 b7 fd ff ff 0f 0b 0f 0b cc cc cc cc cc cc cc cc cc cc cc 20:16:33 RSP: 0000:ffffffff83403dd8 EFLAGS: 00010083 ORIG_RAX: 0000000000000000 20:16:33 RAX: ffffffff8476ff90 RBX: 0000000000001c00 RCX: 0000000000000002 20:16:33 RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff83bad4d8 20:16:33 RBP: 000000000009f000 R08: 0000000000000020 R09: 8000000000097101 20:16:33 R10: ffffffffff2004b0 R11: 203a6d6f646e6172 R12: 000000000009ec00 20:16:33 R13: 0000000000000002 R14: 0000000000100000 R15: 000000000009d000 20:16:33 FS: 0000000000000000(0000) 
GS:0000000000000000(0000) knlGS:0000000000000000 20:16:33 CR2: ffff888065413ff8 CR3: 00000000663b7000 CR4: 00000000000000b0 20:16:33 Call Trace: 20:16:33 20:16:33 ? __memblock_reserve+0x75/0x80 20:16:33 ? setup_arch+0x30f/0xb10 20:16:33 ? start_kernel+0x58/0x960 20:16:33 ? x86_64_start_reservations+0x20/0x20 20:16:33 ? x86_64_start_kernel+0x13d/0x140 20:16:33 ? common_startup_64+0x13e/0x140 20:16:33 20:16:33 ---[ end trace 0000000000000000 ]--- 20:16:33 Memory allocation profiling is enabled with compression and is turned on! 20:16:33 NX (Execute Disable) protection: active 20:16:33 APIC: Static calls initialized 20:16:33 efi: EFI v2.6 by American Megatrends 20:16:33 efi: ACPI 2.0=0x6c5ec000 ACPI=0x6c5ec000 TPMFinalLog=0x6c987000 SMBIOS=0x6e69d000 SMBIOS 3.0=0x6e69c000 MEMATTR=0xffffffffffffffff ESRT=0x67d97918 INITRD=0x5f275d18 TPMEventLog=0x6be5d018 20:16:33 efi: Remove mem00: MMIO range=[0xff000000-0xffffffff] (16MB) from e820 map 20:16:33 efi: Not removing mem01: MMIO range=[0xfed20000-0xfed44fff] (148KB) from e820 map 20:16:33 efi: Remove mem02: MMIO range=[0xfd000000-0xfe7fffff] (24MB) from e820 map 20:16:33 efi: Remove mem03: MMIO range=[0x80000000-0x8fffffff] (256MB) from e820 map 20:16:33 SMBIOS 3.1.1 present. 20:16:33 DMI: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A23 12/08/2020 20:16:33 DMI: Memory slots populated: 4/8 20:16:33 tsc: Detected 1600.000 MHz processor 20:16:33 last_pfn = 0x1080000 max_arch_pfn = 0x400000000 20:16:33 MTRR map: 8 entries (3 fixed + 5 variable; max 23), built from 10 variable MTRRs 20:16:33 x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT 20:16:33 last_pfn = 0x70000 max_arch_pfn = 0x400000000 20:16:33 esrt: Reserving ESRT space from 0x0000000067d97918 to 0x0000000067d97978. 
20:16:33 Using GB pages for direct mapping 20:16:33 RAMDISK: [mem 0x5ed81000-0x62ffffff] 20:16:33 ACPI: Early table checksum verification disabled 20:16:33 ACPI: RSDP 0x000000006C5EC000 000024 (v02 ALASKA) 20:16:33 ACPI: XSDT 0x000000006C5EC0C0 000104 (v01 ALASKA A M I 01072009 AMI 00010013) 20:16:33 ACPI: FACP 0x000000006C62E590 000114 (v06 ALASKA A M I 01072009 INTL 20091013) 20:16:33 ACPI: DSDT 0x000000006C5EC260 04232D (v02 ALASKA A M I 01072009 INTL 20091013) 20:16:33 ACPI: FACS 0x000000006C9B7080 000040 20:16:33 ACPI: FPDT 0x000000006C62E6A8 000044 (v01 ALASKA A M I 01072009 AMI 00010013) 20:16:33 ACPI: FIDT 0x000000006C62E6F0 00009C (v01 ALASKA A M I 01072009 AMI 00010013) 20:16:33 ACPI: SPMI 0x000000006C62E790 000041 (v05 ALASKA A M I 00000000 AMI. 00000000) 20:16:33 ACPI: UEFI 0x000000006C62E7D8 000048 (v01 ALASKA A M I 01072009 01000013) 20:16:33 ACPI: MCFG 0x000000006C62E820 00003C (v01 ALASKA A M I 01072009 MSFT 00000097) 20:16:33 ACPI: HPET 0x000000006C62E860 000038 (v01 ALASKA A M I 00000001 INTL 20091013) 20:16:33 ACPI: APIC 0x000000006C62E898 00071E (v03 ALASKA A M I 00000000 INTL 20091013) 20:16:33 ACPI: MIGT 0x000000006C62EFB8 000040 (v01 ALASKA A M I 00000000 INTL 20091013) 20:16:33 ACPI: PCAT 0x000000006C62EFF8 000068 (v02 ALASKA A M I 00000002 INTL 20091013) 20:16:33 ACPI: PCCT 0x000000006C62F060 00006E (v01 ALASKA A M I 00000002 INTL 20091013) 20:16:33 ACPI: RASF 0x000000006C62F0D0 000030 (v01 ALASKA A M I 00000001 INTL 20091013) 20:16:33 ACPI: SVOS 0x000000006C62F100 000032 (v01 ALASKA A M I 00000000 INTL 20091013) 20:16:33 ACPI: WDDT 0x000000006C62F138 000040 (v01 ALASKA A M I 00000000 INTL 20091013) 20:16:33 ACPI: OEM4 0x000000006C62F178 028A0C (v02 INTEL CPU CST 00003000 INTL 20140828) 20:16:33 ACPI: OEM1 0x000000006C657B88 00A8CC (v02 INTEL CPU EIST 00003000 INTL 20140828) 20:16:33 ACPI: OEM2 0x000000006C662458 006534 (v02 INTEL CPU HWP 00003000 INTL 20140828) 20:16:33 ACPI: SSDT 0x000000006C668990 00CEB8 (v02 INTEL SSDT PM 00004000 INTL 
20140828) 20:16:33 ACPI: SSDT 0x000000006C675848 00065B (v02 ALASKA A M I 00000000 INTL 20091013) 20:16:33 ACPI: SPCR 0x000000006C675EA8 000050 (v02 A M I APTIO V 01072009 AMI. 0005000E) 20:16:33 ACPI: TPM2 0x000000006C675EF8 000034 (v04 ALASKA A M I 00000001 AMI 00000000) 20:16:33 ACPI: SSDT 0x000000006C675F30 001368 (v02 INTEL SpsNm 00000002 INTL 20140828) 20:16:33 ACPI: DMAR 0x000000006C677298 0000E8 (v01 ALASKA A M I 00000001 INTL 20091013) 20:16:33 ACPI: HEST 0x000000006C677380 0000A8 (v01 ALASKA A M I 00000001 INTL 00000001) 20:16:33 ACPI: BERT 0x000000006C677428 000030 (v01 ALASKA A M I 00000001 INTL 00000001) 20:16:33 ACPI: ERST 0x000000006C677458 000230 (v01 ALASKA A M I 00000001 INTL 00000001) 20:16:33 ACPI: EINJ 0x000000006C677688 000150 (v01 ALASKA A M I 00000001 INTL 00000001) 20:16:33 ACPI: WSMT 0x000000006C6777D8 000028 (v01 ALASKA A M I 01072009 AMI 00010013) 20:16:33 ACPI: Reserving FACP table memory at [mem 0x6c62e590-0x6c62e6a3] 20:16:33 ACPI: Reserving DSDT table memory at [mem 0x6c5ec260-0x6c62e58c] 20:16:33 ACPI: Reserving FACS table memory at [mem 0x6c9b7080-0x6c9b70bf] 20:16:33 ACPI: Reserving FPDT table memory at [mem 0x6c62e6a8-0x6c62e6eb] 20:16:33 ACPI: Reserving FIDT table memory at [mem 0x6c62e6f0-0x6c62e78b] 20:16:33 ACPI: Reserving SPMI table memory at [mem 0x6c62e790-0x6c62e7d0] 20:16:33 ACPI: Reserving UEFI table memory at [mem 0x6c62e7d8-0x6c62e81f] 20:16:33 ACPI: Reserving MCFG table memory at [mem 0x6c62e820-0x6c62e85b] 20:16:33 ACPI: Reserving HPET table memory at [mem 0x6c62e860-0x6c62e897] 20:16:33 ACPI: Reserving APIC table memory at [mem 0x6c62e898-0x6c62efb5] 20:16:33 ACPI: Reserving MIGT table memory at [mem 0x6c62efb8-0x6c62eff7] 20:16:33 ACPI: Reserving PCAT table memory at [mem 0x6c62eff8-0x6c62f05f] 20:16:33 ACPI: Reserving PCCT table memory at [mem 0x6c62f060-0x6c62f0cd] 20:16:33 ACPI: Reserving RASF table memory at [mem 0x6c62f0d0-0x6c62f0ff] 20:16:33 ACPI: Reserving SVOS table memory at [mem 0x6c62f100-0x6c62f131] 
20:16:33 ACPI: Reserving WDDT table memory at [mem 0x6c62f138-0x6c62f177] 20:16:33 ACPI: Reserving OEM4 table memory at [mem 0x6c62f178-0x6c657b83] 20:16:33 ACPI: Reserving OEM1 table memory at [mem 0x6c657b88-0x6c662453] 20:16:33 ACPI: Reserving OEM2 table memory at [mem 0x6c662458-0x6c66898b] 20:16:33 ACPI: Reserving SSDT table memory at [mem 0x6c668990-0x6c675847] 20:16:33 ACPI: Reserving SSDT table memory at [mem 0x6c675848-0x6c675ea2] 20:16:33 ACPI: Reserving SPCR table memory at [mem 0x6c675ea8-0x6c675ef7] 20:16:33 ACPI: Reserving TPM2 table memory at [mem 0x6c675ef8-0x6c675f2b] 20:16:33 ACPI: Reserving SSDT table memory at [mem 0x6c675f30-0x6c677297] 20:16:33 ACPI: Reserving DMAR table memory at [mem 0x6c677298-0x6c67737f] 20:16:33 ACPI: Reserving HEST table memory at [mem 0x6c677380-0x6c677427] 20:16:33 ACPI: Reserving BERT table memory at [mem 0x6c677428-0x6c677457] 20:16:33 ACPI: Reserving ERST table memory at [mem 0x6c677458-0x6c677687] 20:16:33 ACPI: Reserving EINJ table memory at [mem 0x6c677688-0x6c6777d7] 20:16:33 ACPI: Reserving WSMT table memory at [mem 0x6c6777d8-0x6c6777ff] 20:16:33 No NUMA configuration found 20:16:33 Faking a node at [mem 0x0000000000000000-0x000000107fffffff] 20:16:33 NODE_DATA(0) allocated [mem 0x107fffcfc0-0x107fffffff] 20:16:33 hugetlb_cma: reserve 6144 MiB, up to 6144 MiB per node 20:16:33 cma: Reserved 6144 MiB in 1 range 20:16:33 hugetlb_cma: reserved 6144 MiB on node 0 20:16:33 crashkernel reserved: 0x0000000052000000 - 0x000000005e000000 (192 MB) 20:16:33 Zone ranges: 20:16:33 DMA [mem 0x0000000000001000-0x0000000000ffffff] 20:16:33 DMA32 [mem 0x0000000001000000-0x00000000ffffffff] 20:16:33 Normal [mem 0x0000000100000000-0x000000107fffffff] 20:16:33 Device empty 20:16:33 Movable zone start for each node 20:16:33 Early memory node ranges 20:16:33 node 0: [mem 0x0000000000001000-0x000000000009ffff] 20:16:33 node 0: [mem 0x0000000000100000-0x0000000069ca3fff] 20:16:33 node 0: [mem 0x000000006ebee000-0x000000006fffffff] 
20:16:33 node 0: [mem 0x0000000100000000-0x000000107fff7fff] 20:16:33 node 0: [mem 0x000000107fff9000-0x000000107fffffff] 20:16:33 Initmem setup node 0 [mem 0x0000000000001000-0x000000107fffffff] 20:16:33 On node 0, zone DMA: 1 pages in unavailable ranges 20:16:33 On node 0, zone DMA: 96 pages in unavailable ranges 20:16:33 On node 0, zone DMA32: 20298 pages in unavailable ranges 20:16:33 On node 0, zone Normal: 1 pages in unavailable ranges 20:16:33 ACPI: PM-Timer IO Port: 0x508 20:16:33 ACPI: X2APIC_NMI (uid[0xffffffff] high level lint[0x1]) 20:16:33 ACPI: LAPIC_NMI (acpi_id[0xff] high level lint[0x1]) 20:16:33 IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23 20:16:33 IOAPIC[1]: apic_id 9, version 32, address 0xfec01000, GSI 24-31 20:16:33 IOAPIC[2]: apic_id 10, version 32, address 0xfec08000, GSI 32-39 20:16:33 IOAPIC[3]: apic_id 11, version 32, address 0xfec10000, GSI 40-47 20:16:33 IOAPIC[4]: apic_id 12, version 32, address 0xfec18000, GSI 48-55 20:16:33 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) 20:16:33 ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) 20:16:33 ACPI: Using ACPI (MADT) for SMP configuration information 20:16:33 ACPI: HPET id: 0x8086a701 base: 0xfed00000 20:16:33 TSC deadline timer available 20:16:33 CPU topo: Max. logical packages: 1 20:16:33 CPU topo: Max. logical dies: 1 20:16:33 CPU topo: Max. dies per package: 1 20:16:33 CPU topo: Max. threads per core: 2 20:16:33 CPU topo: Num. cores per package: 18 20:16:33 CPU topo: Num. 
threads per package: 36 20:16:33 CPU topo: Allowing 36 present CPUs plus 0 hotplug CPUs 20:16:33 [mem 0x80000000-0xfed1ffff] available for PCI devices 20:16:33 Booting paravirtualized kernel on bare hardware 20:16:33 clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns 20:16:33 Load bootconfig: 46 bytes 5 nodes 20:16:33 setup_percpu: NR_CPUS:512 nr_cpumask_bits:36 nr_cpu_ids:36 nr_node_ids:1 20:16:33 percpu: Embedded 78 pages/cpu s282624 r8192 d28672 u524288 20:16:33 Unknown kernel command line parameters "biosdevname=0", will be passed to user space. 20:16:33 printk: log buffer data + meta data: 2097152 + 7340032 = 9437184 bytes 20:16:33 Dentry cache hash table entries: 8388608 (order: 14, 67108864 bytes, linear) 20:16:33 Inode-cache hash table entries: 4194304 (order: 13, 33554432 bytes, linear) 20:16:33 software IO TLB: area num 64. 20:16:33 software IO TLB: SWIOTLB bounce buffer size roundup to 16MB 20:16:33 Fallback order for Node 0: 0 20:16:33 Built 1 zonelists, mobility grouping on. Total pages: 16691284 20:16:33 Policy zone: Normal 20:16:33 mem auto-init: stack:off, heap alloc:off, heap free:off 20:16:33 SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=36, Nodes=1 From bhe at redhat.com Tue Nov 25 17:10:07 2025 From: bhe at redhat.com (Baoquan He) Date: Wed, 26 Nov 2025 09:10:07 +0800 Subject: [PATCHv2 1/2] kernel/kexec: Change the prototype of kimage_map_segment() In-Reply-To: <20251106065904.10772-1-piliu@redhat.com> References: <20251106065904.10772-1-piliu@redhat.com> Message-ID: Hi Pingfan, On 11/06/25 at 02:59pm, Pingfan Liu wrote: > The kexec segment index will be required to extract the corresponding > information for that segment in kimage_map_segment(). Additionally, > kexec_segment already holds the kexec relocation destination address and > size. Therefore, the prototype of kimage_map_segment() can be changed. Because no cover letter, I just reply here. 
I am testing the code of (tag: next-20251125, next/master) on an arm64 system.
I saw that your two patches are already in there. When I used kexec reboot
as below, I still got the warning message during the ima_kexec_post_load()
invocation.

====================
kexec -d -l /boot/vmlinuz-6.18.0-rc7-next-20251125 --initrd /boot/initramfs-6.18.0-rc7-next-20251125.img --reuse-cmdline
====================

====================
[34283.657670] kexec_file: kernel: 000000006cf71829 kernel_size: 0x48b0000
[34283.657700] PEFILE: Unsigned PE binary
[34283.676597] ima: kexec measurement buffer for the loaded kernel at 0xff206000.
[34283.676621] kexec_file: Loaded initrd at 0x84cb0000 bufsz=0x25ec426 memsz=0x25ed000
[34283.684646] kexec_file: Loaded dtb at 0xff400000 bufsz=0x39e memsz=0x1000
[34283.684653] kexec_file(Image): Loaded kernel at 0x80400000 bufsz=0x48b0000 memsz=0x48b0000
[34283.684663] kexec_file: nr_segments = 4
[34283.684666] kexec_file: segment[0]: buf=0x0000000000000000 bufsz=0x0 mem=0xff206000 memsz=0x1000
[34283.684674] kexec_file: segment[1]: buf=0x000000006cf71829 bufsz=0x48b0000 mem=0x80400000 memsz=0x48b0000
[34283.725987] kexec_file: segment[2]: buf=0x00000000c7369de6 bufsz=0x25ec426 mem=0x84cb0000 memsz=0x25ed000
[34283.747670] kexec_file: segmen
** replaying previous printk message **
[34283.747670] kexec_file: segment[3]: buf=0x00000000d83b530b bufsz=0x39e mem=0xff400000 memsz=0x1000
[34283.747973] ------------[ cut here ]------------
[34283.747976] WARNING: CPU: 33 PID: 16112 at kernel/kexec_core.c:1002 kimage_map_segment+0x138/0x190
[34283.778574] Modules linked in: rfkill vfat fat ipmi_ssif igb acpi_ipmi ipmi_si ipmi_devintf mlx5_fwctl i2c_algo_bit ipmi_msghandler fwctl fuse loop nfnetlink zram lz4hc_compress lz4_compress xfs mlx5_ib macsec mlx5_core nvme nvme_core mlxfw psample tls nvme_keyring nvme_auth pci_hyperv_intf sbsa_gwdt rpcrdma sunrpc rdma_ucm ib_uverbs ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser i2c_dev ib_umad rdma_cm ib_ipoib iw_cm ib_cm
libiscsi ib_core scsi_transport_iscsi aes_neon_bs [34283.824233] CPU: 33 UID: 0 PID: 16112 Comm: kexec Tainted: G W 6.17.8-200.fc42.aarch64 #1 PREEMPT(voluntary) [34283.836355] Tainted: [W]=WARN [34283.839684] Hardware name: CRAY CS500/CMUD , BIOS 1.4.0 Jun 17 2020 [34283.846903] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [34283.854243] pc : kimage_map_segment+0x138/0x190 [34283.859120] lr : kimage_map_segment+0x4c/0x190 [34283.863920] sp : ffff8000a0643a90 [34283.867394] x29: ffff8000a0643a90 x28: ffff800083d0a000 x27: 0000000000000000 [34283.874901] x26: 0000aaaad722d4b0 x25: 000000000000008f x24: ffff800083d0a000 [34283.882608] x23: 0000000000000001 x22: 00000000ff206000 x21: 00000000ff207000 [34283.890305] x20: ffff008fbd306980 x19: ffff008f895d6400 x18: 00000000fffffff9 [34283.897815] x17: 303d6d656d206539 x16: 3378303d7a736675 x15: 646565732d676e72 [34283.905516] x14: 00646565732d726c x13: 616d692c78756e69 x12: 6c00636578656b2d [34283.912999] x11: 007265666675622d x10: 636578656b2d616d x9 : ffff80008050b73c [34283.920691] x8 : 0001000000000000 x7 : 0000000000000000 x6 : 0000000080000000 [34283.928197] x5 : 0000000084cb0000 x4 : ffff008fbd2306b0 x3 : ffff008fbd305000 [34283.935898] x2 : fffffff7ff000000 x1 : 0000000000000004 x0 : ffff800082046000 [34283.943603] Call trace: [34283.946039] kimage_map_segment+0x138/0x190 (P) [34283.950935] ima_kexec_post_load+0x58/0xc0 [34283.955225] __do_sys_kexec_file_load+0x2b8/0x398 [34283.960279] __arm64_sys_kexec_file_load+0x28/0x40 [34283.965965] invoke_syscall.constprop.0+0x64/0xe8 [34283.971025] el0_svc_common.constprop.0+0x40/0xe8 [34283.975883] do_el0_svc+0x24/0x38 [34283.979361] el0_svc+0x3c/0x168 [34283.982833] el0t_64_sync_handler+0xa0/0xf0 [34283.987176] el0t_64_sync+0x1b0/0x1b8 [34283.991000] ---[ end trace 0000000000000000 ]--- [34283.996060] ------------[ cut here ]------------ [34283.996064] WARNING: CPU: 33 PID: 16112 at mm/vmalloc.c:538 vmap_pages_pte_range+0x2bc/0x3c0 [34284.010006] 
Modules linked in: rfkill vfat fat ipmi_ssif igb acpi_ipmi ipmi_si ipmi_devintf mlx5_fwctl i2c_algo_bit ipmi_msghandler fwctl fuse loop nfnetlink zram lz4hc_compress lz4_compress xfs mlx5_ib macsec mlx5_core nvme nvme_core mlxfw psample tls nvme_keyring nvme_auth pci_hyperv_intf sbsa_gwdt rpcrdma sunrpc rdma_ucm ib_uverbs ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser i2c_dev ib_umad rdma_cm ib_ipoib iw_cm ib_cm libiscsi ib_core scsi_transport_iscsi aes_neon_bs [34284.055630] CPU: 33 UID: 0 PID: 16112 Comm: kexec Tainted: G W 6.17.8-200.fc42.aarch64 #1 PREEMPT(voluntary) [34284.067701] Tainted: [W]=WARN [34284.070833] Hardware name: CRAY CS500/CMUD , BIOS 1.4.0 Jun 17 2020 [34284.078238] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [34284.085546] pc : vmap_pages_pte_range+0x2bc/0x3c0 [34284.090607] lr : vmap_small_pages_range_noflush+0x16c/0x298 [34284.096528] sp : ffff8000a0643940 [34284.100001] x29: ffff8000a0643940 x28: 0000000000000000 x27: ffff800084f76000 [34284.107699] x26: fffffdffc0000000 x25: ffff8000a06439d0 x24: ffff800082046000 [34284.115174] x23: ffff800084f75000 x22: ffff007f80337ba8 x21: 03ffffffffffffc0 [34284.122821] x20: ffff008fbd306980 x19: ffff8000a06439d4 x18: 00000000fffffff9 [34284.130331] x17: 303d6d656d206539 x16: 3378303d7a736675 x15: 646565732d676e72 [34284.138032] x14: 0000000000004000 x13: ffff009781307130 x12: 0000000000002000 [34284.145733] x11: 0000000000000000 x10: 0000000000000001 x9 : ffff8000804e197c [34284.153248] x8 : 0000000000000027 x7 : ffff800085175000 x6 : ffff8000a06439d4 [34284.160944] x5 : ffff8000a06439d0 x4 : ffff008fbd306980 x3 : 0068000000000f03 [34284.168449] x2 : ffff007f80337ba8 x1 : 0000000000000000 x0 : 0000000000000000 [34284.176150] Call trace: [34284.178768] vmap_pages_pte_range+0x2bc/0x3c0 (P) [34284.183665] vmap_small_pages_range_noflush+0x16c/0x298 [34284.189264] vmap+0xb4/0x138 [34284.192312] kimage_map_segment+0xdc/0x190 [34284.196794] ima_kexec_post_load+0x58/0xc0 
[34284.201044] __do_sys_kexec_file_load+0x2b8/0x398 [34284.206107] __arm64_sys_kexec_file_load+0x28/0x40 [34284.211254] invoke_syscall.constprop.0+0x64/0xe8 [34284.216139] el0_svc_common.constprop.0+0x40/0xe8 [34284.221196] do_el0_svc+0x24/0x38 [34284.224678] el0_svc+0x3c/0x168 [34284.227983] el0t_64_sync_handler+0xa0/0xf0 [34284.232526] el0t_64_sync+0x1b0/0x1b8 [34284.236376] ---[ end trace 0000000000000000 ]--- [34284.241412] kexec_core: Could not map ima buffer. [34284.241421] ima: Could not map measurements buffer. [34284.551336] machine_kexec_post_load:155: [34284.551354] kexec kimage info: [34284.551366] type: 0 [34284.551373] head: 90363f9002 [34284.551377] kern_reloc: 0x00000090363f7000 [34284.551381] el2_vectors: 0x0000000000000000 [34284.551384] kexec_file: kexec_file_load: type:0, start:0x80400000 head:0x90363f9002 flags:0x8 ==================== > > Fixes: 07d24902977e ("kexec: enable CMA based contiguous allocation") > Signed-off-by: Pingfan Liu > Cc: Andrew Morton > Cc: Baoquan He > Cc: Mimi Zohar > Cc: Roberto Sassu > Cc: Alexander Graf > Cc: Steven Chen > Cc: > To: kexec at lists.infradead.org > To: linux-integrity at vger.kernel.org > --- > include/linux/kexec.h | 4 ++-- > kernel/kexec_core.c | 9 ++++++--- > security/integrity/ima/ima_kexec.c | 4 +--- > 3 files changed, 9 insertions(+), 8 deletions(-) > > diff --git a/include/linux/kexec.h b/include/linux/kexec.h > index ff7e231b0485..8a22bc9b8c6c 100644 > --- a/include/linux/kexec.h > +++ b/include/linux/kexec.h > @@ -530,7 +530,7 @@ extern bool kexec_file_dbg_print; > #define kexec_dprintk(fmt, arg...) 
\ > do { if (kexec_file_dbg_print) pr_info(fmt, ##arg); } while (0) > > -extern void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size); > +extern void *kimage_map_segment(struct kimage *image, int idx); > extern void kimage_unmap_segment(void *buffer); > #else /* !CONFIG_KEXEC_CORE */ > struct pt_regs; > @@ -540,7 +540,7 @@ static inline void __crash_kexec(struct pt_regs *regs) { } > static inline void crash_kexec(struct pt_regs *regs) { } > static inline int kexec_should_crash(struct task_struct *p) { return 0; } > static inline int kexec_crash_loaded(void) { return 0; } > -static inline void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size) > +static inline void *kimage_map_segment(struct kimage *image, int idx) > { return NULL; } > static inline void kimage_unmap_segment(void *buffer) { } > #define kexec_in_progress false > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > index fa00b239c5d9..9a1966207041 100644 > --- a/kernel/kexec_core.c > +++ b/kernel/kexec_core.c > @@ -960,17 +960,20 @@ int kimage_load_segment(struct kimage *image, int idx) > return result; > } > > -void *kimage_map_segment(struct kimage *image, > - unsigned long addr, unsigned long size) > +void *kimage_map_segment(struct kimage *image, int idx) > { > + unsigned long addr, size, eaddr; > unsigned long src_page_addr, dest_page_addr = 0; > - unsigned long eaddr = addr + size; > kimage_entry_t *ptr, entry; > struct page **src_pages; > unsigned int npages; > void *vaddr = NULL; > int i; > > + addr = image->segment[idx].mem; > + size = image->segment[idx].memsz; > + eaddr = addr + size; > + > /* > * Collect the source pages and map them in a contiguous VA range. 
> > */
> diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c
> index 7362f68f2d8b..5beb69edd12f 100644
> --- a/security/integrity/ima/ima_kexec.c
> +++ b/security/integrity/ima/ima_kexec.c
> @@ -250,9 +250,7 @@ void ima_kexec_post_load(struct kimage *image)
> if (!image->ima_buffer_addr)
> return;
>
> - ima_kexec_buffer = kimage_map_segment(image,
> - image->ima_buffer_addr,
> - image->ima_buffer_size);
> + ima_kexec_buffer = kimage_map_segment(image, image->ima_segment_index);
> if (!ima_kexec_buffer) {
> pr_err("Could not map measurements buffer.\n");
> return;
> --
> 2.49.0
>

From bhe at redhat.com Tue Nov 25 17:53:52 2025
From: bhe at redhat.com (Baoquan He)
Date: Wed, 26 Nov 2025 09:53:52 +0800
Subject: [PATCHv2 1/2] kernel/kexec: Change the prototype of kimage_map_segment()
In-Reply-To:
References: <20251106065904.10772-1-piliu@redhat.com>
Message-ID:

Hi,

On 11/26/25 at 09:10am, Baoquan He wrote:
> Hi Pingfan,
>
> On 11/06/25 at 02:59pm, Pingfan Liu wrote:
> > The kexec segment index will be required to extract the corresponding
> > information for that segment in kimage_map_segment(). Additionally,
> > kexec_segment already holds the kexec relocation destination address and
> > size. Therefore, the prototype of kimage_map_segment() can be changed.
>
> Because no cover letter, I just reply here.
>
> I am testing code of (tag: next-20251125, next/master) on arm64 system.
> I saw your two patches are already in there. When I used kexec reboot
> as below, I still got the warning message during ima_kexec_post_load()
> invocation.

And when I tried to turn off CMA allocation for the kexec buffer, I found
there is no such flag in the user-space kexec-tools utility. Alexander
introduced commit 07d24902977e ("kexec: enable CMA based contiguous
allocation") but has not added the KEXEC_FILE_NO_CMA flag to kexec-tools.
Pingfan, since you are working on fixing the bug, can either of you post a
patch to kexec-tools to add the flag?
And the KEXEC_FILE_FORCE_DTB flag too, which was introduced in commit
f367474b5884 ("x86/kexec: carry forward the boot DTB on kexec"). We only
have these flags in the kernel, but there is no way to specify them from
user space; what is the point of having them?

Thanks
Baoquan

>
> ====================
> kexec -d -l /boot/vmlinuz-6.18.0-rc7-next-20251125 --initrd /boot/initramfs-6.18.0-rc7-next-20251125.img --reuse-cmdline
> ====================
>
> ====================
> [34283.657670] kexec_file: kernel: 000000006cf71829 kernel_size: 0x48b0000
> [34283.657700] PEFILE: Unsigned PE binary
> [34283.676597] ima: kexec measurement buffer for the loaded kernel at 0xff206000.
> [34283.676621] kexec_file: Loaded initrd at 0x84cb0000 bufsz=0x25ec426 memsz=0x25ed000
> [34283.684646] kexec_file: Loaded dtb at 0xff400000 bufsz=0x39e memsz=0x1000
> [34283.684653] kexec_file(Image): Loaded kernel at 0x80400000 bufsz=0x48b0000 memsz=0x48b0000
> [34283.684663] kexec_file: nr_segments = 4
> [34283.684666] kexec_file: segment[0]: buf=0x0000000000000000 bufsz=0x0 mem=0xff206000 memsz=0x1000
> [34283.684674] kexec_file: segment[1]: buf=0x000000006cf71829 bufsz=0x48b0000 mem=0x80400000 memsz=0x48b0000
> [34283.725987] kexec_file: segment[2]: buf=0x00000000c7369de6 bufsz=0x25ec426 mem=0x84cb0000 memsz=0x25ed000
> [34283.747670] kexec_file: segmen
> ** replaying previous printk message **
> [34283.747670] kexec_file: segment[3]: buf=0x00000000d83b530b bufsz=0x39e mem=0xff400000 memsz=0x1000
> [34283.747973] ------------[ cut here ]------------
> [34283.747976] WARNING: CPU: 33 PID: 16112 at kernel/kexec_core.c:1002 kimage_map_segment+0x138/0x190
> [34283.778574] Modules linked in: rfkill vfat fat ipmi_ssif igb acpi_ipmi ipmi_si ipmi_devintf mlx5_fwctl i2c_algo_bit ipmi_msghandler fwctl fuse loop nfnetlink zram lz4hc_compress lz4_compress xfs mlx5_ib macsec mlx5_core nvme nvme_core mlxfw psample tls nvme_keyring nvme_auth pci_hyperv_intf sbsa_gwdt rpcrdma sunrpc rdma_ucm ib_uverbs ib_srpt ib_isert iscsi_target_mod
target_core_mod ib_iser i2c_dev ib_umad rdma_cm ib_ipoib iw_cm ib_cm libiscsi ib_core scsi_transport_iscsi aes_neon_bs > [34283.824233] CPU: 33 UID: 0 PID: 16112 Comm: kexec Tainted: G W 6.17.8-200.fc42.aarch64 #1 PREEMPT(voluntary) > [34283.836355] Tainted: [W]=WARN > [34283.839684] Hardware name: CRAY CS500/CMUD , BIOS 1.4.0 Jun 17 2020 > [34283.846903] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > [34283.854243] pc : kimage_map_segment+0x138/0x190 > [34283.859120] lr : kimage_map_segment+0x4c/0x190 > [34283.863920] sp : ffff8000a0643a90 > [34283.867394] x29: ffff8000a0643a90 x28: ffff800083d0a000 x27: 0000000000000000 > [34283.874901] x26: 0000aaaad722d4b0 x25: 000000000000008f x24: ffff800083d0a000 > [34283.882608] x23: 0000000000000001 x22: 00000000ff206000 x21: 00000000ff207000 > [34283.890305] x20: ffff008fbd306980 x19: ffff008f895d6400 x18: 00000000fffffff9 > [34283.897815] x17: 303d6d656d206539 x16: 3378303d7a736675 x15: 646565732d676e72 > [34283.905516] x14: 00646565732d726c x13: 616d692c78756e69 x12: 6c00636578656b2d > [34283.912999] x11: 007265666675622d x10: 636578656b2d616d x9 : ffff80008050b73c > [34283.920691] x8 : 0001000000000000 x7 : 0000000000000000 x6 : 0000000080000000 > [34283.928197] x5 : 0000000084cb0000 x4 : ffff008fbd2306b0 x3 : ffff008fbd305000 > [34283.935898] x2 : fffffff7ff000000 x1 : 0000000000000004 x0 : ffff800082046000 > [34283.943603] Call trace: > [34283.946039] kimage_map_segment+0x138/0x190 (P) > [34283.950935] ima_kexec_post_load+0x58/0xc0 > [34283.955225] __do_sys_kexec_file_load+0x2b8/0x398 > [34283.960279] __arm64_sys_kexec_file_load+0x28/0x40 > [34283.965965] invoke_syscall.constprop.0+0x64/0xe8 > [34283.971025] el0_svc_common.constprop.0+0x40/0xe8 > [34283.975883] do_el0_svc+0x24/0x38 > [34283.979361] el0_svc+0x3c/0x168 > [34283.982833] el0t_64_sync_handler+0xa0/0xf0 > [34283.987176] el0t_64_sync+0x1b0/0x1b8 > [34283.991000] ---[ end trace 0000000000000000 ]--- > [34283.996060] ------------[ cut here 
]------------ > [34283.996064] WARNING: CPU: 33 PID: 16112 at mm/vmalloc.c:538 vmap_pages_pte_range+0x2bc/0x3c0 > [34284.010006] Modules linked in: rfkill vfat fat ipmi_ssif igb acpi_ipmi ipmi_si ipmi_devintf mlx5_fwctl i2c_algo_bit ipmi_msghandler fwctl fuse loop nfnetlink zram lz4hc_compress lz4_compress xfs mlx5_ib macsec mlx5_core nvme nvme_core mlxfw psample tls nvme_keyring nvme_auth pci_hyperv_intf sbsa_gwdt rpcrdma sunrpc rdma_ucm ib_uverbs ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser i2c_dev ib_umad rdma_cm ib_ipoib iw_cm ib_cm libiscsi ib_core scsi_transport_iscsi aes_neon_bs > [34284.055630] CPU: 33 UID: 0 PID: 16112 Comm: kexec Tainted: G W 6.17.8-200.fc42.aarch64 #1 PREEMPT(voluntary) > [34284.067701] Tainted: [W]=WARN > [34284.070833] Hardware name: CRAY CS500/CMUD , BIOS 1.4.0 Jun 17 2020 > [34284.078238] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > [34284.085546] pc : vmap_pages_pte_range+0x2bc/0x3c0 > [34284.090607] lr : vmap_small_pages_range_noflush+0x16c/0x298 > [34284.096528] sp : ffff8000a0643940 > [34284.100001] x29: ffff8000a0643940 x28: 0000000000000000 x27: ffff800084f76000 > [34284.107699] x26: fffffdffc0000000 x25: ffff8000a06439d0 x24: ffff800082046000 > [34284.115174] x23: ffff800084f75000 x22: ffff007f80337ba8 x21: 03ffffffffffffc0 > [34284.122821] x20: ffff008fbd306980 x19: ffff8000a06439d4 x18: 00000000fffffff9 > [34284.130331] x17: 303d6d656d206539 x16: 3378303d7a736675 x15: 646565732d676e72 > [34284.138032] x14: 0000000000004000 x13: ffff009781307130 x12: 0000000000002000 > [34284.145733] x11: 0000000000000000 x10: 0000000000000001 x9 : ffff8000804e197c > [34284.153248] x8 : 0000000000000027 x7 : ffff800085175000 x6 : ffff8000a06439d4 > [34284.160944] x5 : ffff8000a06439d0 x4 : ffff008fbd306980 x3 : 0068000000000f03 > [34284.168449] x2 : ffff007f80337ba8 x1 : 0000000000000000 x0 : 0000000000000000 > [34284.176150] Call trace: > [34284.178768] vmap_pages_pte_range+0x2bc/0x3c0 (P) > [34284.183665] 
vmap_small_pages_range_noflush+0x16c/0x298 > [34284.189264] vmap+0xb4/0x138 > [34284.192312] kimage_map_segment+0xdc/0x190 > [34284.196794] ima_kexec_post_load+0x58/0xc0 > [34284.201044] __do_sys_kexec_file_load+0x2b8/0x398 > [34284.206107] __arm64_sys_kexec_file_load+0x28/0x40 > [34284.211254] invoke_syscall.constprop.0+0x64/0xe8 > [34284.216139] el0_svc_common.constprop.0+0x40/0xe8 > [34284.221196] do_el0_svc+0x24/0x38 > [34284.224678] el0_svc+0x3c/0x168 > [34284.227983] el0t_64_sync_handler+0xa0/0xf0 > [34284.232526] el0t_64_sync+0x1b0/0x1b8 > [34284.236376] ---[ end trace 0000000000000000 ]--- > [34284.241412] kexec_core: Could not map ima buffer. > [34284.241421] ima: Could not map measurements buffer. > [34284.551336] machine_kexec_post_load:155: > [34284.551354] kexec kimage info: > [34284.551366] type: 0 > [34284.551373] head: 90363f9002 > [34284.551377] kern_reloc: 0x00000090363f7000 > [34284.551381] el2_vectors: 0x0000000000000000 > [34284.551384] kexec_file: kexec_file_load: type:0, start:0x80400000 head:0x90363f9002 flags:0x8 > ==================== > > > > > Fixes: 07d24902977e ("kexec: enable CMA based contiguous allocation") > > Signed-off-by: Pingfan Liu > > Cc: Andrew Morton > > Cc: Baoquan He > > Cc: Mimi Zohar > > Cc: Roberto Sassu > > Cc: Alexander Graf > > Cc: Steven Chen > > Cc: > > To: kexec at lists.infradead.org > > To: linux-integrity at vger.kernel.org > > --- > > include/linux/kexec.h | 4 ++-- > > kernel/kexec_core.c | 9 ++++++--- > > security/integrity/ima/ima_kexec.c | 4 +--- > > 3 files changed, 9 insertions(+), 8 deletions(-) > > > > diff --git a/include/linux/kexec.h b/include/linux/kexec.h > > index ff7e231b0485..8a22bc9b8c6c 100644 > > --- a/include/linux/kexec.h > > +++ b/include/linux/kexec.h > > @@ -530,7 +530,7 @@ extern bool kexec_file_dbg_print; > > #define kexec_dprintk(fmt, arg...) 
\ > > do { if (kexec_file_dbg_print) pr_info(fmt, ##arg); } while (0) > > > > -extern void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size); > > +extern void *kimage_map_segment(struct kimage *image, int idx); > > extern void kimage_unmap_segment(void *buffer); > > #else /* !CONFIG_KEXEC_CORE */ > > struct pt_regs; > > @@ -540,7 +540,7 @@ static inline void __crash_kexec(struct pt_regs *regs) { } > > static inline void crash_kexec(struct pt_regs *regs) { } > > static inline int kexec_should_crash(struct task_struct *p) { return 0; } > > static inline int kexec_crash_loaded(void) { return 0; } > > -static inline void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size) > > +static inline void *kimage_map_segment(struct kimage *image, int idx) > > { return NULL; } > > static inline void kimage_unmap_segment(void *buffer) { } > > #define kexec_in_progress false > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > > index fa00b239c5d9..9a1966207041 100644 > > --- a/kernel/kexec_core.c > > +++ b/kernel/kexec_core.c > > @@ -960,17 +960,20 @@ int kimage_load_segment(struct kimage *image, int idx) > > return result; > > } > > > > -void *kimage_map_segment(struct kimage *image, > > - unsigned long addr, unsigned long size) > > +void *kimage_map_segment(struct kimage *image, int idx) > > { > > + unsigned long addr, size, eaddr; > > unsigned long src_page_addr, dest_page_addr = 0; > > - unsigned long eaddr = addr + size; > > kimage_entry_t *ptr, entry; > > struct page **src_pages; > > unsigned int npages; > > void *vaddr = NULL; > > int i; > > > > + addr = image->segment[idx].mem; > > + size = image->segment[idx].memsz; > > + eaddr = addr + size; > > + > > /* > > * Collect the source pages and map them in a contiguous VA range. 
> > */
> > diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c
> > index 7362f68f2d8b..5beb69edd12f 100644
> > --- a/security/integrity/ima/ima_kexec.c
> > +++ b/security/integrity/ima/ima_kexec.c
> > @@ -250,9 +250,7 @@ void ima_kexec_post_load(struct kimage *image)
> > if (!image->ima_buffer_addr)
> > return;
> >
> > - ima_kexec_buffer = kimage_map_segment(image,
> > - image->ima_buffer_addr,
> > - image->ima_buffer_size);
> > + ima_kexec_buffer = kimage_map_segment(image, image->ima_segment_index);
> > if (!ima_kexec_buffer) {
> > pr_err("Could not map measurements buffer.\n");
> > return;
> > --
> > 2.49.0
> >

From piliu at redhat.com Tue Nov 25 18:30:05 2025
From: piliu at redhat.com (Pingfan Liu)
Date: Wed, 26 Nov 2025 10:30:05 +0800
Subject: [PATCHv2 1/2] kernel/kexec: Change the prototype of kimage_map_segment()
In-Reply-To:
References: <20251106065904.10772-1-piliu@redhat.com>
Message-ID:

On Wed, Nov 26, 2025 at 9:54 AM Baoquan He wrote:
>
> Hi,
>
> On 11/26/25 at 09:10am, Baoquan He wrote:
> > Hi Pingfan,
> >
> > On 11/06/25 at 02:59pm, Pingfan Liu wrote:
> > > The kexec segment index will be required to extract the corresponding
> > > information for that segment in kimage_map_segment(). Additionally,
> > > kexec_segment already holds the kexec relocation destination address and
> > > size. Therefore, the prototype of kimage_map_segment() can be changed.
> >
> > Because no cover letter, I just reply here.
> >
> > I am testing code of (tag: next-20251125, next/master) on arm64 system.
> > I saw your two patches are already in there. When I used kexec reboot
> > as below, I still got the warning message during ima_kexec_post_load()
> > invocation.
>

I ran into this warning on the platform "NVIDIA Jetson Orin Nano". I just
got control of this machine and have an opportunity to decode its dtb.
I think the following section is critical to reproduce this issue:

reserved-memory {
	#address-cells = <0x02>;
	#size-cells = <0x02>;
	ranges;

	linux,cma {
		linux,cma-default;
		alignment = <0x00 0x10000>;
		compatible = "shared-dma-pool";
		size = <0x00 0x10000000>;
		status = "okay";
		reusable;
	};

That is weird. I ran a test with (tag: next-20251125, next/master) and
can't see the warning any longer. Once you finish with the machine, I'll
run some tests to check whether the warning comes from the same root cause
on your machine.

> And when I try to turn off cma allocating for kexec buffer, I found
> there's no such flag in user space utility kexec-tools. Since Alexander
> introduced commit 07d24902977e ("kexec: enable CMA based contiguous
> allocation"), but haven't add flag KEXEC_FILE_NO_CMA to kexec-tools, and
> Pingfan you are working to fix the bug, can any of you post patch to
> kexec-tools to add the flag?

OK.

> And flag KEXEC_FILE_FORCE_DTB too, which was introduced in commit f367474b5884
> ("x86/kexec: carry forward the boot DTB on kexec").

I have no idea about KEXEC_FILE_FORCE_DTB for the time being. But I will
see how to handle it properly.

Thanks,

Pingfan

> We only have them in kernel, but there's no chance to specify them,
> what's the meaning to have them?
>
> Thanks
> Baoquan
>
> > ====================
> > kexec -d -l /boot/vmlinuz-6.18.0-rc7-next-20251125 --initrd /boot/initramfs-6.18.0-rc7-next-20251125.img --reuse-cmdline
> > ====================
> >
> > ====================
> > [34283.657670] kexec_file: kernel: 000000006cf71829 kernel_size: 0x48b0000
> > [34283.657700] PEFILE: Unsigned PE binary
> > [34283.676597] ima: kexec measurement buffer for the loaded kernel at 0xff206000.
> > [34283.676621] kexec_file: Loaded initrd at 0x84cb0000 bufsz=0x25ec426 memsz=0x25ed000 > > [34283.684646] kexec_file: Loaded dtb at 0xff400000 bufsz=0x39e memsz=0x1000 > > [34283.684653] kexec_file(Image): Loaded kernel at 0x80400000 bufsz=0x48b0000 memsz=0x48b0000 > > [34283.684663] kexec_file: nr_segments = 4 > > [34283.684666] kexec_file: segment[0]: buf=0x0000000000000000 bufsz=0x0 mem=0xff206000 memsz=0x1000 > > [34283.684674] kexec_file: segment[1]: buf=0x000000006cf71829 bufsz=0x48b0000 mem=0x80400000 memsz=0x48b0000 > > [34283.725987] kexec_file: segment[2]: buf=0x00000000c7369de6 bufsz=0x25ec426 mem=0x84cb0000 memsz=0x25ed000 > > [34283.747670] kexec_file: segmen > > ** replaying previous printk message ** > > [34283.747670] kexec_file: segment[3]: buf=0x00000000d83b530b bufsz=0x39e mem=0xff400000 memsz=0x1000 > > [34283.747973] ------------[ cut here ]------------ > > [34283.747976] WARNING: CPU: 33 PID: 16112 at kernel/kexec_core.c:1002 kimage_map_segment+0x138/0x190 > > [34283.778574] Modules linked in: rfkill vfat fat ipmi_ssif igb acpi_ipmi ipmi_si ipmi_devintf mlx5_fwctl i2c_algo_bit ipmi_msghandler fwctl fuse loop nfnetlink zram lz4hc_compress lz4_compress xfs mlx5_ib macsec mlx5_core nvme nvme_core mlxfw psample tls nvme_keyring nvme_auth pci_hyperv_intf sbsa_gwdt rpcrdma sunrpc rdma_ucm ib_uverbs ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser i2c_dev ib_umad rdma_cm ib_ipoib iw_cm ib_cm libiscsi ib_core scsi_transport_iscsi aes_neon_bs > > [34283.824233] CPU: 33 UID: 0 PID: 16112 Comm: kexec Tainted: G W 6.17.8-200.fc42.aarch64 #1 PREEMPT(voluntary) > > [34283.836355] Tainted: [W]=WARN > > [34283.839684] Hardware name: CRAY CS500/CMUD , BIOS 1.4.0 Jun 17 2020 > > [34283.846903] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > > [34283.854243] pc : kimage_map_segment+0x138/0x190 > > [34283.859120] lr : kimage_map_segment+0x4c/0x190 > > [34283.863920] sp : ffff8000a0643a90 > > [34283.867394] x29: ffff8000a0643a90 x28: 
ffff800083d0a000 x27: 0000000000000000 > > [34283.874901] x26: 0000aaaad722d4b0 x25: 000000000000008f x24: ffff800083d0a000 > > [34283.882608] x23: 0000000000000001 x22: 00000000ff206000 x21: 00000000ff207000 > > [34283.890305] x20: ffff008fbd306980 x19: ffff008f895d6400 x18: 00000000fffffff9 > > [34283.897815] x17: 303d6d656d206539 x16: 3378303d7a736675 x15: 646565732d676e72 > > [34283.905516] x14: 00646565732d726c x13: 616d692c78756e69 x12: 6c00636578656b2d > > [34283.912999] x11: 007265666675622d x10: 636578656b2d616d x9 : ffff80008050b73c > > [34283.920691] x8 : 0001000000000000 x7 : 0000000000000000 x6 : 0000000080000000 > > [34283.928197] x5 : 0000000084cb0000 x4 : ffff008fbd2306b0 x3 : ffff008fbd305000 > > [34283.935898] x2 : fffffff7ff000000 x1 : 0000000000000004 x0 : ffff800082046000 > > [34283.943603] Call trace: > > [34283.946039] kimage_map_segment+0x138/0x190 (P) > > [34283.950935] ima_kexec_post_load+0x58/0xc0 > > [34283.955225] __do_sys_kexec_file_load+0x2b8/0x398 > > [34283.960279] __arm64_sys_kexec_file_load+0x28/0x40 > > [34283.965965] invoke_syscall.constprop.0+0x64/0xe8 > > [34283.971025] el0_svc_common.constprop.0+0x40/0xe8 > > [34283.975883] do_el0_svc+0x24/0x38 > > [34283.979361] el0_svc+0x3c/0x168 > > [34283.982833] el0t_64_sync_handler+0xa0/0xf0 > > [34283.987176] el0t_64_sync+0x1b0/0x1b8 > > [34283.991000] ---[ end trace 0000000000000000 ]--- > > [34283.996060] ------------[ cut here ]------------ > > [34283.996064] WARNING: CPU: 33 PID: 16112 at mm/vmalloc.c:538 vmap_pages_pte_range+0x2bc/0x3c0 > > [34284.010006] Modules linked in: rfkill vfat fat ipmi_ssif igb acpi_ipmi ipmi_si ipmi_devintf mlx5_fwctl i2c_algo_bit ipmi_msghandler fwctl fuse loop nfnetlink zram lz4hc_compress lz4_compress xfs mlx5_ib macsec mlx5_core nvme nvme_core mlxfw psample tls nvme_keyring nvme_auth pci_hyperv_intf sbsa_gwdt rpcrdma sunrpc rdma_ucm ib_uverbs ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser i2c_dev ib_umad rdma_cm ib_ipoib iw_cm ib_cm 
libiscsi ib_core scsi_transport_iscsi aes_neon_bs > > [34284.055630] CPU: 33 UID: 0 PID: 16112 Comm: kexec Tainted: G W 6.17.8-200.fc42.aarch64 #1 PREEMPT(voluntary) > > [34284.067701] Tainted: [W]=WARN > > [34284.070833] Hardware name: CRAY CS500/CMUD , BIOS 1.4.0 Jun 17 2020 > > [34284.078238] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > > [34284.085546] pc : vmap_pages_pte_range+0x2bc/0x3c0 > > [34284.090607] lr : vmap_small_pages_range_noflush+0x16c/0x298 > > [34284.096528] sp : ffff8000a0643940 > > [34284.100001] x29: ffff8000a0643940 x28: 0000000000000000 x27: ffff800084f76000 > > [34284.107699] x26: fffffdffc0000000 x25: ffff8000a06439d0 x24: ffff800082046000 > > [34284.115174] x23: ffff800084f75000 x22: ffff007f80337ba8 x21: 03ffffffffffffc0 > > [34284.122821] x20: ffff008fbd306980 x19: ffff8000a06439d4 x18: 00000000fffffff9 > > [34284.130331] x17: 303d6d656d206539 x16: 3378303d7a736675 x15: 646565732d676e72 > > [34284.138032] x14: 0000000000004000 x13: ffff009781307130 x12: 0000000000002000 > > [34284.145733] x11: 0000000000000000 x10: 0000000000000001 x9 : ffff8000804e197c > > [34284.153248] x8 : 0000000000000027 x7 : ffff800085175000 x6 : ffff8000a06439d4 > > [34284.160944] x5 : ffff8000a06439d0 x4 : ffff008fbd306980 x3 : 0068000000000f03 > > [34284.168449] x2 : ffff007f80337ba8 x1 : 0000000000000000 x0 : 0000000000000000 > > [34284.176150] Call trace: > > [34284.178768] vmap_pages_pte_range+0x2bc/0x3c0 (P) > > [34284.183665] vmap_small_pages_range_noflush+0x16c/0x298 > > [34284.189264] vmap+0xb4/0x138 > > [34284.192312] kimage_map_segment+0xdc/0x190 > > [34284.196794] ima_kexec_post_load+0x58/0xc0 > > [34284.201044] __do_sys_kexec_file_load+0x2b8/0x398 > > [34284.206107] __arm64_sys_kexec_file_load+0x28/0x40 > > [34284.211254] invoke_syscall.constprop.0+0x64/0xe8 > > [34284.216139] el0_svc_common.constprop.0+0x40/0xe8 > > [34284.221196] do_el0_svc+0x24/0x38 > > [34284.224678] el0_svc+0x3c/0x168 > > [34284.227983] 
el0t_64_sync_handler+0xa0/0xf0 > > [34284.232526] el0t_64_sync+0x1b0/0x1b8 > > [34284.236376] ---[ end trace 0000000000000000 ]--- > > [34284.241412] kexec_core: Could not map ima buffer. > > [34284.241421] ima: Could not map measurements buffer. > > [34284.551336] machine_kexec_post_load:155: > > [34284.551354] kexec kimage info: > > [34284.551366] type: 0 > > [34284.551373] head: 90363f9002 > > [34284.551377] kern_reloc: 0x00000090363f7000 > > [34284.551381] el2_vectors: 0x0000000000000000 > > [34284.551384] kexec_file: kexec_file_load: type:0, start:0x80400000 head:0x90363f9002 flags:0x8 > > ==================== > > > > > > > > Fixes: 07d24902977e ("kexec: enable CMA based contiguous allocation") > > > Signed-off-by: Pingfan Liu > > > Cc: Andrew Morton > > > Cc: Baoquan He > > > Cc: Mimi Zohar > > > Cc: Roberto Sassu > > > Cc: Alexander Graf > > > Cc: Steven Chen > > > Cc: > > > To: kexec at lists.infradead.org > > > To: linux-integrity at vger.kernel.org > > > --- > > > include/linux/kexec.h | 4 ++-- > > > kernel/kexec_core.c | 9 ++++++--- > > > security/integrity/ima/ima_kexec.c | 4 +--- > > > 3 files changed, 9 insertions(+), 8 deletions(-) > > > > > > diff --git a/include/linux/kexec.h b/include/linux/kexec.h > > > index ff7e231b0485..8a22bc9b8c6c 100644 > > > --- a/include/linux/kexec.h > > > +++ b/include/linux/kexec.h > > > @@ -530,7 +530,7 @@ extern bool kexec_file_dbg_print; > > > #define kexec_dprintk(fmt, arg...) 
\ > > > do { if (kexec_file_dbg_print) pr_info(fmt, ##arg); } while (0) > > > > > > -extern void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size); > > > +extern void *kimage_map_segment(struct kimage *image, int idx); > > > extern void kimage_unmap_segment(void *buffer); > > > #else /* !CONFIG_KEXEC_CORE */ > > > struct pt_regs; > > > @@ -540,7 +540,7 @@ static inline void __crash_kexec(struct pt_regs *regs) { } > > > static inline void crash_kexec(struct pt_regs *regs) { } > > > static inline int kexec_should_crash(struct task_struct *p) { return 0; } > > > static inline int kexec_crash_loaded(void) { return 0; } > > > -static inline void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size) > > > +static inline void *kimage_map_segment(struct kimage *image, int idx) > > > { return NULL; } > > > static inline void kimage_unmap_segment(void *buffer) { } > > > #define kexec_in_progress false > > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > > > index fa00b239c5d9..9a1966207041 100644 > > > --- a/kernel/kexec_core.c > > > +++ b/kernel/kexec_core.c > > > @@ -960,17 +960,20 @@ int kimage_load_segment(struct kimage *image, int idx) > > > return result; > > > } > > > > > > -void *kimage_map_segment(struct kimage *image, > > > - unsigned long addr, unsigned long size) > > > +void *kimage_map_segment(struct kimage *image, int idx) > > > { > > > + unsigned long addr, size, eaddr; > > > unsigned long src_page_addr, dest_page_addr = 0; > > > - unsigned long eaddr = addr + size; > > > kimage_entry_t *ptr, entry; > > > struct page **src_pages; > > > unsigned int npages; > > > void *vaddr = NULL; > > > int i; > > > > > > + addr = image->segment[idx].mem; > > > + size = image->segment[idx].memsz; > > > + eaddr = addr + size; > > > + > > > /* > > > * Collect the source pages and map them in a contiguous VA range. 
> > > */ > > > diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c > > > index 7362f68f2d8b..5beb69edd12f 100644 > > > --- a/security/integrity/ima/ima_kexec.c > > > +++ b/security/integrity/ima/ima_kexec.c > > > @@ -250,9 +250,7 @@ void ima_kexec_post_load(struct kimage *image) > > > if (!image->ima_buffer_addr) > > > return; > > > > > > - ima_kexec_buffer = kimage_map_segment(image, > > > - image->ima_buffer_addr, > > > - image->ima_buffer_size); > > > + ima_kexec_buffer = kimage_map_segment(image, image->ima_segment_index); > > > if (!ima_kexec_buffer) { > > > pr_err("Could not map measurements buffer.\n"); > > > return; > > > -- > > > 2.49.0 > > > > > > From piliu at redhat.com Tue Nov 25 20:47:58 2025 From: piliu at redhat.com (Pingfan Liu) Date: Wed, 26 Nov 2025 12:47:58 +0800 Subject: [PATCHv2 1/2] kernel/kexec: Change the prototype of kimage_map_segment() In-Reply-To: References: <20251106065904.10772-1-piliu@redhat.com> Message-ID: On Wed, Nov 26, 2025 at 9:10 AM Baoquan He wrote: > > Hi Pingfan, > > On 11/06/25 at 02:59pm, Pingfan Liu wrote: > > The kexec segment index will be required to extract the corresponding > > information for that segment in kimage_map_segment(). Additionally, > > kexec_segment already holds the kexec relocation destination address and > > size. Therefore, the prototype of kimage_map_segment() can be changed. > > Because no cover letter, I just reply here. > > I am testing code of (tag: next-20251125, next/master) on arm64 system. > I saw your two patches are already in there. When I used kexec reboot > as below, I still got the warning message during ima_kexec_post_load() > invocation. > > ==================== > kexec -d -l /boot/vmlinuz-6.18.0-rc7-next-20251125 --initrd /boot/initramfs-6.18.0-rc7-next-20251125.img --reuse-cmdline > ==================== > Could you share more detail, as I cannot reproduce this issue with (tag: next-20251125, next/master) on a different aarch64 platform either.
I used the default config to compile the kernel and added cma=512M to the kernel command line, so kexec file load can allocate the destination memory directly from the CMA area. # lshw -class system hpe-apollo*** description: System product: CS500 (-) vendor: CRAY version: - serial: - width: 64 bits capabilities: smbios-3.1.1 dmi-3.1.1 smp sve_default_vector_length tagged_addr_disabled configuration: boot=normal chassis=server family=HPC sku=- uuid=8cdb9098-d03f-11e9-8001-2cd444ce8cad # cat /proc/meminfo | grep -i cma CmaTotal: 524288 kB CmaFree: 509856 kB # cd /boot/ # kexec -d -s -l vmlinuz-6.18.0-rc7-next-20251125 --initrd initramfs-6.18.0-rc7-next-20251125.img --reuse-cmdline arch_process_options:179: command_line: root=/dev/mapper/rhel_hpe--apollo80--02--n00-root ro earlycon=pl011,0x1c050000 ip=dhcp crashkernel=2G-4G:406M,4G-64G:470M,64G-:726M rd.lvm.lv=rhel_hpe-apollo80-02-n00/root rd.lvm.lv=rhel_hpe-apollo80-02-n00/swap console=ttyAMA0 cma=512M arch_process_options:181: initrd: initramfs-6.18.0-rc7-next-20251125.img arch_process_options:183: dtb: (null) arch_process_options:186: console: (null) Try gzip decompression. Try LZMA decompression. elf_arm64_probe: Not an ELF executable. image_arm64_probe: Bad arm64 image header. pez_arm64_probe: PROBE. Try gzip decompression. pez_prepare: decompressed size 50790400 pez_prepare: done # cat /proc/meminfo | grep -i cma CmaTotal: 524288 kB CmaFree: 411032 kB CmaFree shrinks, which means kexec_file_load used it, and dmesg shows no warning: [ 167.484064] kexec_file: kernel: 0000000096e14552 kernel_size: 0x3070000 [ 167.484094] PEFILE: Unsigned PE binary [ 167.576003] ima: kexec measurement buffer for the loaded kernel at 0xc1a18000.
[ 167.585054] kexec_file: Loaded initrd at 0xc4b70000 bufsz=0x300f306 memsz=0x3010000 [ 167.593376] kexec_file: Loaded dtb at 0xc7c00000 bufsz=0x5b1 memsz=0x1000 [ 167.593389] kexec_file(Image): Loaded kernel at 0xc1b00000 bufsz=0x3070000 memsz=0x3070000 [ 167.593405] kexec_file: nr_segments = 4 [ 167.593408] kexec_file: segment[0]: buf=0x0000000000000000 bufsz=0x0 mem=0xc1a18000 memsz=0x1000 [ 167.593417] kexec_file: segment[1]: buf=0x0000000096e14552 bufsz=0x3070000 mem=0xc1b00000 memsz=0x3070000 [ 167.610450] kexec_file: segment[2]: buf=0x000000001285672d bufsz=0x300f306 mem=0xc4b70000 memsz=0x3010000 [ 167.627563] kexec_file: segment[3]: buf=0x000000002ef3060d bufsz=0x5b1 mem=0xc7c00000 memsz=0x1000 [ 167.629228] machine_kexec_post_load:119: [ 167.629233] kexec kimage info: [ 167.629236] type: 0 [ 167.629238] head: 4 [ 167.629241] kern_reloc: 0x0000000000000000 [ 167.629245] el2_vectors: 0x0000000000000000 [ 167.629248] kexec_file: kexec_file_load: type:0, start:0xc1b00000 head:0x4 flags:0x8 Thanks, Pingfan > ==================== > [34283.657670] kexec_file: kernel: 000000006cf71829 kernel_size: 0x48b0000 > [34283.657700] PEFILE: Unsigned PE binary > [34283.676597] ima: kexec measurement buffer for the loaded kernel at 0xff206000. 
> [34283.676621] kexec_file: Loaded initrd at 0x84cb0000 bufsz=0x25ec426 memsz=0x25ed000 > [34283.684646] kexec_file: Loaded dtb at 0xff400000 bufsz=0x39e memsz=0x1000 > [34283.684653] kexec_file(Image): Loaded kernel at 0x80400000 bufsz=0x48b0000 memsz=0x48b0000 > [34283.684663] kexec_file: nr_segments = 4 > [34283.684666] kexec_file: segment[0]: buf=0x0000000000000000 bufsz=0x0 mem=0xff206000 memsz=0x1000 > [34283.684674] kexec_file: segment[1]: buf=0x000000006cf71829 bufsz=0x48b0000 mem=0x80400000 memsz=0x48b0000 > [34283.725987] kexec_file: segment[2]: buf=0x00000000c7369de6 bufsz=0x25ec426 mem=0x84cb0000 memsz=0x25ed000 > [34283.747670] kexec_file: segmen > ** replaying previous printk message ** > [34283.747670] kexec_file: segment[3]: buf=0x00000000d83b530b bufsz=0x39e mem=0xff400000 memsz=0x1000 > [34283.747973] ------------[ cut here ]------------ > [34283.747976] WARNING: CPU: 33 PID: 16112 at kernel/kexec_core.c:1002 kimage_map_segment+0x138/0x190 > [34283.778574] Modules linked in: rfkill vfat fat ipmi_ssif igb acpi_ipmi ipmi_si ipmi_devintf mlx5_fwctl i2c_algo_bit ipmi_msghandler fwctl fuse loop nfnetlink zram lz4hc_compress lz4_compress xfs mlx5_ib macsec mlx5_core nvme nvme_core mlxfw psample tls nvme_keyring nvme_auth pci_hyperv_intf sbsa_gwdt rpcrdma sunrpc rdma_ucm ib_uverbs ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser i2c_dev ib_umad rdma_cm ib_ipoib iw_cm ib_cm libiscsi ib_core scsi_transport_iscsi aes_neon_bs > [34283.824233] CPU: 33 UID: 0 PID: 16112 Comm: kexec Tainted: G W 6.17.8-200.fc42.aarch64 #1 PREEMPT(voluntary) > [34283.836355] Tainted: [W]=WARN > [34283.839684] Hardware name: CRAY CS500/CMUD , BIOS 1.4.0 Jun 17 2020 > [34283.846903] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > [34283.854243] pc : kimage_map_segment+0x138/0x190 > [34283.859120] lr : kimage_map_segment+0x4c/0x190 > [34283.863920] sp : ffff8000a0643a90 > [34283.867394] x29: ffff8000a0643a90 x28: ffff800083d0a000 x27: 0000000000000000 > 
[34283.874901] x26: 0000aaaad722d4b0 x25: 000000000000008f x24: ffff800083d0a000 > [34283.882608] x23: 0000000000000001 x22: 00000000ff206000 x21: 00000000ff207000 > [34283.890305] x20: ffff008fbd306980 x19: ffff008f895d6400 x18: 00000000fffffff9 > [34283.897815] x17: 303d6d656d206539 x16: 3378303d7a736675 x15: 646565732d676e72 > [34283.905516] x14: 00646565732d726c x13: 616d692c78756e69 x12: 6c00636578656b2d > [34283.912999] x11: 007265666675622d x10: 636578656b2d616d x9 : ffff80008050b73c > [34283.920691] x8 : 0001000000000000 x7 : 0000000000000000 x6 : 0000000080000000 > [34283.928197] x5 : 0000000084cb0000 x4 : ffff008fbd2306b0 x3 : ffff008fbd305000 > [34283.935898] x2 : fffffff7ff000000 x1 : 0000000000000004 x0 : ffff800082046000 > [34283.943603] Call trace: > [34283.946039] kimage_map_segment+0x138/0x190 (P) > [34283.950935] ima_kexec_post_load+0x58/0xc0 > [34283.955225] __do_sys_kexec_file_load+0x2b8/0x398 > [34283.960279] __arm64_sys_kexec_file_load+0x28/0x40 > [34283.965965] invoke_syscall.constprop.0+0x64/0xe8 > [34283.971025] el0_svc_common.constprop.0+0x40/0xe8 > [34283.975883] do_el0_svc+0x24/0x38 > [34283.979361] el0_svc+0x3c/0x168 > [34283.982833] el0t_64_sync_handler+0xa0/0xf0 > [34283.987176] el0t_64_sync+0x1b0/0x1b8 > [34283.991000] ---[ end trace 0000000000000000 ]--- > [34283.996060] ------------[ cut here ]------------ > [34283.996064] WARNING: CPU: 33 PID: 16112 at mm/vmalloc.c:538 vmap_pages_pte_range+0x2bc/0x3c0 > [34284.010006] Modules linked in: rfkill vfat fat ipmi_ssif igb acpi_ipmi ipmi_si ipmi_devintf mlx5_fwctl i2c_algo_bit ipmi_msghandler fwctl fuse loop nfnetlink zram lz4hc_compress lz4_compress xfs mlx5_ib macsec mlx5_core nvme nvme_core mlxfw psample tls nvme_keyring nvme_auth pci_hyperv_intf sbsa_gwdt rpcrdma sunrpc rdma_ucm ib_uverbs ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser i2c_dev ib_umad rdma_cm ib_ipoib iw_cm ib_cm libiscsi ib_core scsi_transport_iscsi aes_neon_bs > [34284.055630] CPU: 33 UID: 0 PID: 16112 
Comm: kexec Tainted: G W 6.17.8-200.fc42.aarch64 #1 PREEMPT(voluntary) > [34284.067701] Tainted: [W]=WARN > [34284.070833] Hardware name: CRAY CS500/CMUD , BIOS 1.4.0 Jun 17 2020 > [34284.078238] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > [34284.085546] pc : vmap_pages_pte_range+0x2bc/0x3c0 > [34284.090607] lr : vmap_small_pages_range_noflush+0x16c/0x298 > [34284.096528] sp : ffff8000a0643940 > [34284.100001] x29: ffff8000a0643940 x28: 0000000000000000 x27: ffff800084f76000 > [34284.107699] x26: fffffdffc0000000 x25: ffff8000a06439d0 x24: ffff800082046000 > [34284.115174] x23: ffff800084f75000 x22: ffff007f80337ba8 x21: 03ffffffffffffc0 > [34284.122821] x20: ffff008fbd306980 x19: ffff8000a06439d4 x18: 00000000fffffff9 > [34284.130331] x17: 303d6d656d206539 x16: 3378303d7a736675 x15: 646565732d676e72 > [34284.138032] x14: 0000000000004000 x13: ffff009781307130 x12: 0000000000002000 > [34284.145733] x11: 0000000000000000 x10: 0000000000000001 x9 : ffff8000804e197c > [34284.153248] x8 : 0000000000000027 x7 : ffff800085175000 x6 : ffff8000a06439d4 > [34284.160944] x5 : ffff8000a06439d0 x4 : ffff008fbd306980 x3 : 0068000000000f03 > [34284.168449] x2 : ffff007f80337ba8 x1 : 0000000000000000 x0 : 0000000000000000 > [34284.176150] Call trace: > [34284.178768] vmap_pages_pte_range+0x2bc/0x3c0 (P) > [34284.183665] vmap_small_pages_range_noflush+0x16c/0x298 > [34284.189264] vmap+0xb4/0x138 > [34284.192312] kimage_map_segment+0xdc/0x190 > [34284.196794] ima_kexec_post_load+0x58/0xc0 > [34284.201044] __do_sys_kexec_file_load+0x2b8/0x398 > [34284.206107] __arm64_sys_kexec_file_load+0x28/0x40 > [34284.211254] invoke_syscall.constprop.0+0x64/0xe8 > [34284.216139] el0_svc_common.constprop.0+0x40/0xe8 > [34284.221196] do_el0_svc+0x24/0x38 > [34284.224678] el0_svc+0x3c/0x168 > [34284.227983] el0t_64_sync_handler+0xa0/0xf0 > [34284.232526] el0t_64_sync+0x1b0/0x1b8 > [34284.236376] ---[ end trace 0000000000000000 ]--- > [34284.241412] kexec_core: Could not map 
ima buffer. > [34284.241421] ima: Could not map measurements buffer. > [34284.551336] machine_kexec_post_load:155: > [34284.551354] kexec kimage info: > [34284.551366] type: 0 > [34284.551373] head: 90363f9002 > [34284.551377] kern_reloc: 0x00000090363f7000 > [34284.551381] el2_vectors: 0x0000000000000000 > [34284.551384] kexec_file: kexec_file_load: type:0, start:0x80400000 head:0x90363f9002 flags:0x8 > ==================== > > > > > Fixes: 07d24902977e ("kexec: enable CMA based contiguous allocation") > > Signed-off-by: Pingfan Liu > > Cc: Andrew Morton > > Cc: Baoquan He > > Cc: Mimi Zohar > > Cc: Roberto Sassu > > Cc: Alexander Graf > > Cc: Steven Chen > > Cc: > > To: kexec at lists.infradead.org > > To: linux-integrity at vger.kernel.org > > --- > > include/linux/kexec.h | 4 ++-- > > kernel/kexec_core.c | 9 ++++++--- > > security/integrity/ima/ima_kexec.c | 4 +--- > > 3 files changed, 9 insertions(+), 8 deletions(-) > > > > diff --git a/include/linux/kexec.h b/include/linux/kexec.h > > index ff7e231b0485..8a22bc9b8c6c 100644 > > --- a/include/linux/kexec.h > > +++ b/include/linux/kexec.h > > @@ -530,7 +530,7 @@ extern bool kexec_file_dbg_print; > > #define kexec_dprintk(fmt, arg...) 
\ > > do { if (kexec_file_dbg_print) pr_info(fmt, ##arg); } while (0) > > > > -extern void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size); > > +extern void *kimage_map_segment(struct kimage *image, int idx); > > extern void kimage_unmap_segment(void *buffer); > > #else /* !CONFIG_KEXEC_CORE */ > > struct pt_regs; > > @@ -540,7 +540,7 @@ static inline void __crash_kexec(struct pt_regs *regs) { } > > static inline void crash_kexec(struct pt_regs *regs) { } > > static inline int kexec_should_crash(struct task_struct *p) { return 0; } > > static inline int kexec_crash_loaded(void) { return 0; } > > -static inline void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size) > > +static inline void *kimage_map_segment(struct kimage *image, int idx) > > { return NULL; } > > static inline void kimage_unmap_segment(void *buffer) { } > > #define kexec_in_progress false > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > > index fa00b239c5d9..9a1966207041 100644 > > --- a/kernel/kexec_core.c > > +++ b/kernel/kexec_core.c > > @@ -960,17 +960,20 @@ int kimage_load_segment(struct kimage *image, int idx) > > return result; > > } > > > > -void *kimage_map_segment(struct kimage *image, > > - unsigned long addr, unsigned long size) > > +void *kimage_map_segment(struct kimage *image, int idx) > > { > > + unsigned long addr, size, eaddr; > > unsigned long src_page_addr, dest_page_addr = 0; > > - unsigned long eaddr = addr + size; > > kimage_entry_t *ptr, entry; > > struct page **src_pages; > > unsigned int npages; > > void *vaddr = NULL; > > int i; > > > > + addr = image->segment[idx].mem; > > + size = image->segment[idx].memsz; > > + eaddr = addr + size; > > + > > /* > > * Collect the source pages and map them in a contiguous VA range. 
> > */ > > diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c > > index 7362f68f2d8b..5beb69edd12f 100644 > > --- a/security/integrity/ima/ima_kexec.c > > +++ b/security/integrity/ima/ima_kexec.c > > @@ -250,9 +250,7 @@ void ima_kexec_post_load(struct kimage *image) > > if (!image->ima_buffer_addr) > > return; > > > > - ima_kexec_buffer = kimage_map_segment(image, > > - image->ima_buffer_addr, > > - image->ima_buffer_size); > > + ima_kexec_buffer = kimage_map_segment(image, image->ima_segment_index); > > if (!ima_kexec_buffer) { > > pr_err("Could not map measurements buffer.\n"); > > return; > > -- > > 2.49.0 > > > From rppt at kernel.org Tue Nov 25 22:14:38 2025 From: rppt at kernel.org (Mike Rapoport) Date: Wed, 26 Nov 2025 08:14:38 +0200 Subject: [PATCH v8 12/17] x86/e820: temporarily enable KHO scratch for memory below 1M In-Reply-To: References: <20250509074635.3187114-1-changyuanl@google.com> <20250509074635.3187114-13-changyuanl@google.com> Message-ID: On Tue, Nov 25, 2025 at 06:47:15PM +0000, Usama Arif wrote: > > > On 25/11/2025 13:50, Mike Rapoport wrote: > > Hi, > > > > On Tue, Nov 25, 2025 at 02:15:34PM +0100, Pratyush Yadav wrote: > >> On Mon, Nov 24 2025, Usama Arif wrote: > > > >>>> --- a/arch/x86/realmode/init.c > >>>> +++ b/arch/x86/realmode/init.c > >>>> @@ -65,6 +65,8 @@ void __init reserve_real_mode(void) > >>>> * setup_arch(). > >>>> */ > >>>> memblock_reserve(0, SZ_1M); > >>>> + > >>>> + memblock_clear_kho_scratch(0, SZ_1M); > >>>> } > >>>> > >>>> static void __init sme_sev_setup_real_mode(struct trampoline_header *th) > >>> > >>> Hello! > >>> > >>> I am working with Breno who reported that we are seeing the below warning at boot > >>> when rolling out 6.16 in Meta fleet. It is difficult to reproduce on a single host > >>> manually but we are seeing this several times a day inside the fleet. 
> >>> > >>> 20:16:33 ------------[ cut here ]------------ > >>> 20:16:33 WARNING: CPU: 0 PID: 0 at mm/memblock.c:668 memblock_add_range+0x316/0x330 > >>> 20:16:33 Modules linked in: > >>> 20:16:33 CPU: 0 UID: 0 PID: 0 Comm: swapper Tainted: G S 6.16.1-0_fbk0_0_gc0739ee5037a #1 NONE > >>> 20:16:33 Tainted: [S]=CPU_OUT_OF_SPEC > >>> 20:16:33 RIP: 0010:memblock_add_range+0x316/0x330 > >>> 20:16:33 Code: ff ff ff 89 5c 24 08 41 ff c5 44 89 6c 24 10 48 63 74 24 08 48 63 54 24 10 e8 26 0c 00 00 e9 41 ff ff ff 0f 0b e9 af fd ff ff <0f> 0b e9 b7 fd ff ff 0f 0b 0f 0b cc cc cc cc cc cc cc cc cc cc cc > >>> 20:16:33 RSP: 0000:ffffffff83403dd8 EFLAGS: 00010083 ORIG_RAX: 0000000000000000 > >>> 20:16:33 RAX: ffffffff8476ff90 RBX: 0000000000001c00 RCX: 0000000000000002 > >>> 20:16:33 RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff83bad4d8 > >>> 20:16:33 RBP: 000000000009f000 R08: 0000000000000020 R09: 8000000000097101 > >>> 20:16:33 R10: ffffffffff2004b0 R11: 203a6d6f646e6172 R12: 000000000009ec00 > >>> 20:16:33 R13: 0000000000000002 R14: 0000000000100000 R15: 000000000009d000 > >>> 20:16:33 FS: 0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000 > >>> 20:16:33 CR2: ffff888065413ff8 CR3: 00000000663b7000 CR4: 00000000000000b0 > >>> 20:16:33 Call Trace: > >>> 20:16:33 > >>> 20:16:33 ? __memblock_reserve+0x75/0x80 > > > > Do you have faddr2line for this? > > >>> 20:16:33 ? setup_arch+0x30f/0xb10 > > > > And this? > > > > > Thanks for this! I think it helped narrow down the problem. > > The stack is: > > 20:16:33 ? __memblock_reserve (mm/memblock.c:936) > 20:16:33 ? setup_arch (arch/x86/kernel/setup.c:413 arch/x86/kernel/setup.c:499 arch/x86/kernel/setup.c:956) > 20:16:33 ? start_kernel (init/main.c:922) > 20:16:33 ? x86_64_start_reservations (arch/x86/kernel/ebda.c:57) > 20:16:33 ? x86_64_start_kernel (arch/x86/kernel/head64.c:231) > 20:16:33 ? common_startup_64 (arch/x86/kernel/head_64.S:419) > > This is 6.16 kernel. > > 20:16:33 ? 
__memblock_reserve (mm/memblock.c:936) > That's the memblock_add_range call in memblock_reserve > > 20:16:33 ? setup_arch (arch/x86/kernel/setup.c:413 arch/x86/kernel/setup.c:499 arch/x86/kernel/setup.c:956) > That is parse_setup_data -> add_early_ima_buffer -> memblock_reserve_kern > > > I put a simple print like below: > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c > index 680d1b6dfea41..cc97ffc0083c7 100644 > --- a/arch/x86/kernel/setup.c > +++ b/arch/x86/kernel/setup.c > @@ -409,6 +409,7 @@ static void __init add_early_ima_buffer(u64 phys_addr) > } > > if (data->size) { > + pr_err("PPP %s %s %d data->addr %llx, data->size %llx \n", __FILE__, __func__, __LINE__, data->addr, data->size); > memblock_reserve_kern(data->addr, data->size); > ima_kexec_buffer_phys = data->addr; > ima_kexec_buffer_size = data->size; > > > and I see (without replicating the warning): > > [ 0.000000] PPP arch/x86/kernel/setup.c add_early_ima_buffer 412 data->addr 9e000, data->size 1000 > .... So it looks like in cases when the warning reproduces there's something that reserves memory overlapping with the IMA buffer before add_early_ima_buffer(). > > [ 0.000348] MEMBLOCK configuration: > [ 0.000348] memory size = 0x0000003fea329ff0 reserved size = 0x00000000050c969b > [ 0.000350] memory.cnt = 0x5 > [ 0.000351] memory[0x0] [0x0000000000001000-0x000000000009ffff], 0x000000000009f000 bytes flags: 0x40 > [ 0.000353] memory[0x1] [0x0000000000100000-0x0000000067c65fff], 0x0000000067b66000 bytes flags: 0x0 > [ 0.000355] memory[0x2] [0x000000006d8db000-0x000000006fffffff], 0x0000000002725000 bytes flags: 0x0 > [ 0.000356] memory[0x3] [0x0000000100000000-0x000000407fff8fff], 0x0000003f7fff9000 bytes flags: 0x0 > [ 0.000358] memory[0x4] [0x000000407fffa000-0x000000407fffffff], 0x0000000000006000 bytes flags: 0x0 > [ 0.000359] reserved.cnt = 0x7 > > > So MEMBLOCK_RSRV_KERN and MEMBLOCK_KHO_SCRATCH seem to overlap..
It does not matter: they are set on different arrays. RSRV_KERN is set on regions in memblock.reserved and KHO_SCRATCH is set on regions in memblock.memory. So dumping memblock.memory is completely irrelevant; you need to check memblock.reserved for potential conflicts. > >>> 20:16:33 ? start_kernel+0x58/0x960 > >>> 20:16:33 ? x86_64_start_reservations+0x20/0x20 > >>> 20:16:33 ? x86_64_start_kernel+0x13d/0x140 > >>> 20:16:33 ? common_startup_64+0x13e/0x140 > >>> > >>> 20:16:33 ---[ end trace 0000000000000000 ]--- > >>> > >>> > >>> Rolling out with memblock=debug is not really an option in a large scale fleet due to the > >>> time added to boot. But I did try on one of the hosts (without reproducing the issue) and I see: > > Is it a problem to roll out a kernel that has additional debug printouts as > > Breno suggested earlier? I.e. > > if (flags != MEMBLOCK_NONE && flags != rgn->flags) { > > pr_warn("memblock: Flag mismatch at region [%pa-%pa]\n", > > &rgn->base, &rend); > > pr_warn(" Existing region flags: %#x\n", rgn->flags); > > pr_warn(" New range flags: %#x\n", flags); > > pr_warn(" New range: [%pa-%pa]\n", &base, &end); > > WARN_ON_ONCE(1); > > } > > I can add this, but the only thing is that it might be several weeks between me putting this in the > kernel and that kernel being deployed to enough machines that it starts to show up. I think the IMA coinciding > with memblock_mark_kho_scratch in e820__memblock_setup could be the reason for the warning. It might be better to > fix that case and deploy it to see if the warnings still show up? > I can add these prints as well in case it doesn't fix the problem. I really don't think that effectively disabling memblock_mark_kho_scratch() when KHO is disabled will solve the problem because, as I said, the flags it sets are on a different structure than the flags set by memblock_reserve_kern().
> > If you have the logs from failing boots up to the point where SLUB reports > > about its initialization, e.g. > > > > [ 0.134377] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1 > > > > something there may hint about what's the issue. > > So the boot doesn't fail, it's just giving warnings in the fleet. > I have added the dmesg to the end of the mail. Thanks, unfortunately nothing jumped at me there. > Does something like this look good? I can try deploying this (although it will take some time to find out). > We can get it upstream as well as that makes backports easier. > > diff --git a/mm/memblock.c b/mm/memblock.c > index 154f1d73b61f2..257c6f0eee03d 100644 > --- a/mm/memblock.c > +++ b/mm/memblock.c > @@ -1119,8 +1119,13 @@ int __init_memblock memblock_reserved_mark_noinit(phys_addr_t base, phys_addr_t > */ > __init int memblock_mark_kho_scratch(phys_addr_t base, phys_addr_t size) > { > - return memblock_setclr_flag(&memblock.memory, base, size, 1, > - MEMBLOCK_KHO_SCRATCH); > +#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH > + if (is_kho_boot()) Please use if (IS_ENABLED(CONFIG_MEMBLOCK_KHO_SCRATCH)) instead of an #ifdef. If you send a formal patch with it, I'll take it. I'd suggest still deploying additional debug printouts internally. > + return memblock_setclr_flag(&memblock.memory, base, size, 1, > + MEMBLOCK_KHO_SCRATCH); > +#else > + return 0; > +#endif > } > > /** > @@ -1133,8 +1138,13 @@ __init int memblock_mark_kho_scratch(phys_addr_t base, phys_addr_t size) > */ > __init int memblock_clear_kho_scratch(phys_addr_t base, phys_addr_t size) > { > - return memblock_setclr_flag(&memblock.memory, base, size, 0, > - MEMBLOCK_KHO_SCRATCH); > +#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH > + if (is_kho_boot()) > + return memblock_setclr_flag(&memblock.memory, base, size, 0, > + MEMBLOCK_KHO_SCRATCH); > +#else If nothing sets the flag, _clear is a nop anyway, but let's update it as well for symmetry. -- Sincerely yours, Mike.
From usamaarif642 at gmail.com Tue Nov 25 23:25:48 2025 From: usamaarif642 at gmail.com (Usama Arif) Date: Wed, 26 Nov 2025 07:25:48 +0000 Subject: [PATCH v8 12/17] x86/e820: temporarily enable KHO scratch for memory below 1M In-Reply-To: References: <20250509074635.3187114-1-changyuanl@google.com> <20250509074635.3187114-13-changyuanl@google.com> Message-ID: On 26/11/2025 06:14, Mike Rapoport wrote: > On Tue, Nov 25, 2025 at 06:47:15PM +0000, Usama Arif wrote: >> >> >> On 25/11/2025 13:50, Mike Rapoport wrote: >>> Hi, >>> >>> On Tue, Nov 25, 2025 at 02:15:34PM +0100, Pratyush Yadav wrote: >>>> On Mon, Nov 24 2025, Usama Arif wrote: >>> >>>>>> --- a/arch/x86/realmode/init.c >>>>>> +++ b/arch/x86/realmode/init.c >>>>>> @@ -65,6 +65,8 @@ void __init reserve_real_mode(void) >>>>>> * setup_arch(). >>>>>> */ >>>>>> memblock_reserve(0, SZ_1M); >>>>>> + >>>>>> + memblock_clear_kho_scratch(0, SZ_1M); >>>>>> } >>>>>> >>>>>> static void __init sme_sev_setup_real_mode(struct trampoline_header *th) >>>>> >>>>> Hello! >>>>> >>>>> I am working with Breno who reported that we are seeing the below warning at boot >>>>> when rolling out 6.16 in Meta fleet. It is difficult to reproduce on a single host >>>>> manually but we are seeing this several times a day inside the fleet. 
>>>>> >>>>> 20:16:33 ------------[ cut here ]------------ >>>>> 20:16:33 WARNING: CPU: 0 PID: 0 at mm/memblock.c:668 memblock_add_range+0x316/0x330 >>>>> 20:16:33 Modules linked in: >>>>> 20:16:33 CPU: 0 UID: 0 PID: 0 Comm: swapper Tainted: G S 6.16.1-0_fbk0_0_gc0739ee5037a #1 NONE >>>>> 20:16:33 Tainted: [S]=CPU_OUT_OF_SPEC >>>>> 20:16:33 RIP: 0010:memblock_add_range+0x316/0x330 >>>>> 20:16:33 Code: ff ff ff 89 5c 24 08 41 ff c5 44 89 6c 24 10 48 63 74 24 08 48 63 54 24 10 e8 26 0c 00 00 e9 41 ff ff ff 0f 0b e9 af fd ff ff <0f> 0b e9 b7 fd ff ff 0f 0b 0f 0b cc cc cc cc cc cc cc cc cc cc cc >>>>> 20:16:33 RSP: 0000:ffffffff83403dd8 EFLAGS: 00010083 ORIG_RAX: 0000000000000000 >>>>> 20:16:33 RAX: ffffffff8476ff90 RBX: 0000000000001c00 RCX: 0000000000000002 >>>>> 20:16:33 RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff83bad4d8 >>>>> 20:16:33 RBP: 000000000009f000 R08: 0000000000000020 R09: 8000000000097101 >>>>> 20:16:33 R10: ffffffffff2004b0 R11: 203a6d6f646e6172 R12: 000000000009ec00 >>>>> 20:16:33 R13: 0000000000000002 R14: 0000000000100000 R15: 000000000009d000 >>>>> 20:16:33 FS: 0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000 >>>>> 20:16:33 CR2: ffff888065413ff8 CR3: 00000000663b7000 CR4: 00000000000000b0 >>>>> 20:16:33 Call Trace: >>>>> 20:16:33 >>>>> 20:16:33 ? __memblock_reserve+0x75/0x80 >>> >>> Do you have faddr2line for this? >>>>>> 20:16:33 ? setup_arch+0x30f/0xb10 >>> >>> And this? >>> >> >> >> Thanks for this! I think it helped narrow down the problem. >> >> The stack is: >> >> 20:16:33 ? __memblock_reserve (mm/memblock.c:936) >> 20:16:33 ? setup_arch (arch/x86/kernel/setup.c:413 arch/x86/kernel/setup.c:499 arch/x86/kernel/setup.c:956) >> 20:16:33 ? start_kernel (init/main.c:922) >> 20:16:33 ? x86_64_start_reservations (arch/x86/kernel/ebda.c:57) >> 20:16:33 ? x86_64_start_kernel (arch/x86/kernel/head64.c:231) >> 20:16:33 ? common_startup_64 (arch/x86/kernel/head_64.S:419) >> >> This is 6.16 kernel. >> >> 20:16:33 ? 
__memblock_reserve (mm/memblock.c:936) >> Thats memblock_add_range call in memblock_reserve >> >> 20:16:33 ? setup_arch (arch/x86/kernel/setup.c:413 arch/x86/kernel/setup.c:499 arch/x86/kernel/setup.c:956) >> That is parse_setup_data -> add_early_ima_buffer -> add_early_ima_buffer -> memblock_reserve_kern >> >> >> I put a simple print like below: >> >> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c >> index 680d1b6dfea41..cc97ffc0083c7 100644 >> --- a/arch/x86/kernel/setup.c >> +++ b/arch/x86/kernel/setup.c >> @@ -409,6 +409,7 @@ static void __init add_early_ima_buffer(u64 phys_addr) >> } >> >> if (data->size) { >> + pr_err("PPP %s %s %d data->addr %llx, data->size %llx \n", __FILE__, __func__, __LINE__, data->addr, data->size); >> memblock_reserve_kern(data->addr, data->size); >> ima_kexec_buffer_phys = data->addr; >> ima_kexec_buffer_size = data->size; >> >> >> and I see (without replicating the warning): >> >> [ 0.000000] PPP arch/x86/kernel/setup.c add_early_ima_buffer 412 data->addr 9e000, data->size 1000 >> .... > > So it looks like in cases when the warning reproduces there's something > that reserves memory overlapping with IMA buffer before > add_early_ima_buffer(). > >> >> [ 0.000348] MEMBLOCK configuration: >> [ 0.000348] memory size = 0x0000003fea329ff0 reserved size = 0x00000000050c969b >> [ 0.000350] memory.cnt = 0x5 >> [ 0.000351] memory[0x0] [0x0000000000001000-0x000000000009ffff], 0x000000000009f000 bytes flags: 0x40 >> [ 0.000353] memory[0x1] [0x0000000000100000-0x0000000067c65fff], 0x0000000067b66000 bytes flags: 0x0 >> [ 0.000355] memory[0x2] [0x000000006d8db000-0x000000006fffffff], 0x0000000002725000 bytes flags: 0x0 >> [ 0.000356] memory[0x3] [0x0000000100000000-0x000000407fff8fff], 0x0000003f7fff9000 bytes flags: 0x0 >> [ 0.000358] memory[0x4] [0x000000407fffa000-0x000000407fffffff], 0x0000000000006000 bytes flags: 0x0 >> [ 0.000359] reserved.cnt = 0x7 >> >> >> So MEMBLOCK_RSRV_KERN and MEMBLOCK_KHO_SCRATCH seem to overlap.. 
> > It does not matter, they are set on different arrays. RSRV_KERN is set on > regions in memblock.reserved and KHO_SCRATCH is set on regions in > memblock.memory. > > So dumping memblock.memory is completely irrelevant, you need to check > memblock.reserved for potential conflicts. > >>>>> 20:16:33 ? start_kernel+0x58/0x960 >>>>> 20:16:33 ? x86_64_start_reservations+0x20/0x20 >>>>> 20:16:33 ? x86_64_start_kernel+0x13d/0x140 >>>>> 20:16:33 ? common_startup_64+0x13e/0x140 >>>>> 20:16:33 >>>>> 20:16:33 ---[ end trace 0000000000000000 ]--- >>>>> >>>>> >>>>> Rolling out with memblock=debug is not really an option in a large scale fleet due to the >>>>> time added to boot. But I did try on one of the hosts (without reproducing the issue) and I see: >>> >>> Is it a problem to roll out a kernel that has additional debug printouts as >>> Breno suggested earlier? I.e. >>> >>> if (flags != MEMBLOCK_NONE && flags != rgn->flags) { >>> pr_warn("memblock: Flag mismatch at region [%pa-%pa]\n", >>> &rgn->base, &rend); >>> pr_warn(" Existing region flags: %#x\n", rgn->flags); >>> pr_warn(" New range flags: %#x\n", flags); >>> pr_warn(" New range: [%pa-%pa]\n", &base, &end); >>> WARN_ON_ONCE(1); >>> } >>> >> >> I can add this, but the only thing is that it might be several weeks between me putting this in the >> kernel and that kernel being deployed to enough machines that it starts to show up. I think the IMA coinciding >> with memblock_mark_kho_scratch in e820__memblock_setup could be the reason for the warning. It might be better to >> fix that case and deploy it to see if the warnings still show up? >> I can add these prints as well incase it doesnt fix the problem. > > I really don't think that effectively disabling memblock_mark_kho_scratch() > when KHO is disabled will solve the problem because as I said the flags it > sets are on different structure than the flags set by > memblock_reserve_kern(). 
> >>> If you have the logs from failing boots up to the point where SLUB reports >>> about its initialization, e.g. >>> >>> [ 0.134377] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1 >>> >>> something there may hint about what's the issue. >> >> So the boot doesn't fail, it's just giving warnings in the fleet. >> I have added the dmesg to the end of the mail. > > Thanks, unfortunately nothing jumped at me there. > >> Does something like this look good? I can try deploying this (although it will take some time to find out). >> We can get it upstream as well as that makes backports easier. >> >> diff --git a/mm/memblock.c b/mm/memblock.c >> index 154f1d73b61f2..257c6f0eee03d 100644 >> --- a/mm/memblock.c >> +++ b/mm/memblock.c >> @@ -1119,8 +1119,13 @@ int __init_memblock memblock_reserved_mark_noinit(phys_addr_t base, phys_addr_t >> */ >> __init int memblock_mark_kho_scratch(phys_addr_t base, phys_addr_t size) >> { >> - return memblock_setclr_flag(&memblock.memory, base, size, 1, >> - MEMBLOCK_KHO_SCRATCH); >> +#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH >> + if (is_kho_boot()) > > Please use > > if (IS_ENABLED(CONFIG_MEMBLOCK_KHO_SCRATCH)) > > instead of an #ifdef. > > If you send a formal patch with it, I'll take it. > I'd suggest still deploying additional debug printouts internally. Thanks! I will add the additional debug prints and [1] in the next release. It will be some time before it makes it into production, so I will try to debug this more using the information you provided above.
[1] https://lore.kernel.org/all/20251126072051.546700-1-usamaarif642 at gmail.com/ > >> + return memblock_setclr_flag(&memblock.memory, base, size, 1, >> + MEMBLOCK_KHO_SCRATCH); >> +#else >> + return 0; >> +#endif >> } >> >> /** >> @@ -1133,8 +1138,13 @@ __init int memblock_mark_kho_scratch(phys_addr_t base, phys_addr_t size) >> */ >> __init int memblock_clear_kho_scratch(phys_addr_t base, phys_addr_t size) >> { >> - return memblock_setclr_flag(&memblock.memory, base, size, 0, >> - MEMBLOCK_KHO_SCRATCH); >> +#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH >> + if (is_kho_boot()) >> + return memblock_setclr_flag(&memblock.memory, base, size, 0, >> + MEMBLOCK_KHO_SCRATCH); >> +#else > > If nothing sets the flag _clear is anyway nop, but let's update it as well > for symmetry. > From maqianga at uniontech.com Wed Nov 26 00:44:24 2025 From: maqianga at uniontech.com (Qiang Ma) Date: Wed, 26 Nov 2025 16:44:24 +0800 Subject: [PATCH v3 0/3] kexec: print out debugging message if required for kexec_load Message-ID: <20251126084427.3222212-1-maqianga@uniontech.com> Overview: ========= The commit a85ee18c7900 ("kexec_file: print out debugging message if required") has added general code printing in kexec_file_load(), but not in kexec_load(). Since kexec_load and kexec_file_load are not triggered simultaneously, we can unify the debug flag of kexec and kexec_file as kexec_dbg_print. Next, we need to do some things in this patchset: 1. rename kexec_file_dbg_print to kexec_dbg_print 2. Add KEXEC_DEBUG 3. Initialize kexec_dbg_print for kexec 4. Fix uninitialized struct kimage *image pointer 5. Set the reset of kexec_dbg_print to kimage_free Testing: ========= I did testing on x86_64, arm64 and loongarch. 
On x86_64, the printed messages look like below: unset CONFIG_KEXEC_FILE: [ 81.502374] kexec: kexec_load: type:0, start:0x23fff7700 head:0x10a4b9002 flags:0x3e0010 set CONFIG_KEXEC_FILE [ 36.774228] kexec_file: kernel: 0000000066c386c8 kernel_size: 0xd78400 [ 36.821814] kexec-bzImage64: Loaded purgatory at 0x23fffb000 [ 36.821826] kexec-bzImage64: Loaded boot_param, command line and misc at 0x23fff9000 bufsz=0x12d0 memsz=0x2000 [ 36.821829] kexec-bzImage64: Loaded 64bit kernel at 0x23d400000 bufsz=0xd73400 memsz=0x2ab7000 [ 36.821918] kexec-bzImage64: Loaded initrd at 0x23bd0b000 bufsz=0x16f40a8 memsz=0x16f40a8 [ 36.821920] kexec-bzImage64: Final command line is: root=/dev/mapper/test-root crashkernel=auto rd.lvm.lv=test/root [ 36.821925] kexec-bzImage64: E820 memmap: [ 36.821926] kexec-bzImage64: 0000000000000000-000000000009ffff (1) [ 36.821928] kexec-bzImage64: 0000000000100000-0000000000811fff (1) [ 36.821930] kexec-bzImage64: 0000000000812000-0000000000812fff (2) [ 36.821931] kexec-bzImage64: 0000000000813000-00000000bee38fff (1) [ 36.821933] kexec-bzImage64: 00000000bee39000-00000000beec2fff (2) [ 36.821934] kexec-bzImage64: 00000000beec3000-00000000bf8ecfff (1) [ 36.821935] kexec-bzImage64: 00000000bf8ed000-00000000bfb6cfff (2) [ 36.821936] kexec-bzImage64: 00000000bfb6d000-00000000bfb7efff (3) [ 36.821937] kexec-bzImage64: 00000000bfb7f000-00000000bfbfefff (4) [ 36.821938] kexec-bzImage64: 00000000bfbff000-00000000bff7bfff (1) [ 36.821939] kexec-bzImage64: 00000000bff7c000-00000000bfffffff (2) [ 36.821940] kexec-bzImage64: 00000000feffc000-00000000feffffff (2) [ 36.821941] kexec-bzImage64: 00000000ffc00000-00000000ffffffff (2) [ 36.821942] kexec-bzImage64: 0000000100000000-000000023fffffff (1) [ 36.872348] kexec_file: nr_segments = 4 [ 36.872356] kexec_file: segment[0]: buf=0x000000005314ece7 bufsz=0x4000 mem=0x23fffb000 memsz=0x5000 [ 36.872370] kexec_file: segment[1]: buf=0x000000006e59b143 bufsz=0x12d0 mem=0x23fff9000 memsz=0x2000 [ 36.872374] 
kexec_file: segment[2]: buf=0x00000000eb7b1fc3 bufsz=0xd73400 mem=0x23d400000 memsz=0x2ab7000 [ 36.882172] kexec_file: segment[3]: buf=0x000000006af76441 bufsz=0x16f40a8 mem=0x23bd0b000 memsz=0x16f5000 [ 36.889113] kexec_file: kexec_file_load: type:0, start:0x23fffb150 head:0x101a2e002 flags:0x8 Changes in v3: ========== - Rename kexec_core_dbg_print to kexec_dbg_print - Remove unnecessary segments prints - Remove patch "kexec_file: Fix the issue of mismatch between loop variable types" Qiang Ma (3): kexec: Fix uninitialized struct kimage *image pointer kexec: add kexec flag to control debug printing kexec: print out debugging message if required for kexec_load include/linux/kexec.h | 9 +++++---- include/uapi/linux/kexec.h | 1 + kernel/kexec.c | 8 +++++++- kernel/kexec_core.c | 4 +++- kernel/kexec_file.c | 4 +--- 5 files changed, 17 insertions(+), 9 deletions(-) -- 2.20.1 From maqianga at uniontech.com Wed Nov 26 00:44:25 2025 From: maqianga at uniontech.com (Qiang Ma) Date: Wed, 26 Nov 2025 16:44:25 +0800 Subject: [PATCH v3 1/3] kexec: Fix uninitialized struct kimage *image pointer In-Reply-To: <20251126084427.3222212-1-maqianga@uniontech.com> References: <20251126084427.3222212-1-maqianga@uniontech.com> Message-ID: <20251126084427.3222212-2-maqianga@uniontech.com> The image is initialized to NULL. Then, after calling kimage_alloc_init, we can directly goto 'out' because at this time, the kimage_free will determine whether image is a NULL pointer. This can also prepare for the subsequent patch's kexec_core_dbg_print to be reset to zero in kimage_free. 
Signed-off-by: Qiang Ma --- kernel/kexec.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/kexec.c b/kernel/kexec.c index 28008e3d462e..9bb1f2b6b268 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -95,6 +95,8 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, unsigned long i; int ret; + image = NULL; + /* * Because we write directly to the reserved memory region when loading * crash kernels we need a serialization here to prevent multiple crash @@ -129,7 +131,7 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, ret = kimage_alloc_init(&image, entry, nr_segments, segments, flags); if (ret) - goto out_unlock; + goto out; if (flags & KEXEC_PRESERVE_CONTEXT) image->preserve_context = 1; -- 2.20.1 From maqianga at uniontech.com Wed Nov 26 00:44:26 2025 From: maqianga at uniontech.com (Qiang Ma) Date: Wed, 26 Nov 2025 16:44:26 +0800 Subject: [PATCH v3 2/3] kexec: add kexec flag to control debug printing In-Reply-To: <20251126084427.3222212-1-maqianga@uniontech.com> References: <20251126084427.3222212-1-maqianga@uniontech.com> Message-ID: <20251126084427.3222212-3-maqianga@uniontech.com> The commit a85ee18c7900 ("kexec_file: print out debugging message if required") has added general code printing in kexec_file_load(), but not in kexec_load(). Since kexec_load and kexec_file_load are not triggered simultaneously, we can unify the debug flag of kexec and kexec_file as kexec_dbg_print. Next, we need to do four things: 1. rename kexec_file_dbg_print to kexec_dbg_print 2. Add KEXEC_DEBUG 3. Initialize kexec_dbg_print for kexec 4. 
Set the reset of kexec_dbg_print to kimage_free Signed-off-by: Qiang Ma --- include/linux/kexec.h | 9 +++++---- include/uapi/linux/kexec.h | 1 + kernel/kexec.c | 1 + kernel/kexec_core.c | 4 +++- kernel/kexec_file.c | 4 +--- 5 files changed, 11 insertions(+), 8 deletions(-) diff --git a/include/linux/kexec.h b/include/linux/kexec.h index ff7e231b0485..23f10aec0b34 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -455,10 +455,11 @@ bool kexec_load_permitted(int kexec_image_type); /* List of defined/legal kexec flags */ #ifndef CONFIG_KEXEC_JUMP -#define KEXEC_FLAGS (KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR | KEXEC_CRASH_HOTPLUG_SUPPORT) +#define KEXEC_FLAGS (KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR | KEXEC_CRASH_HOTPLUG_SUPPORT | \ + KEXEC_DEBUG) #else #define KEXEC_FLAGS (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_UPDATE_ELFCOREHDR | \ - KEXEC_CRASH_HOTPLUG_SUPPORT) + KEXEC_CRASH_HOTPLUG_SUPPORT | KEXEC_DEBUG) #endif /* List of defined/legal kexec file flags */ @@ -525,10 +526,10 @@ static inline int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, g static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) { } #endif -extern bool kexec_file_dbg_print; +extern bool kexec_dbg_print; #define kexec_dprintk(fmt, arg...) 
\ - do { if (kexec_file_dbg_print) pr_info(fmt, ##arg); } while (0) + do { if (kexec_dbg_print) pr_info(fmt, ##arg); } while (0) extern void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size); extern void kimage_unmap_segment(void *buffer); diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h index 55749cb0b81d..819c600af125 100644 --- a/include/uapi/linux/kexec.h +++ b/include/uapi/linux/kexec.h @@ -14,6 +14,7 @@ #define KEXEC_PRESERVE_CONTEXT 0x00000002 #define KEXEC_UPDATE_ELFCOREHDR 0x00000004 #define KEXEC_CRASH_HOTPLUG_SUPPORT 0x00000008 +#define KEXEC_DEBUG 0x00000010 #define KEXEC_ARCH_MASK 0xffff0000 /* diff --git a/kernel/kexec.c b/kernel/kexec.c index 9bb1f2b6b268..f6c58c767eb0 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -42,6 +42,7 @@ static int kimage_alloc_init(struct kimage **rimage, unsigned long entry, if (!image) return -ENOMEM; + kexec_dbg_print = !!(flags & KEXEC_DEBUG); image->start = entry; image->nr_segments = nr_segments; memcpy(image->segment, segments, nr_segments * sizeof(*segments)); diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index fa00b239c5d9..7bc1cd4105fc 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -53,7 +53,7 @@ atomic_t __kexec_lock = ATOMIC_INIT(0); /* Flag to indicate we are going to kexec a new kernel */ bool kexec_in_progress = false; -bool kexec_file_dbg_print; +bool kexec_dbg_print; /* * When kexec transitions to the new kernel there is a one-to-one @@ -576,6 +576,8 @@ void kimage_free(struct kimage *image) kimage_entry_t *ptr, entry; kimage_entry_t ind = 0; + kexec_dbg_print = false; + if (!image) return; diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index eb62a9794242..3f1d6c4e8ff2 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -138,8 +138,6 @@ void kimage_file_post_load_cleanup(struct kimage *image) */ kfree(image->image_loader_data); image->image_loader_data = NULL; - - kexec_file_dbg_print = false; } #ifdef 
CONFIG_KEXEC_SIG @@ -314,7 +312,7 @@ kimage_file_alloc_init(struct kimage **rimage, int kernel_fd, if (!image) return -ENOMEM; - kexec_file_dbg_print = !!(flags & KEXEC_FILE_DEBUG); + kexec_dbg_print = !!(flags & KEXEC_FILE_DEBUG); image->file_mode = 1; #ifdef CONFIG_CRASH_DUMP -- 2.20.1 From maqianga at uniontech.com Wed Nov 26 00:44:27 2025 From: maqianga at uniontech.com (Qiang Ma) Date: Wed, 26 Nov 2025 16:44:27 +0800 Subject: [PATCH v3 3/3] kexec: print out debugging message if required for kexec_load In-Reply-To: <20251126084427.3222212-1-maqianga@uniontech.com> References: <20251126084427.3222212-1-maqianga@uniontech.com> Message-ID: <20251126084427.3222212-4-maqianga@uniontech.com> The commit a85ee18c7900 ("kexec_file: print out debugging message if required") has added general code printing in kexec_file_load(), but not in kexec_load(). As a result, when '-d' is used with the kexec_load interface, nothing is printed in kernel space. Print out the type/start/head of the kimage and the flags to help with debugging.
Signed-off-by: Qiang Ma Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202510310332.6XrLe70K-lkp at intel.com/ --- kernel/kexec.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/kernel/kexec.c b/kernel/kexec.c index f6c58c767eb0..37e4ac8af9f3 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -166,6 +166,9 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, if (ret) goto out; + kexec_dprintk("kexec_load: type:%u, start:0x%lx head:0x%lx flags:0x%lx\n", + image->type, image->start, image->head, flags); + /* Install the new kernel and uninstall the old */ image = xchg(dest_image, image); -- 2.20.1 From bhe at redhat.com Wed Nov 26 17:47:18 2025 From: bhe at redhat.com (Baoquan He) Date: Thu, 27 Nov 2025 09:47:18 +0800 Subject: [PATCH v3 0/3] kexec: print out debugging message if required for kexec_load In-Reply-To: <20251126084427.3222212-1-maqianga@uniontech.com> References: <20251126084427.3222212-1-maqianga@uniontech.com> Message-ID: Hi, On 11/26/25 at 04:44pm, Qiang Ma wrote: > Overview: > ========= > The commit a85ee18c7900 ("kexec_file: print out debugging message > if required") has added general code printing in kexec_file_load(), > but not in kexec_load(). > > Since kexec_load and kexec_file_load are not triggered simultaneously, > we can unify the debug flag of kexec and kexec_file as kexec_dbg_print. As I said in your last post, this is not needed at all, you just add a not needed thing to kernel. So NACK this patchset, unless you have reason to justify it. Sorry about it. Thanks Baoquan > > Next, we need to do some things in this patchset: > > 1. rename kexec_file_dbg_print to kexec_dbg_print > 2. Add KEXEC_DEBUG > 3. Initialize kexec_dbg_print for kexec > 4. Fix uninitialized struct kimage *image pointer > 5. Set the reset of kexec_dbg_print to kimage_free > > Testing: > ========= > I did testing on x86_64, arm64 and loongarch. 
On x86_64, the printed messages > look like below: > > unset CONFIG_KEXEC_FILE: > [ 81.502374] kexec: kexec_load: type:0, start:0x23fff7700 head:0x10a4b9002 flags:0x3e0010 > > set CONFIG_KEXEC_FILE > [ 36.774228] kexec_file: kernel: 0000000066c386c8 kernel_size: 0xd78400 > [ 36.821814] kexec-bzImage64: Loaded purgatory at 0x23fffb000 > [ 36.821826] kexec-bzImage64: Loaded boot_param, command line and misc at 0x23fff9000 bufsz=0x12d0 memsz=0x2000 > [ 36.821829] kexec-bzImage64: Loaded 64bit kernel at 0x23d400000 bufsz=0xd73400 memsz=0x2ab7000 > [ 36.821918] kexec-bzImage64: Loaded initrd at 0x23bd0b000 bufsz=0x16f40a8 memsz=0x16f40a8 > [ 36.821920] kexec-bzImage64: Final command line is: root=/dev/mapper/test-root crashkernel=auto rd.lvm.lv=test/root > [ 36.821925] kexec-bzImage64: E820 memmap: > [ 36.821926] kexec-bzImage64: 0000000000000000-000000000009ffff (1) > [ 36.821928] kexec-bzImage64: 0000000000100000-0000000000811fff (1) > [ 36.821930] kexec-bzImage64: 0000000000812000-0000000000812fff (2) > [ 36.821931] kexec-bzImage64: 0000000000813000-00000000bee38fff (1) > [ 36.821933] kexec-bzImage64: 00000000bee39000-00000000beec2fff (2) > [ 36.821934] kexec-bzImage64: 00000000beec3000-00000000bf8ecfff (1) > [ 36.821935] kexec-bzImage64: 00000000bf8ed000-00000000bfb6cfff (2) > [ 36.821936] kexec-bzImage64: 00000000bfb6d000-00000000bfb7efff (3) > [ 36.821937] kexec-bzImage64: 00000000bfb7f000-00000000bfbfefff (4) > [ 36.821938] kexec-bzImage64: 00000000bfbff000-00000000bff7bfff (1) > [ 36.821939] kexec-bzImage64: 00000000bff7c000-00000000bfffffff (2) > [ 36.821940] kexec-bzImage64: 00000000feffc000-00000000feffffff (2) > [ 36.821941] kexec-bzImage64: 00000000ffc00000-00000000ffffffff (2) > [ 36.821942] kexec-bzImage64: 0000000100000000-000000023fffffff (1) > [ 36.872348] kexec_file: nr_segments = 4 > [ 36.872356] kexec_file: segment[0]: buf=0x000000005314ece7 bufsz=0x4000 mem=0x23fffb000 memsz=0x5000 > [ 36.872370] kexec_file: segment[1]: buf=0x000000006e59b143 
bufsz=0x12d0 mem=0x23fff9000 memsz=0x2000 > [ 36.872374] kexec_file: segment[2]: buf=0x00000000eb7b1fc3 bufsz=0xd73400 mem=0x23d400000 memsz=0x2ab7000 > [ 36.882172] kexec_file: segment[3]: buf=0x000000006af76441 bufsz=0x16f40a8 mem=0x23bd0b000 memsz=0x16f5000 > [ 36.889113] kexec_file: kexec_file_load: type:0, start:0x23fffb150 head:0x101a2e002 flags:0x8 > > Changes in v3: > ========== > - Rename kexec_core_dbg_print to kexec_dbg_print > - Remove unnecessary segments prints > - Remove patch "kexec_file: Fix the issue of mismatch between loop variable types" > > Qiang Ma (3): > kexec: Fix uninitialized struct kimage *image pointer > kexec: add kexec flag to control debug printing > kexec: print out debugging message if required for kexec_load > > include/linux/kexec.h | 9 +++++---- > include/uapi/linux/kexec.h | 1 + > kernel/kexec.c | 8 +++++++- > kernel/kexec_core.c | 4 +++- > kernel/kexec_file.c | 4 +--- > 5 files changed, 17 insertions(+), 9 deletions(-) > > -- > 2.20.1 > From maqianga at uniontech.com Wed Nov 26 18:04:40 2025 From: maqianga at uniontech.com (Qiang Ma) Date: Thu, 27 Nov 2025 10:04:40 +0800 Subject: [PATCH v3 0/3] kexec: print out debugging message if required for kexec_load In-Reply-To: References: <20251126084427.3222212-1-maqianga@uniontech.com> Message-ID: <63BA9935197ADF34+2a3faf95-36da-46f1-b9b5-4e438e75d1be@uniontech.com> On 2025/11/27 09:47, Baoquan He wrote: > Hi, > > On 11/26/25 at 04:44pm, Qiang Ma wrote: >> Overview: >> ========= >> The commit a85ee18c7900 ("kexec_file: print out debugging message >> if required") has added general code printing in kexec_file_load(), >> but not in kexec_load(). >> >> Since kexec_load and kexec_file_load are not triggered simultaneously, >> we can unify the debug flag of kexec and kexec_file as kexec_dbg_print. > As I said in your last post, this is not needed at all, you just add a > not needed thing to kernel. > > So NACK this patchset, unless you have reason to justify it. Sorry about > it.
The segment prints discussed in the last post, this patchset has been removed, leaving only type/start/head of kimage and flags. I think the current patchset is still necessary. For example, renaming kexec_file_dbg_print is still necessary, but not for kexec_file. > Thanks > Baoquan > >> >> Next, we need to do some things in this patchset: >> >> 1. rename kexec_file_dbg_print to kexec_dbg_print >> 2. Add KEXEC_DEBUG >> 3. Initialize kexec_dbg_print for kexec >> 4. Fix uninitialized struct kimage *image pointer >> 5. Set the reset of kexec_dbg_print to kimage_free >> >> Testing: >> ========= >> I did testing on x86_64, arm64 and loongarch. On x86_64, the printed messages >> look like below: >> >> unset CONFIG_KEXEC_FILE: >> [ 81.502374] kexec: kexec_load: type:0, start:0x23fff7700 head:0x10a4b9002 flags:0x3e0010 >> >> set CONFIG_KEXEC_FILE >> [ 36.774228] kexec_file: kernel: 0000000066c386c8 kernel_size: 0xd78400 >> [ 36.821814] kexec-bzImage64: Loaded purgatory at 0x23fffb000 >> [ 36.821826] kexec-bzImage64: Loaded boot_param, command line and misc at 0x23fff9000 bufsz=0x12d0 memsz=0x2000 >> [ 36.821829] kexec-bzImage64: Loaded 64bit kernel at 0x23d400000 bufsz=0xd73400 memsz=0x2ab7000 >> [ 36.821918] kexec-bzImage64: Loaded initrd at 0x23bd0b000 bufsz=0x16f40a8 memsz=0x16f40a8 >> [ 36.821920] kexec-bzImage64: Final command line is: root=/dev/mapper/test-root crashkernel=auto rd.lvm.lv=test/root >> [ 36.821925] kexec-bzImage64: E820 memmap: >> [ 36.821926] kexec-bzImage64: 0000000000000000-000000000009ffff (1) >> [ 36.821928] kexec-bzImage64: 0000000000100000-0000000000811fff (1) >> [ 36.821930] kexec-bzImage64: 0000000000812000-0000000000812fff (2) >> [ 36.821931] kexec-bzImage64: 0000000000813000-00000000bee38fff (1) >> [ 36.821933] kexec-bzImage64: 00000000bee39000-00000000beec2fff (2) >> [ 36.821934] kexec-bzImage64: 00000000beec3000-00000000bf8ecfff (1) >> [ 36.821935] kexec-bzImage64: 00000000bf8ed000-00000000bfb6cfff (2) >> [ 36.821936] kexec-bzImage64: 
00000000bfb6d000-00000000bfb7efff (3) >> [ 36.821937] kexec-bzImage64: 00000000bfb7f000-00000000bfbfefff (4) >> [ 36.821938] kexec-bzImage64: 00000000bfbff000-00000000bff7bfff (1) >> [ 36.821939] kexec-bzImage64: 00000000bff7c000-00000000bfffffff (2) >> [ 36.821940] kexec-bzImage64: 00000000feffc000-00000000feffffff (2) >> [ 36.821941] kexec-bzImage64: 00000000ffc00000-00000000ffffffff (2) >> [ 36.821942] kexec-bzImage64: 0000000100000000-000000023fffffff (1) >> [ 36.872348] kexec_file: nr_segments = 4 >> [ 36.872356] kexec_file: segment[0]: buf=0x000000005314ece7 bufsz=0x4000 mem=0x23fffb000 memsz=0x5000 >> [ 36.872370] kexec_file: segment[1]: buf=0x000000006e59b143 bufsz=0x12d0 mem=0x23fff9000 memsz=0x2000 >> [ 36.872374] kexec_file: segment[2]: buf=0x00000000eb7b1fc3 bufsz=0xd73400 mem=0x23d400000 memsz=0x2ab7000 >> [ 36.882172] kexec_file: segment[3]: buf=0x000000006af76441 bufsz=0x16f40a8 mem=0x23bd0b000 memsz=0x16f5000 >> [ 36.889113] kexec_file: kexec_file_load: type:0, start:0x23fffb150 head:0x101a2e002 flags:0x8 >> >> Changes in v3: >> ========== >> - Rename kexec_core_dbg_print to kexec_dbg_print >> - Remove unnecessary segments prints >> - Remove patch "kexec_file: Fix the issue of mismatch between loop variable types" >> >> Qiang Ma (3): >> kexec: Fix uninitialized struct kimage *image pointer >> kexec: add kexec flag to control debug printing >> kexec: print out debugging message if required for kexec_load >> >> include/linux/kexec.h | 9 +++++---- >> include/uapi/linux/kexec.h | 1 + >> kernel/kexec.c | 8 +++++++- >> kernel/kexec_core.c | 4 +++- >> kernel/kexec_file.c | 4 +--- >> 5 files changed, 17 insertions(+), 9 deletions(-) >> >> -- >> 2.20.1 >> > From bhe at redhat.com Wed Nov 26 18:36:13 2025 From: bhe at redhat.com (Baoquan He) Date: Thu, 27 Nov 2025 10:36:13 +0800 Subject: [PATCH v3 0/3] kexec: print out debugging message if required for kexec_load In-Reply-To: <63BA9935197ADF34+2a3faf95-36da-46f1-b9b5-4e438e75d1be@uniontech.com> References: 
<20251126084427.3222212-1-maqianga@uniontech.com> <63BA9935197ADF34+2a3faf95-36da-46f1-b9b5-4e438e75d1be@uniontech.com> Message-ID: On 11/27/25 at 10:04am, Qiang Ma wrote: > > ? 2025/11/27 09:47, Baoquan He ??: > > Hi, > > > > On 11/26/25 at 04:44pm, Qiang Ma wrote: > > > Overview: > > > ========= > > > The commit a85ee18c7900 ("kexec_file: print out debugging message > > > if required") has added general code printing in kexec_file_load(), > > > but not in kexec_load(). > > > Since kexec_load and kexec_file_load are not triggered simultaneously, > > > we can unify the debug flag of kexec and kexec_file as kexec_dbg_print. > > As I said in your last post, this is not needed at all, you just add a > > not needed thing to kernel. > > > > So NACK this patchset, unless you have reason to justify it. Sorry about > > it. > The segment prints discussed in the last post, > > this patchset has been removed, leaving only type/start/head of kimage and > flags. > > > I think the current patchset is still necessary. > For example, renaming kexec_file_dbg_print is still necessary, but not for > kexec_file. How come renaming kexec_file_dbg_print is a justification in this case. No, kexec_file_dbg_print is named because it's only for kexec_file debugging printing. Because we have had enough debugging printing for kexec_load interface. Do you have difficulty on debugging printing of kexec_load? From maqianga at uniontech.com Wed Nov 26 19:00:06 2025 From: maqianga at uniontech.com (Qiang Ma) Date: Thu, 27 Nov 2025 11:00:06 +0800 Subject: [PATCH v3 0/3] kexec: print out debugging message if required for kexec_load In-Reply-To: References: <20251126084427.3222212-1-maqianga@uniontech.com> <63BA9935197ADF34+2a3faf95-36da-46f1-b9b5-4e438e75d1be@uniontech.com> Message-ID: ? 2025/11/27 10:36, Baoquan He ??: > On 11/27/25 at 10:04am, Qiang Ma wrote: >> ? 
2025/11/27 09:47, Baoquan He ??: >>> Hi, >>> >>> On 11/26/25 at 04:44pm, Qiang Ma wrote: >>>> Overview: >>>> ========= >>>> The commit a85ee18c7900 ("kexec_file: print out debugging message >>>> if required") has added general code printing in kexec_file_load(), >>>> but not in kexec_load(). >>>> Since kexec_load and kexec_file_load are not triggered simultaneously, >>>> we can unify the debug flag of kexec and kexec_file as kexec_dbg_print. >>> As I said in your last post, this is not needed at all, you just add a >>> not needed thing to kernel. >>> >>> So NACK this patchset, unless you have reason to justify it. Sorry about >>> it. >> The segment prints discussed in the last post, >> >> this patchset has been removed, leaving only type/start/head of kimage and >> flags. >> >> >> I think the current patchset is still necessary. >> For example, renaming kexec_file_dbg_print is still necessary, but not for >> kexec_file. > How come renaming kexec_file_dbg_print is a justification in this case. > > No, kexec_file_dbg_print is named because it's only for kexec_file > debugging printing. Because we have had enough debugging printing for > kexec_load interface. Do you have difficulty on debugging printing of > kexec_load? It's sufficient now, but there might be a need in the future. Also, there's kexec_dprintk. Judging from its name, it seems like a universal kexec print. Looking at the code, it feels like not only the kexec_file interface path uses it for printing. So, would it be better to rename kexec_file_dbg_print to kexec_dbg_print. > > From bhe at redhat.com Wed Nov 26 19:55:42 2025 From: bhe at redhat.com (Baoquan He) Date: Thu, 27 Nov 2025 11:55:42 +0800 Subject: [PATCH v3 0/3] kexec: print out debugging message if required for kexec_load In-Reply-To: References: <20251126084427.3222212-1-maqianga@uniontech.com> <63BA9935197ADF34+2a3faf95-36da-46f1-b9b5-4e438e75d1be@uniontech.com> Message-ID: On 11/27/25 at 11:00am, Qiang Ma wrote: > > ? 
2025/11/27 10:36, Baoquan He wrote: > > On 11/27/25 at 10:04am, Qiang Ma wrote: > > > On 2025/11/27 09:47, Baoquan He wrote: > > > > Hi, > > > > > > > > On 11/26/25 at 04:44pm, Qiang Ma wrote: > > > > > Overview: > > > > > ========= > > > > > The commit a85ee18c7900 ("kexec_file: print out debugging message > > > > > if required") has added general code printing in kexec_file_load(), > > > > > but not in kexec_load(). > > > > > Since kexec_load and kexec_file_load are not triggered simultaneously, > > > > > we can unify the debug flag of kexec and kexec_file as kexec_dbg_print. > > > > As I said in your last post, this is not needed at all, you just add a > > > > not needed thing to kernel. > > > > > > > > So NACK this patchset, unless you have reason to justify it. Sorry about > > > > it. > > > The segment prints discussed in the last post, > > > > > > this patchset has been removed, leaving only type/start/head of kimage and > > > flags. > > > > > > > > > I think the current patchset is still necessary. > > > For example, renaming kexec_file_dbg_print is still necessary, but not for > > > kexec_file. > > How come renaming kexec_file_dbg_print is a justification in this case. > > > > No, kexec_file_dbg_print is named because it's only for kexec_file > > debugging printing. Because we have had enough debugging printing for > > kexec_load interface. Do you have difficulty on debugging printing of > > kexec_load? > It's sufficient now, but there might be a need in the future. > Also, there's kexec_dprintk. Judging from its name, it seems like a Hmm, as I said in an earlier discussion, kexec sometimes means generic handling including both the kexec_load and kexec_file_load interfaces. A possible future need and kexec_dprintk seeming a little ambiguous to you are not justifications. We do not suggest adding this meaningless code to the kernel. Please don't continue spending effort on this, that is not good.
I welcome cleanup/refactoring/fix for kexec/kdump to improve code, but adding non-reasonable code is not included. > universal kexec print. > Looking at the code, it feels like not only the kexec_file interface path > uses it for printing. > > So, would it be better to rename kexec_file_dbg_print to kexec_dbg_print. > > > > > > > From maqianga at uniontech.com Wed Nov 26 22:59:15 2025 From: maqianga at uniontech.com (Qiang Ma) Date: Thu, 27 Nov 2025 14:59:15 +0800 Subject: [PATCH v3 0/3] kexec: print out debugging message if required for kexec_load In-Reply-To: References: <20251126084427.3222212-1-maqianga@uniontech.com> <63BA9935197ADF34+2a3faf95-36da-46f1-b9b5-4e438e75d1be@uniontech.com> Message-ID: <4D709A5E16BE5DEC+04521049-61de-411f-85c1-cfc049ff04c5@uniontech.com> ? 2025/11/27 11:55, Baoquan He ??: > On 11/27/25 at 11:00am, Qiang Ma wrote: >> ? 2025/11/27 10:36, Baoquan He ??: >>> On 11/27/25 at 10:04am, Qiang Ma wrote: >>>> ? 2025/11/27 09:47, Baoquan He ??: >>>>> Hi, >>>>> >>>>> On 11/26/25 at 04:44pm, Qiang Ma wrote: >>>>>> Overview: >>>>>> ========= >>>>>> The commit a85ee18c7900 ("kexec_file: print out debugging message >>>>>> if required") has added general code printing in kexec_file_load(), >>>>>> but not in kexec_load(). >>>>>> Since kexec_load and kexec_file_load are not triggered simultaneously, >>>>>> we can unify the debug flag of kexec and kexec_file as kexec_dbg_print. >>>>> As I said in your last post, this is not needed at all, you just add a >>>>> not needed thing to kernel. >>>>> >>>>> So NACK this patchset, unless you have reason to justify it. Sorry about >>>>> it. >>>> The segment prints discussed in the last post, >>>> >>>> this patchset has been removed, leaving only type/start/head of kimage and >>>> flags. >>>> >>>> >>>> I think the current patchset is still necessary. >>>> For example, renaming kexec_file_dbg_print is still necessary, but not for >>>> kexec_file. 
>>> How come renaming kexec_file_dbg_print is a justification in this case. >>> >>> No, kexec_file_dbg_print is named because it's only for kexec_file >>> debugging printing. Because we have had enough debugging printing for >>> kexec_load interface. Do you have difficulty on debugging printing of >>> kexec_load? >> It's sufficient now, but there might be a need in the future. >> Also, there's kexec_dprintk. Judging from its name, it seems like a > Hmm, as I said in an earlier discussion, kexec sometimes means generic > handling including both the kexec_load and kexec_file_load interfaces. A possible > future need and kexec_dprintk seeming a little ambiguous to you are > not justifications. We do not suggest adding this meaningless code to > the kernel. Please don't continue spending effort on this, that is not good. > > I welcome cleanup/refactoring/fix for kexec/kdump to improve code, but > adding non-reasonable code is not included. I agree that meaningless code should not be added to the kernel, but this patchset is meaningful and there are reasonable grounds for it. Let me summarize again the purpose of submitting this patchset: First, unify the print flag of kexec_file and kexec as kexec_dbg_print, so that it serves both kexec and kexec_file, now and in the future. Secondly, in the current code, for instance, I saw in the arm64 code that under the kexec_load interface, kexec_image_info() already uses kexec_dprintk. When CONFIG_KEXEC_FILE is unset, specifying '-d' with the kexec_load interface prints nothing in kernel space. static void _kexec_image_info(const char *func, int line, const struct kimage *kimage) { kexec_dprintk("%s:%d:\n", func, line); kexec_dprintk(" kexec kimage info:\n"); kexec_dprintk(" type: %d\n", kimage->type); kexec_dprintk(" head:
%lx\n", kimage->head); Thirdly, for instance, in the arm64 code, kexec_image_info() prints the type/head of the kimage, while in the RISC-V code the type/head prints for kexec_load and kexec_file_load have been removed. We can remove the type/head prints in arm64, and then add them to the generic code. For points 1 and 2, Patch 2 implements: "kexec: add kexec flag to control debug printing" For point 3, Patch 3 implements: "kexec: print out debugging message if required for kexec_load" Additionally, if it is necessary to remove the type/head prints from kexec_image_info() in the arm64 code, another patch can be provided. >> universal kexec print. >> Looking at the code, it feels like not only the kexec_file interface path >> uses it for printing. >> >> So, would it be better to rename kexec_file_dbg_print to kexec_dbg_print. >> >> >>> > From sourabhjain at linux.ibm.com Thu Nov 27 04:01:13 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Thu, 27 Nov 2025 17:31:13 +0530 Subject: [PATCH v3 0/3] kexec: print out debugging message if required for kexec_load In-Reply-To: <20251126084427.3222212-1-maqianga@uniontech.com> References: <20251126084427.3222212-1-maqianga@uniontech.com> Message-ID: <7aadda55-d2a4-40f9-95ef-d284ec358646@linux.ibm.com> Hello All, Do we have a plan to support the KEXEC_DEBUG flag? Upstream kexec-tools has already added support for the KEXEC_DEBUG flag, and that breaks kexec_load with the -d option. - kexec: add kexec flag to support debug printing https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/commit/?id=71d6fd99af7e Thanks, Sourabh Jain On 26/11/25 14:14, Qiang Ma wrote: > Overview: > ========= > The commit a85ee18c7900 ("kexec_file: print out debugging message > if required") has added general code printing in kexec_file_load(), > but not in kexec_load().
> > Since kexec_load and kexec_file_load are not triggered simultaneously, > we can unify the debug flag of kexec and kexec_file as kexec_dbg_print. > > Next, we need to do some things in this patchset: > > 1. rename kexec_file_dbg_print to kexec_dbg_print > 2. Add KEXEC_DEBUG > 3. Initialize kexec_dbg_print for kexec > 4. Fix uninitialized struct kimage *image pointer > 5. Set the reset of kexec_dbg_print to kimage_free > > Testing: > ========= > I did testing on x86_64, arm64 and loongarch. On x86_64, the printed messages > look like below: > > unset CONFIG_KEXEC_FILE: > [ 81.502374] kexec: kexec_load: type:0, start:0x23fff7700 head:0x10a4b9002 flags:0x3e0010 > > set CONFIG_KEXEC_FILE > [ 36.774228] kexec_file: kernel: 0000000066c386c8 kernel_size: 0xd78400 > [ 36.821814] kexec-bzImage64: Loaded purgatory at 0x23fffb000 > [ 36.821826] kexec-bzImage64: Loaded boot_param, command line and misc at 0x23fff9000 bufsz=0x12d0 memsz=0x2000 > [ 36.821829] kexec-bzImage64: Loaded 64bit kernel at 0x23d400000 bufsz=0xd73400 memsz=0x2ab7000 > [ 36.821918] kexec-bzImage64: Loaded initrd at 0x23bd0b000 bufsz=0x16f40a8 memsz=0x16f40a8 > [ 36.821920] kexec-bzImage64: Final command line is: root=/dev/mapper/test-root crashkernel=auto rd.lvm.lv=test/root > [ 36.821925] kexec-bzImage64: E820 memmap: > [ 36.821926] kexec-bzImage64: 0000000000000000-000000000009ffff (1) > [ 36.821928] kexec-bzImage64: 0000000000100000-0000000000811fff (1) > [ 36.821930] kexec-bzImage64: 0000000000812000-0000000000812fff (2) > [ 36.821931] kexec-bzImage64: 0000000000813000-00000000bee38fff (1) > [ 36.821933] kexec-bzImage64: 00000000bee39000-00000000beec2fff (2) > [ 36.821934] kexec-bzImage64: 00000000beec3000-00000000bf8ecfff (1) > [ 36.821935] kexec-bzImage64: 00000000bf8ed000-00000000bfb6cfff (2) > [ 36.821936] kexec-bzImage64: 00000000bfb6d000-00000000bfb7efff (3) > [ 36.821937] kexec-bzImage64: 00000000bfb7f000-00000000bfbfefff (4) > [ 36.821938] kexec-bzImage64: 00000000bfbff000-00000000bff7bfff 
(1) > [ 36.821939] kexec-bzImage64: 00000000bff7c000-00000000bfffffff (2) > [ 36.821940] kexec-bzImage64: 00000000feffc000-00000000feffffff (2) > [ 36.821941] kexec-bzImage64: 00000000ffc00000-00000000ffffffff (2) > [ 36.821942] kexec-bzImage64: 0000000100000000-000000023fffffff (1) > [ 36.872348] kexec_file: nr_segments = 4 > [ 36.872356] kexec_file: segment[0]: buf=0x000000005314ece7 bufsz=0x4000 mem=0x23fffb000 memsz=0x5000 > [ 36.872370] kexec_file: segment[1]: buf=0x000000006e59b143 bufsz=0x12d0 mem=0x23fff9000 memsz=0x2000 > [ 36.872374] kexec_file: segment[2]: buf=0x00000000eb7b1fc3 bufsz=0xd73400 mem=0x23d400000 memsz=0x2ab7000 > [ 36.882172] kexec_file: segment[3]: buf=0x000000006af76441 bufsz=0x16f40a8 mem=0x23bd0b000 memsz=0x16f5000 > [ 36.889113] kexec_file: kexec_file_load: type:0, start:0x23fffb150 head:0x101a2e002 flags:0x8 > > Changes in v3: > ========== > - Rename kexec_core_dbg_print to kexec_dbg_print > - Remove unnecessary segments prints > - Remove patch "kexec_file: Fix the issue of mismatch between loop variable types" > > Qiang Ma (3): > kexec: Fix uninitialized struct kimage *image pointer > kexec: add kexec flag to control debug printing > kexec: print out debugging message if required for kexec_load > > include/linux/kexec.h | 9 +++++---- > include/uapi/linux/kexec.h | 1 + > kernel/kexec.c | 8 +++++++- > kernel/kexec_core.c | 4 +++- > kernel/kexec_file.c | 4 +--- > 5 files changed, 17 insertions(+), 9 deletions(-) > From ranxiaokai627 at 163.com Thu Nov 27 04:27:00 2025 From: ranxiaokai627 at 163.com (ranxiaokai627 at 163.com) Date: Thu, 27 Nov 2025 12:27:00 +0000 Subject: [PATCH v4] KHO: Fix boot failure due to kmemleak access to non-PRESENT pages Message-ID: <20251127122700.103927-1-ranxiaokai627@163.com> From: Ran Xiaokai When booting with debug_pagealloc=on while having: CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT=y CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=n the system fails to boot due to page faults during kmemleak scanning. 
This occurs because:
With debug_pagealloc enabled, __free_pages() invokes
debug_pagealloc_unmap_pages(), clearing the _PAGE_PRESENT bit for
freed pages in the kernel page table.
KHO scratch areas are allocated from memblock and noted by kmemleak. But
these areas don't remain reserved; they are released later to the page
allocator using init_cma_reserved_pageblock(). This causes subsequent
kmemleak scans to access non-PRESENT pages, leading to fatal page faults.

To fix this, mark scratch areas with kmemleak_ignore_phys() after they are
allocated from memblock, excluding them from kmemleak scanning before they
are released to the buddy allocator.

Fixes: 3dc92c311498 ("kexec: add Kexec HandOver (KHO) generation helpers")
Signed-off-by: Ran Xiaokai
Reviewed-by: Mike Rapoport (Microsoft)
---
 kernel/liveupdate/kexec_handover.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index 224bdf5becb6..55d66e65274f 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -11,6 +11,7 @@
 #include
 #include
+#include <linux/kmemleak.h>
 #include
 #include
 #include
@@ -1369,6 +1370,15 @@ static __init int kho_init(void)
 		unsigned long count = kho_scratch[i].size >> PAGE_SHIFT;
 		unsigned long pfn;

+		/*
+		 * When debug_pagealloc is enabled, __free_pages() clears the
+		 * corresponding PRESENT bit in the kernel page table.
+		 * Subsequent kmemleak scans of these pages cause
+		 * non-PRESENT page faults.
+		 * Mark scratch areas with kmemleak_ignore_phys() to exclude
+		 * them from kmemleak scanning.
+		 */
+		kmemleak_ignore_phys(kho_scratch[i].addr);
 		for (pfn = base_pfn; pfn < base_pfn + count;
 		     pfn += pageblock_nr_pages)
 			init_cma_reserved_pageblock(pfn_to_page(pfn));
-- 
2.25.1

From bhe at redhat.com Thu Nov 27 07:30:36 2025
From: bhe at redhat.com (Baoquan He)
Date: Thu, 27 Nov 2025 23:30:36 +0800
Subject: [PATCH v3 0/3] kexec: print out debugging message if required for kexec_load
In-Reply-To: <7aadda55-d2a4-40f9-95ef-d284ec358646@linux.ibm.com>
References: <20251126084427.3222212-1-maqianga@uniontech.com> <7aadda55-d2a4-40f9-95ef-d284ec358646@linux.ibm.com>
Message-ID:

On 11/27/25 at 05:31pm, Sourabh Jain wrote:
> Hello All,
>
> Do we have a plan to support the KEXEC_DEBUG flag?
>
> Because upstream kexec-tools has already added support for the
> KEXEC_DEBUG flag, and that breaks kexec_load with the -d option.
>
> - kexec: add kexec flag to support debug printing
> https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/commit/?id=71d6fd99af7e

I think we should revert that kexec-tools commit. This whole patchset is
nonsense. Because of my carelessness, that userspace patch was merged.

Hi Sourabh,

Could you go through this patchset and help check whether the patches are
really needed? I can't find anything to convince myself.

Thanks.

From pratyush at kernel.org Thu Nov 27 08:20:14 2025
From: pratyush at kernel.org (Pratyush Yadav)
Date: Thu, 27 Nov 2025 17:20:14 +0100
Subject: [PATCH v4] KHO: Fix boot failure due to kmemleak access to non-PRESENT pages
In-Reply-To: <20251127122700.103927-1-ranxiaokai627@163.com> (ranxiaokai's message of "Thu, 27 Nov 2025 12:27:00 +0000")
References: <20251127122700.103927-1-ranxiaokai627@163.com>
Message-ID:

On Thu, Nov 27 2025, ranxiaokai627 at 163.com wrote:

> From: Ran Xiaokai
>
> When booting with debug_pagealloc=on while having:
> CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT=y
> CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=n
> the system fails to boot due to page faults during kmemleak scanning.
>
> This occurs because:
> With debug_pagealloc enabled, __free_pages() invokes
> debug_pagealloc_unmap_pages(), clearing the _PAGE_PRESENT bit for
> freed pages in the kernel page table.
> KHO scratch areas are allocated from memblock and noted by kmemleak. But
> these areas don't remain reserved; they are released later to the page
> allocator using init_cma_reserved_pageblock(). This causes subsequent
> kmemleak scans to access non-PRESENT pages, leading to fatal page faults.
>
> Mark scratch areas with kmemleak_ignore_phys() after they are allocated
> from memblock to exclude them from kmemleak scanning before they are
> released to the buddy allocator to fix this.
>
> Fixes: 3dc92c311498 ("kexec: add Kexec HandOver (KHO) generation helpers")
> Signed-off-by: Ran Xiaokai
> Reviewed-by: Mike Rapoport (Microsoft)

Reviewed-by: Pratyush Yadav

Thanks!

[...]

-- 
Regards,
Pratyush Yadav

From bhe at redhat.com Thu Nov 27 19:33:08 2025
From: bhe at redhat.com (Baoquan He)
Date: Fri, 28 Nov 2025 11:33:08 +0800
Subject: [PATCH v4 00/12] mm/kasan: make kasan=on|off work for all three modes
Message-ID: <20251128033320.1349620-1-bhe@redhat.com>

Currently, only the hw_tags mode of KASAN can be enabled or disabled with
the kernel parameter kasan=on|off in a built kernel. For the KASAN generic
and sw_tags modes, there's no way to disable them once the kernel is
built. This is not always convenient, e.g. on systems where kdump is
configured. When the first kernel has KASAN enabled and a crash triggers
the switch to the kdump kernel, the generic or sw_tags mode will cost much
extra memory, while in fact it's meaningless to have KASAN in the kdump
kernel.

There are two sources of the large memory cost of a KASAN-enabled kernel.
One is the direct memory mapping shadow of KASAN, which is 1/8 of system
RAM in generic mode and 1/16 of system RAM in sw_tags mode; the other is
the shadow memory for vmalloc, which causes big memory usage in the kdump
kernel because of lazy vmap freeing.
By introducing "kasan=off|on": if we specify 'kasan=off', the former is
avoided by skipping kasan_init(), and the latter is avoided by not
building the shadow for vmalloc. So this patchset moves kasan=on|off out
of the hw_tags scope and into common code to make it visible in generic
and sw_tags modes too. Then we can add kasan=off to the kdump kernel to
reduce the unneeded memory cost of KASAN.

Testing:
========
- Testing on x86_64 and arm64 for generic mode passed with both kasan=on
  and kasan=off.
- Testing on arm64 with sw_tags mode passed when kasan=off is set.
  But when I tried to test sw_tags on arm64, the system bootup failed.
  It's not introduced by my patchset; the original code has the bug.
  I have reported it upstream:
  - System is broken in KASAN sw_tags mode during bootup
  - https://lore.kernel.org/all/aSXKqJTkZPNskFop at MiWiFi-R3L-srv/T/#u
- Haven't found hardware to test hw_tags. If anybody has such a system,
  please help take a test.

Changelog:
====
v3->v4:
- Rebase code onto the latest linux-next/master so the whole patchset
  sits on top of
  [PATCH 0/2] kasan: cleanups for kasan_enabled() checks
  [PATCH v6 0/2] kasan: unify kasan_enabled() and remove arch-specific implementations

v2->v3:
- Fix a build error on the UML arch when CONFIG_KASAN is not set. The
  fix is appended to patch 11. This was reported by LKP, thanks to them.

v1->v2:
- Add __ro_after_init for kasan_arg_disabled, and remove redundant blank
  lines in mm/kasan/common.c. Thanks to Marco.
- Fix a code bug when CONFIG_KASAN is unset; this was found by SeongJae
  and Lorenzo, and also reported by LKP, thanks to them.
- Add a missing kasan_enabled() check in kasan_report().
This will cause the below KASAN report even though kasan=off is set:

==================================================================
BUG: KASAN: stack-out-of-bounds in tick_program_event+0x130/0x150
Read of size 4 at addr ffff00005f747778 by task swapper/0/1

CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.16.0+ #8 PREEMPT(voluntary)
Hardware name: GIGABYTE R272-P30-JG/MP32-AR0-JG, BIOS F31n (SCP: 2.10.20220810) 09/30/2022
Call trace:
 show_stack+0x30/0x90 (C)
 dump_stack_lvl+0x7c/0xa0
 print_address_description.constprop.0+0x90/0x310
 print_report+0x104/0x1f0
 kasan_report+0xc8/0x110
 __asan_report_load4_noabort+0x20/0x30
 tick_program_event+0x130/0x150
 ......snip...
==================================================================

- Add a jump_label_init() call before kasan_init() in setup_arch() on
  these architectures: xtensa, arm. They currently rely on the
  jump_label_init() in main(), which is a little late, so the early
  static key kasan_flag_enabled in kasan_init() wouldn't work.
- On the UML architecture, change to enabling kasan_flag_enabled in
  arch_mm_preinit(), because kasan_init() runs before main() and there
  is no chance to operate on the static key in kasan_init().
Baoquan He (12): mm/kasan: add conditional checks in functions to return directly if kasan is disabled mm/kasan: move kasan= code to common place mm/kasan/sw_tags: don't initialize kasan if it's disabled arch/arm: don't initialize kasan if it's disabled arch/arm64: don't initialize kasan if it's disabled arch/loongarch: don't initialize kasan if it's disabled arch/powerpc: don't initialize kasan if it's disabled arch/riscv: don't initialize kasan if it's disabled arch/x86: don't initialize kasan if it's disabled arch/xtensa: don't initialize kasan if it's disabled arch/um: don't initialize kasan if it's disabled mm/kasan: make kasan=on|off take effect for all three modes arch/arm/kernel/setup.c | 6 ++++++ arch/arm/mm/kasan_init.c | 2 ++ arch/arm64/mm/kasan_init.c | 6 ++++++ arch/loongarch/mm/kasan_init.c | 2 ++ arch/powerpc/mm/kasan/init_32.c | 5 ++++- arch/powerpc/mm/kasan/init_book3e_64.c | 3 +++ arch/powerpc/mm/kasan/init_book3s_64.c | 3 +++ arch/riscv/mm/kasan_init.c | 3 +++ arch/um/kernel/mem.c | 5 ++++- arch/x86/mm/kasan_init_64.c | 3 +++ arch/xtensa/kernel/setup.c | 1 + arch/xtensa/mm/kasan_init.c | 3 +++ include/linux/kasan-enabled.h | 6 ++++-- mm/kasan/common.c | 20 ++++++++++++++++-- mm/kasan/generic.c | 17 ++++++++++++++-- mm/kasan/hw_tags.c | 28 ++------------------------ mm/kasan/init.c | 6 ++++++ mm/kasan/quarantine.c | 3 +++ mm/kasan/report.c | 4 +++- mm/kasan/shadow.c | 11 +++++++++- mm/kasan/sw_tags.c | 6 ++++++ 21 files changed, 107 insertions(+), 36 deletions(-) -- 2.41.0 From bhe at redhat.com Thu Nov 27 19:33:09 2025 From: bhe at redhat.com (Baoquan He) Date: Fri, 28 Nov 2025 11:33:09 +0800 Subject: [PATCH v4 01/12] mm/kasan: add conditional checks in functions to return directly if kasan is disabled In-Reply-To: <20251128033320.1349620-1-bhe@redhat.com> References: <20251128033320.1349620-1-bhe@redhat.com> Message-ID: <20251128033320.1349620-2-bhe@redhat.com> The current codes only check if kasan is disabled for hw_tags mode. 
Here add the conditional checks for functional functions of generic mode and sw_tags mode. This is prepared for later adding kernel parameter kasan=on|off for all three kasan modes. Signed-off-by: Baoquan He --- mm/kasan/generic.c | 17 +++++++++++++++-- mm/kasan/init.c | 6 ++++++ mm/kasan/quarantine.c | 3 +++ mm/kasan/report.c | 4 +++- mm/kasan/shadow.c | 11 ++++++++++- mm/kasan/sw_tags.c | 3 +++ 6 files changed, 40 insertions(+), 4 deletions(-) diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c index 2b8e73f5f6a7..aff822aa2bd6 100644 --- a/mm/kasan/generic.c +++ b/mm/kasan/generic.c @@ -214,12 +214,13 @@ bool kasan_byte_accessible(const void *addr) void kasan_cache_shrink(struct kmem_cache *cache) { - kasan_quarantine_remove_cache(cache); + if (kasan_enabled()) + kasan_quarantine_remove_cache(cache); } void kasan_cache_shutdown(struct kmem_cache *cache) { - if (!__kmem_cache_empty(cache)) + if (kasan_enabled() && !__kmem_cache_empty(cache)) kasan_quarantine_remove_cache(cache); } @@ -239,6 +240,9 @@ void __asan_register_globals(void *ptr, ssize_t size) int i; struct kasan_global *globals = ptr; + if (!kasan_enabled()) + return; + for (i = 0; i < size; i++) register_global(&globals[i]); } @@ -369,6 +373,9 @@ void kasan_cache_create(struct kmem_cache *cache, unsigned int *size, unsigned int rem_free_meta_size; unsigned int orig_alloc_meta_offset; + if (!kasan_enabled()) + return; + if (!kasan_requires_meta()) return; @@ -518,6 +525,9 @@ size_t kasan_metadata_size(struct kmem_cache *cache, bool in_object) { struct kasan_cache *info = &cache->kasan_info; + if (!kasan_enabled()) + return 0; + if (!kasan_requires_meta()) return 0; @@ -543,6 +553,9 @@ void kasan_record_aux_stack(void *addr) struct kasan_alloc_meta *alloc_meta; void *object; + if (!kasan_enabled()) + return; + if (is_kfence_address(addr) || !slab) return; diff --git a/mm/kasan/init.c b/mm/kasan/init.c index f084e7a5df1e..c78d77ed47bc 100644 --- a/mm/kasan/init.c +++ b/mm/kasan/init.c @@ -447,6 +447,9 @@ 
void kasan_remove_zero_shadow(void *start, unsigned long size) unsigned long addr, end, next; pgd_t *pgd; + if (!kasan_enabled()) + return; + addr = (unsigned long)kasan_mem_to_shadow(start); end = addr + (size >> KASAN_SHADOW_SCALE_SHIFT); @@ -482,6 +485,9 @@ int kasan_add_zero_shadow(void *start, unsigned long size) int ret; void *shadow_start, *shadow_end; + if (!kasan_enabled()) + return 0; + shadow_start = kasan_mem_to_shadow(start); shadow_end = shadow_start + (size >> KASAN_SHADOW_SCALE_SHIFT); diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c index 6958aa713c67..a6dc2c3d8a15 100644 --- a/mm/kasan/quarantine.c +++ b/mm/kasan/quarantine.c @@ -405,6 +405,9 @@ static int __init kasan_cpu_quarantine_init(void) { int ret = 0; + if (!kasan_enabled()) + return 0; + ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mm/kasan:online", kasan_cpu_online, kasan_cpu_offline); if (ret < 0) diff --git a/mm/kasan/report.c b/mm/kasan/report.c index 62c01b4527eb..884357fa74ed 100644 --- a/mm/kasan/report.c +++ b/mm/kasan/report.c @@ -576,7 +576,9 @@ bool kasan_report(const void *addr, size_t size, bool is_write, unsigned long irq_flags; struct kasan_report_info info; - if (unlikely(report_suppressed_sw()) || unlikely(!report_enabled())) { + if (unlikely(report_suppressed_sw()) || + unlikely(!report_enabled()) || + !kasan_enabled()) { ret = false; goto out; } diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c index 29a751a8a08d..f73a691421de 100644 --- a/mm/kasan/shadow.c +++ b/mm/kasan/shadow.c @@ -164,6 +164,8 @@ void kasan_unpoison(const void *addr, size_t size, bool init) { u8 tag = get_tag(addr); + if (!kasan_enabled()) + return; /* * Perform shadow offset calculation based on untagged address, as * some of the callers (e.g. 
kasan_unpoison_new_object) pass tagged @@ -277,7 +279,8 @@ static int __meminit kasan_mem_notifier(struct notifier_block *nb, static int __init kasan_memhotplug_init(void) { - hotplug_memory_notifier(kasan_mem_notifier, DEFAULT_CALLBACK_PRI); + if (kasan_enabled()) + hotplug_memory_notifier(kasan_mem_notifier, DEFAULT_CALLBACK_PRI); return 0; } @@ -658,6 +661,9 @@ int kasan_alloc_module_shadow(void *addr, size_t size, gfp_t gfp_mask) size_t shadow_size; unsigned long shadow_start; + if (!kasan_enabled()) + return 0; + shadow_start = (unsigned long)kasan_mem_to_shadow(addr); scaled_size = (size + KASAN_GRANULE_SIZE - 1) >> KASAN_SHADOW_SCALE_SHIFT; @@ -694,6 +700,9 @@ int kasan_alloc_module_shadow(void *addr, size_t size, gfp_t gfp_mask) void kasan_free_module_shadow(const struct vm_struct *vm) { + if (!kasan_enabled()) + return; + if (IS_ENABLED(CONFIG_UML)) return; diff --git a/mm/kasan/sw_tags.c b/mm/kasan/sw_tags.c index c75741a74602..6c1caec4261a 100644 --- a/mm/kasan/sw_tags.c +++ b/mm/kasan/sw_tags.c @@ -79,6 +79,9 @@ bool kasan_check_range(const void *addr, size_t size, bool write, u8 *shadow_first, *shadow_last, *shadow; void *untagged_addr; + if (!kasan_enabled()) + return true; + if (unlikely(size == 0)) return true; -- 2.41.0 From bhe at redhat.com Thu Nov 27 19:33:10 2025 From: bhe at redhat.com (Baoquan He) Date: Fri, 28 Nov 2025 11:33:10 +0800 Subject: [PATCH v4 02/12] mm/kasan: move kasan= code to common place In-Reply-To: <20251128033320.1349620-1-bhe@redhat.com> References: <20251128033320.1349620-1-bhe@redhat.com> Message-ID: <20251128033320.1349620-3-bhe@redhat.com> This allows generic and sw_tags to be set in kernel cmdline too. When at it, rename 'kasan_arg' to 'kasan_arg_disabled' as a bool variable. And expose 'kasan_flag_enabled' to kasan common place too. This is prepared for later adding kernel parameter kasan=on|off for all three kasan modes. 
Signed-off-by: Baoquan He --- include/linux/kasan-enabled.h | 4 +++- mm/kasan/common.c | 20 ++++++++++++++++++-- mm/kasan/hw_tags.c | 28 ++-------------------------- 3 files changed, 23 insertions(+), 29 deletions(-) diff --git a/include/linux/kasan-enabled.h b/include/linux/kasan-enabled.h index 9eca967d8526..b05ec6329fbe 100644 --- a/include/linux/kasan-enabled.h +++ b/include/linux/kasan-enabled.h @@ -4,13 +4,15 @@ #include -#if defined(CONFIG_ARCH_DEFER_KASAN) || defined(CONFIG_KASAN_HW_TAGS) +extern bool kasan_arg_disabled; + /* * Global runtime flag for KASAN modes that need runtime control. * Used by ARCH_DEFER_KASAN architectures and HW_TAGS mode. */ DECLARE_STATIC_KEY_FALSE(kasan_flag_enabled); +#if defined(CONFIG_ARCH_DEFER_KASAN) || defined(CONFIG_KASAN_HW_TAGS) /* * Runtime control for shadow memory initialization or HW_TAGS mode. * Uses static key for architectures that need deferred KASAN or HW_TAGS. diff --git a/mm/kasan/common.c b/mm/kasan/common.c index 1d27f1bd260b..ac14956986ee 100644 --- a/mm/kasan/common.c +++ b/mm/kasan/common.c @@ -32,14 +32,30 @@ #include "kasan.h" #include "../slab.h" -#if defined(CONFIG_ARCH_DEFER_KASAN) || defined(CONFIG_KASAN_HW_TAGS) /* * Definition of the unified static key declared in kasan-enabled.h. * This provides consistent runtime enable/disable across KASAN modes. 
*/ DEFINE_STATIC_KEY_FALSE(kasan_flag_enabled); EXPORT_SYMBOL_GPL(kasan_flag_enabled); -#endif + +bool kasan_arg_disabled __ro_after_init; +/* kasan=off/on */ +static int __init early_kasan_flag(char *arg) +{ + if (!arg) + return -EINVAL; + + if (!strcmp(arg, "off")) + kasan_arg_disabled = true; + else if (!strcmp(arg, "on")) + kasan_arg_disabled = false; + else + return -EINVAL; + + return 0; +} +early_param("kasan", early_kasan_flag); struct slab *kasan_addr_to_slab(const void *addr) { diff --git a/mm/kasan/hw_tags.c b/mm/kasan/hw_tags.c index 1c373cc4b3fa..709c91abc1b1 100644 --- a/mm/kasan/hw_tags.c +++ b/mm/kasan/hw_tags.c @@ -22,12 +22,6 @@ #include "kasan.h" -enum kasan_arg { - KASAN_ARG_DEFAULT, - KASAN_ARG_OFF, - KASAN_ARG_ON, -}; - enum kasan_arg_mode { KASAN_ARG_MODE_DEFAULT, KASAN_ARG_MODE_SYNC, @@ -41,7 +35,6 @@ enum kasan_arg_vmalloc { KASAN_ARG_VMALLOC_ON, }; -static enum kasan_arg kasan_arg __ro_after_init; static enum kasan_arg_mode kasan_arg_mode __ro_after_init; static enum kasan_arg_vmalloc kasan_arg_vmalloc __initdata; @@ -81,23 +74,6 @@ unsigned int kasan_page_alloc_sample_order = PAGE_ALLOC_SAMPLE_ORDER_DEFAULT; DEFINE_PER_CPU(long, kasan_page_alloc_skip); -/* kasan=off/on */ -static int __init early_kasan_flag(char *arg) -{ - if (!arg) - return -EINVAL; - - if (!strcmp(arg, "off")) - kasan_arg = KASAN_ARG_OFF; - else if (!strcmp(arg, "on")) - kasan_arg = KASAN_ARG_ON; - else - return -EINVAL; - - return 0; -} -early_param("kasan", early_kasan_flag); - /* kasan.mode=sync/async/asymm */ static int __init early_kasan_mode(char *arg) { @@ -222,7 +198,7 @@ void kasan_init_hw_tags_cpu(void) * When this function is called, kasan_flag_enabled is not yet * set by kasan_init_hw_tags(). Thus, check kasan_arg instead. */ - if (kasan_arg == KASAN_ARG_OFF) + if (kasan_arg_disabled) return; /* @@ -240,7 +216,7 @@ void __init kasan_init_hw_tags(void) return; /* If KASAN is disabled via command line, don't initialize it. 
	 */
-	if (kasan_arg == KASAN_ARG_OFF)
+	if (kasan_arg_disabled)
 		return;

 	switch (kasan_arg_mode) {
-- 
2.41.0

From bhe at redhat.com Thu Nov 27 19:33:11 2025
From: bhe at redhat.com (Baoquan He)
Date: Fri, 28 Nov 2025 11:33:11 +0800
Subject: [PATCH v4 03/12] mm/kasan/sw_tags: don't initialize kasan if it's disabled
In-Reply-To: <20251128033320.1349620-1-bhe@redhat.com>
References: <20251128033320.1349620-1-bhe@redhat.com>
Message-ID: <20251128033320.1349620-4-bhe@redhat.com>

Signed-off-by: Baoquan He
---
 mm/kasan/sw_tags.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/kasan/sw_tags.c b/mm/kasan/sw_tags.c
index 6c1caec4261a..58edb68efc09 100644
--- a/mm/kasan/sw_tags.c
+++ b/mm/kasan/sw_tags.c
@@ -40,6 +40,9 @@ void __init kasan_init_sw_tags(void)
 {
 	int cpu;

+	if (kasan_arg_disabled)
+		return;
+
 	for_each_possible_cpu(cpu)
 		per_cpu(prng_state, cpu) = (u32)get_cycles();
-- 
2.41.0

From bhe at redhat.com Thu Nov 27 19:33:12 2025
From: bhe at redhat.com (Baoquan He)
Date: Fri, 28 Nov 2025 11:33:12 +0800
Subject: [PATCH v4 04/12] arch/arm: don't initialize kasan if it's disabled
In-Reply-To: <20251128033320.1349620-1-bhe@redhat.com>
References: <20251128033320.1349620-1-bhe@redhat.com>
Message-ID: <20251128033320.1349620-6-bhe@redhat.com>

Call jump_label_init() early in setup_arch() so that the later
kasan_init() can enable the static key kasan_flag_enabled. Put
jump_label_init() before parse_early_param(), as other architectures do.
Signed-off-by: Baoquan He Cc: linux-arm-kernel at lists.infradead.org --- arch/arm/kernel/setup.c | 6 ++++++ arch/arm/mm/kasan_init.c | 2 ++ 2 files changed, 8 insertions(+) diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c index 0bfd66c7ada0..453a47a4c715 100644 --- a/arch/arm/kernel/setup.c +++ b/arch/arm/kernel/setup.c @@ -1135,6 +1135,12 @@ void __init setup_arch(char **cmdline_p) early_fixmap_init(); early_ioremap_init(); + /* + * Initialise the static keys early as they may be enabled by the + * kasan_init() or early parameters. + */ + jump_label_init(); + parse_early_param(); #ifdef CONFIG_MMU diff --git a/arch/arm/mm/kasan_init.c b/arch/arm/mm/kasan_init.c index c6625e808bf8..488916c7d29e 100644 --- a/arch/arm/mm/kasan_init.c +++ b/arch/arm/mm/kasan_init.c @@ -212,6 +212,8 @@ void __init kasan_init(void) phys_addr_t pa_start, pa_end; u64 i; + if (kasan_arg_disabled) + return; /* * We are going to perform proper setup of shadow memory. * -- 2.41.0 From bhe at redhat.com Thu Nov 27 19:33:13 2025 From: bhe at redhat.com (Baoquan He) Date: Fri, 28 Nov 2025 11:33:13 +0800 Subject: [PATCH v4 05/12] arch/arm64: don't initialize kasan if it's disabled In-Reply-To: <20251128033320.1349620-1-bhe@redhat.com> References: <20251128033320.1349620-1-bhe@redhat.com> Message-ID: <20251128033320.1349620-6-bhe@redhat.com> And also need skip kasan_populate_early_vm_area_shadow() if kasan is disabled. 
Signed-off-by: Baoquan He Cc: linux-arm-kernel at lists.infradead.org --- arch/arm64/mm/kasan_init.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c index abeb81bf6ebd..eb49fdad4ef1 100644 --- a/arch/arm64/mm/kasan_init.c +++ b/arch/arm64/mm/kasan_init.c @@ -384,6 +384,9 @@ void __init kasan_populate_early_vm_area_shadow(void *start, unsigned long size) { unsigned long shadow_start, shadow_end; + if (!kasan_enabled()) + return; + if (!is_vmalloc_or_module_addr(start)) return; @@ -397,6 +400,9 @@ void __init kasan_populate_early_vm_area_shadow(void *start, unsigned long size) void __init kasan_init(void) { + if (kasan_arg_disabled) + return; + kasan_init_shadow(); kasan_init_depth(); kasan_init_generic(); -- 2.41.0 From bhe at redhat.com Thu Nov 27 19:33:14 2025 From: bhe at redhat.com (Baoquan He) Date: Fri, 28 Nov 2025 11:33:14 +0800 Subject: [PATCH v4 06/12] arch/loongarch: don't initialize kasan if it's disabled In-Reply-To: <20251128033320.1349620-1-bhe@redhat.com> References: <20251128033320.1349620-1-bhe@redhat.com> Message-ID: <20251128033320.1349620-7-bhe@redhat.com> Signed-off-by: Baoquan He Cc: loongarch at lists.linux.dev --- arch/loongarch/mm/kasan_init.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/loongarch/mm/kasan_init.c b/arch/loongarch/mm/kasan_init.c index 170da98ad4f5..61bce6a4b4bb 100644 --- a/arch/loongarch/mm/kasan_init.c +++ b/arch/loongarch/mm/kasan_init.c @@ -265,6 +265,8 @@ void __init kasan_init(void) u64 i; phys_addr_t pa_start, pa_end; + if (kasan_arg_disabled) + return; /* * If PGDIR_SIZE is too large for cpu_vabits, KASAN_SHADOW_END will * overflow UINTPTR_MAX and then looks like a user space address. 
-- 2.41.0 From bhe at redhat.com Thu Nov 27 19:33:15 2025 From: bhe at redhat.com (Baoquan He) Date: Fri, 28 Nov 2025 11:33:15 +0800 Subject: [PATCH v4 07/12] arch/powerpc: don't initialize kasan if it's disabled In-Reply-To: <20251128033320.1349620-1-bhe@redhat.com> References: <20251128033320.1349620-1-bhe@redhat.com> Message-ID: <20251128033320.1349620-8-bhe@redhat.com> This includes 32bit, book3s/64 and book3e/64. Signed-off-by: Baoquan He Cc: linuxppc-dev at lists.ozlabs.org --- arch/powerpc/mm/kasan/init_32.c | 5 ++++- arch/powerpc/mm/kasan/init_book3e_64.c | 3 +++ arch/powerpc/mm/kasan/init_book3s_64.c | 3 +++ 3 files changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/mm/kasan/init_32.c b/arch/powerpc/mm/kasan/init_32.c index 1d083597464f..b0651ff9d44d 100644 --- a/arch/powerpc/mm/kasan/init_32.c +++ b/arch/powerpc/mm/kasan/init_32.c @@ -141,6 +141,9 @@ void __init kasan_init(void) u64 i; int ret; + if (kasan_arg_disabled) + return; + for_each_mem_range(i, &base, &end) { phys_addr_t top = min(end, total_lowmem); @@ -170,7 +173,7 @@ void __init kasan_init(void) void __init kasan_late_init(void) { - if (IS_ENABLED(CONFIG_KASAN_VMALLOC)) + if (IS_ENABLED(CONFIG_KASAN_VMALLOC) && kasan_enabled()) kasan_unmap_early_shadow_vmalloc(); } diff --git a/arch/powerpc/mm/kasan/init_book3e_64.c b/arch/powerpc/mm/kasan/init_book3e_64.c index 0d3a73d6d4b0..f75c1e38a011 100644 --- a/arch/powerpc/mm/kasan/init_book3e_64.c +++ b/arch/powerpc/mm/kasan/init_book3e_64.c @@ -111,6 +111,9 @@ void __init kasan_init(void) u64 i; pte_t zero_pte = pfn_pte(virt_to_pfn(kasan_early_shadow_page), PAGE_KERNEL_RO); + if (kasan_arg_disabled) + return; + for_each_mem_range(i, &start, &end) kasan_init_phys_region(phys_to_virt(start), phys_to_virt(end)); diff --git a/arch/powerpc/mm/kasan/init_book3s_64.c b/arch/powerpc/mm/kasan/init_book3s_64.c index dcafa641804c..8c6940e835d4 100644 --- a/arch/powerpc/mm/kasan/init_book3s_64.c +++ b/arch/powerpc/mm/kasan/init_book3s_64.c @@ 
-54,6 +54,9 @@ void __init kasan_init(void) u64 i; pte_t zero_pte = pfn_pte(virt_to_pfn(kasan_early_shadow_page), PAGE_KERNEL); + if (kasan_arg_disabled) + return; + if (!early_radix_enabled()) { pr_warn("KASAN not enabled as it requires radix!"); return; -- 2.41.0 From bhe at redhat.com Thu Nov 27 19:33:16 2025 From: bhe at redhat.com (Baoquan He) Date: Fri, 28 Nov 2025 11:33:16 +0800 Subject: [PATCH v4 08/12] arch/riscv: don't initialize kasan if it's disabled In-Reply-To: <20251128033320.1349620-1-bhe@redhat.com> References: <20251128033320.1349620-1-bhe@redhat.com> Message-ID: <20251128033320.1349620-9-bhe@redhat.com> Signed-off-by: Baoquan He Cc: linux-riscv at lists.infradead.org --- arch/riscv/mm/kasan_init.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c index c4a2a9e5586e..aa464466e482 100644 --- a/arch/riscv/mm/kasan_init.c +++ b/arch/riscv/mm/kasan_init.c @@ -485,6 +485,9 @@ void __init kasan_init(void) phys_addr_t p_start, p_end; u64 i; + if (kasan_arg_disabled) + return; + create_tmp_mapping(); csr_write(CSR_SATP, PFN_DOWN(__pa(tmp_pg_dir)) | satp_mode); -- 2.41.0 From bhe at redhat.com Thu Nov 27 19:33:17 2025 From: bhe at redhat.com (Baoquan He) Date: Fri, 28 Nov 2025 11:33:17 +0800 Subject: [PATCH v4 09/12] arch/x86: don't initialize kasan if it's disabled In-Reply-To: <20251128033320.1349620-1-bhe@redhat.com> References: <20251128033320.1349620-1-bhe@redhat.com> Message-ID: <20251128033320.1349620-10-bhe@redhat.com> Signed-off-by: Baoquan He Cc: x86 at kernel.org --- arch/x86/mm/kasan_init_64.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c index 998b6010d6d3..d642ad364904 100644 --- a/arch/x86/mm/kasan_init_64.c +++ b/arch/x86/mm/kasan_init_64.c @@ -343,6 +343,9 @@ void __init kasan_init(void) unsigned long shadow_cea_begin, shadow_cea_per_cpu_begin, shadow_cea_end; int i; + if (kasan_arg_disabled) + return; + 
 	memcpy(early_top_pgt, init_top_pgt, sizeof(early_top_pgt));

 	/*
-- 
2.41.0

From bhe at redhat.com Thu Nov 27 19:33:18 2025
From: bhe at redhat.com (Baoquan He)
Date: Fri, 28 Nov 2025 11:33:18 +0800
Subject: [PATCH v4 10/12] arch/xtensa: don't initialize kasan if it's disabled
In-Reply-To: <20251128033320.1349620-1-bhe@redhat.com>
References: <20251128033320.1349620-1-bhe@redhat.com>
Message-ID: <20251128033320.1349620-11-bhe@redhat.com>

Call jump_label_init() early in setup_arch() so that the later
kasan_init() can enable the static key kasan_flag_enabled. Put
jump_label_init() before parse_early_param(), as other architectures do.

Signed-off-by: Baoquan He
Cc: Chris Zankel
Cc: Max Filippov
---
 arch/xtensa/kernel/setup.c | 1 +
 arch/xtensa/mm/kasan_init.c | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/arch/xtensa/kernel/setup.c b/arch/xtensa/kernel/setup.c
index f72e280363be..aabeb23f41fa 100644
--- a/arch/xtensa/kernel/setup.c
+++ b/arch/xtensa/kernel/setup.c
@@ -352,6 +352,7 @@ void __init setup_arch(char **cmdline_p)
 	mem_reserve(__pa(_SecondaryResetVector_text_start),
 		    __pa(_SecondaryResetVector_text_end));
 #endif
+	jump_label_init();
 	parse_early_param();
 	bootmem_init();
 	kasan_init();

diff --git a/arch/xtensa/mm/kasan_init.c b/arch/xtensa/mm/kasan_init.c
index 0524b9ed5e63..a78a85da1f0d 100644
--- a/arch/xtensa/mm/kasan_init.c
+++ b/arch/xtensa/mm/kasan_init.c
@@ -70,6 +70,9 @@ void __init kasan_init(void)
 {
 	int i;

+	if (kasan_arg_disabled)
+		return;
+
 	BUILD_BUG_ON(KASAN_SHADOW_OFFSET !=
 		     KASAN_SHADOW_START - (KASAN_START_VADDR >> KASAN_SHADOW_SCALE_SHIFT));
 	BUILD_BUG_ON(VMALLOC_START < KASAN_START_VADDR);
-- 
2.41.0

From bhe at redhat.com Thu Nov 27 19:33:19 2025
From: bhe at redhat.com (Baoquan He)
Date: Fri, 28 Nov 2025 11:33:19 +0800
Subject: [PATCH v4 11/12] arch/um: don't initialize kasan if it's disabled
In-Reply-To: <20251128033320.1349620-1-bhe@redhat.com>
References: <20251128033320.1349620-1-bhe@redhat.com>
Message-ID:
<20251128033320.1349620-12-bhe@redhat.com>

Also do the kasan_arg_disabled checking before enabling kasan_flag_enabled,
to make sure the kernel parameter kasan=on|off has been parsed.

Signed-off-by: Baoquan He
Cc: linux-um at lists.infradead.org
---
 arch/um/kernel/mem.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/um/kernel/mem.c b/arch/um/kernel/mem.c
index 39c4a7e21c6f..08cd012a6bb8 100644
--- a/arch/um/kernel/mem.c
+++ b/arch/um/kernel/mem.c
@@ -62,8 +62,11 @@ static unsigned long brk_end;
 
 void __init arch_mm_preinit(void)
 {
+#ifdef CONFIG_KASAN
 	/* Safe to call after jump_label_init(). Enables KASAN. */
-	kasan_init_generic();
+	if (!kasan_arg_disabled)
+		kasan_init_generic();
+#endif
 
 	/* clear the zero-page */
 	memset(empty_zero_page, 0, PAGE_SIZE);
-- 
2.41.0

From bhe at redhat.com Thu Nov 27 19:33:20 2025
From: bhe at redhat.com (Baoquan He)
Date: Fri, 28 Nov 2025 11:33:20 +0800
Subject: [PATCH v4 12/12] mm/kasan: make kasan=on|off take effect for all three modes
In-Reply-To: <20251128033320.1349620-1-bhe@redhat.com>
References: <20251128033320.1349620-1-bhe@redhat.com>
Message-ID: <20251128033320.1349620-13-bhe@redhat.com>

Now everything is ready; setting kasan=off can disable kasan for all three
modes.

Signed-off-by: Baoquan He
---
 include/linux/kasan-enabled.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/kasan-enabled.h b/include/linux/kasan-enabled.h
index b05ec6329fbe..b33c92cc6bd8 100644
--- a/include/linux/kasan-enabled.h
+++ b/include/linux/kasan-enabled.h
@@ -4,6 +4,7 @@
 
 #include 
 
+#ifdef CONFIG_KASAN
 extern bool kasan_arg_disabled;
 
 /*
@@ -12,7 +13,6 @@ extern bool kasan_arg_disabled;
  */
 DECLARE_STATIC_KEY_FALSE(kasan_flag_enabled);
 
-#if defined(CONFIG_ARCH_DEFER_KASAN) || defined(CONFIG_KASAN_HW_TAGS)
 /*
  * Runtime control for shadow memory initialization or HW_TAGS mode.
  * Uses static key for architectures that need deferred KASAN or HW_TAGS.
@@ -30,7 +30,7 @@ static inline void kasan_enable(void)
 /* For architectures that can enable KASAN early, use compile-time check. */
 static __always_inline bool kasan_enabled(void)
 {
-	return IS_ENABLED(CONFIG_KASAN);
+	return false;
 }
 
 static inline void kasan_enable(void) {}
-- 
2.41.0

From k-hagio-ab at nec.com Fri Nov 28 00:04:24 2025
From: k-hagio-ab at nec.com (HAGIO KAZUHITO(萩尾　一仁))
Date: Fri, 28 Nov 2025 08:04:24 +0000
Subject: [PATCH v2][makedumpfile 00/14] btf/kallsyms based eppic extension for mm page filtering
In-Reply-To: 
References: <20251020222410.8235-1-ltao@redhat.com>
Message-ID: <8b5c5913-34bc-444f-8ffe-9457bde0649c@nec.com>

On 2025/11/24 13:46, Tao Liu wrote:
> Kindly ping... Any comments on this?

Hi Tao,

I'm sorry for the delay. I think I can look into this next month.

Thanks,
Kazu

>
> Thanks,
> Tao Liu
>
> On Tue, Oct 21, 2025 at 11:24 AM Tao Liu wrote:
>>
>> A) This patchset will introduce the following features to makedumpfile:
>>
>> 1) Enable eppic script for memory pages filtering.
>> 2) Enable btf and kallsyms for symbol type and address resolving.
>>
>> B) The purpose of the features are:
>>
>> 1) Currently makedumpfile filters mm pages based on page flags, because flags
>> can help to determine one page's usage. But this page-flag-checking method
>> lacks flexibility in certain cases, e.g. if we want to filter those mm
>> pages occupied by a GPU during vmcore dumping due to:
>>
>> a) the GPU may be taking a large amount of memory that contains sensitive data;
>> b) GPU mm pages have no relation to the kernel crash and are useless for vmcore
>> analysis.
>>
>> But there are no GPU mm page specific flags, and apparently we don't need
>> to create one just for kdump use. A programmable filtering tool is more
>> suitable for such cases. In addition, different GPU vendors may use
>> different ways of allocating mm pages; programmable filtering is better
>> than hard coding these GPU specific logics into makedumpfile in this case.
>>
>> 2) Currently makedumpfile already contains a programmable filtering tool, aka
>> eppic script, which allows users to write customized code for data erasing.
>> However it has the following drawbacks:
>>
>> a) cannot do mm page filtering.
>> b) needs access to debuginfo of both kernel and modules, which is not
>> applicable in the 2nd kernel.
>> c) Poor performance, making vmcore dumping time unacceptable (See
>> the following performance testing).
>>
>> makedumpfile needs to resolve the dwarf data from debuginfo, to get symbol
>> types and addresses. In recent kernels there are dwarf alternatives such
>> as btf/kallsyms which can be used for this purpose. And btf/kallsyms info
>> is already packed within the vmcore, so we can use it directly.
>>
>> With these, this patchset introduces an upgraded eppic, which is based on
>> btf/kallsyms symbol resolving, and is programmable for mm page filtering.
>> The following info shows its usage and performance; please note the tests
>> are performed in the 1st kernel:
>>
>> $ time ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
>> /tmp/dwarf.out -x /lib/debug/lib/modules/6.11.8-300.fc41.x86_64/vmlinux
>> --eppic eppic_scripts/filter_amdgpu_mm_pages.c
>> real 14m6.894s
>> user 4m16.900s
>> sys 9m44.695s
>>
>> $ time ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore
>> /tmp/btf.out --eppic eppic_scripts/filter_amdgpu_mm_pages.c
>> real 0m10.672s
>> user 0m9.270s
>> sys 0m1.130s
>>
>> -rw------- 1 root root 367475074 Jun 10 18:06 btf.out
>> -rw------- 1 root root 367475074 Jun 10 21:05 dwarf.out
>> -rw-rw-rw- 1 root root 387181418 Jun 10 18:03 /var/crash/127.0.0.1-2025-06-10-18:03:12/vmcore
>>
>> C) Discussion:
>>
>> 1) GPU types: Currently only tested with amdgpu's mm page filtering, others
>> are not tested.
>> 2) OS: The code can work on rhel-10+/rhel9.5+ on x86_64/arm64/s390/ppc64.
>> Others are not tested.
>>
>> D) Testing:
>>
>> 1) If you don't want to create your own vmcore, you can find a vmcore which I
>> created with amdgpu mm pages unfiltered [1]; the amdgpu mm pages are
>> allocated by program [2]. You can use the vmcore in the 1st kernel to filter
>> the amdgpu mm pages by the previous performance testing cmdline. To
>> verify the pages are filtered in crash:
>>
>> Unfiltered:
>> crash> search -c "!QAZXSW@#EDC"
>> ffff96b7fa800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>> ffff96b87c800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>> crash> rd ffff96b7fa800000
>> ffff96b7fa800000: 405753585a415121 !QAZXSW@
>> crash> rd ffff96b87c800000
>> ffff96b87c800000: 405753585a415121 !QAZXSW@
>>
>> Filtered:
>> crash> search -c "!QAZXSW@#EDC"
>> crash> rd ffff96b7fa800000
>> rd: page excluded: kernel virtual address: ffff96b7fa800000 type: "64-bit KVADDR"
>> crash> rd ffff96b87c800000
>> rd: page excluded: kernel virtual address: ffff96b87c800000 type: "64-bit KVADDR"
>>
>> 2) You can use eppic_scripts/print_all_vma.c against an ordinary vmcore to
>> test only the btf/kallsyms functions by outputting all VMAs if no amdgpu
>> vmcores/machine are available.
>>
>> [1]: https://people.redhat.com/~ltao/core/
>> [2]: https://gist.github.com/liutgnu/a8cbce1c666452f1530e1410d1f352df
>>
>> v2 -> v1:
>>
>> 1) Moved maple tree related code (for VMA iteration) into eppic script, so we
>> don't need to port maple tree code to makedumpfile.
>>
>> 2) Reorganized the patchset as follows:
>>
>> --- ---
>> 1.Add page filtering function
>> 2.Supporting main() as the entry of eppic script
>>
>> --- ---
>> 3.dwarf_info: Support kernel address randomization
>> 4.dwarf_info: Fix a infinite recursion bug for rust
>> 5.eppic dwarf: support anonymous structs member resolving
>> 6.Enable page filtering for dwarf eppic
>>
>> --- ---
>> 7.Implement kernel kallsyms resolving
>> 8.Implement kernel btf resolving
>> 9.Implement kernel module's kallsyms resolving
>> 10.Implement kernel module's btf resolving
>> 11.Export necessary btf/kallsyms functions to eppic extension
>> 12.Enable page filtering for btf/kallsyms eppic
>> 13.Docs: Update eppic related entries
>>
>> --- ---
>> 14.Introducing 2 eppic scripts to test the dwarf/btf eppic extension
>>
>> The modification on dwarf is primarily for comparison purposes: for
>> the same eppic program, mm page filtering should get the exact same
>> outputs for the dwarf and kallsyms/btf based approaches. If the outputs
>> don't match, this indicates bugs. In fact, we will never take dwarf mm page
>> filtering into real use, due to its poor performance as well as the
>> inaccessibility of debuginfo during kdump in the 2nd kernel. So patches 3/4/5
>> won't affect the function of btf/kallsyms eppic mm page filtering, but there
>> are functions shared in patch 6, so it is a must-have one. Patch 14 is
>> only for test purposes, to demonstrate how to write an eppic script for
>> mm page filtering, so it isn't a must-have patch.
>>
>> Please note, in patch 14, I have deliberately converted all array
>> operations into pointer operations, e.g. modified "node->slot[i]" into
>> "*((unsigned long *)&(node->slot) + i)". This is because there are
>> bugs in the array operation support in extension_eppic.c. I didn't make
>> the effort to test and fix them all because, as I mentioned previously,
>> mm page filtering on the dwarf side is only for comparison and will
>> never be used in real use.
There is no such issue for kallsyms/btf >> eppic side. >> >> 3) Since we ported maple tree code to eppic script, several bugs found >> both for eppic library & eppic btf support. Please use master branch >> of eppic library to co-compile with this patchset. >> >> Tao Liu (14): >> Add page filtering function >> Supporting main() as the entry of eppic script >> dwarf_info: Support kernel address randomization >> dwarf_info: Fix a infinite recursion bug for rust >> eppic dwarf: support anonymous structs member resolving >> Enable page filtering for dwarf eppic >> Implement kernel kallsyms resolving >> Implement kernel btf resolving >> Implement kernel module's kallsyms resolving >> Implement kernel module's btf resolving >> Export necessary btf/kallsyms functions to eppic extension >> Enable page filtering for btf/kallsyms eppic >> Docs: Update eppic related entries >> Introducing 2 eppic scripts to test the dwarf/btf eppic extension >> >> Makefile | 6 +- >> btf.c | 919 +++++++++++++++++++++++++ >> btf.h | 177 +++++ >> dwarf_info.c | 7 + >> eppic_scripts/filter_amdgpu_mm_pages.c | 255 +++++++ >> eppic_scripts/print_all_vma.c | 239 +++++++ >> erase_info.c | 120 +++- >> erase_info.h | 19 + >> extension_btf.c | 258 +++++++ >> extension_eppic.c | 106 ++- >> extension_eppic.h | 6 +- >> kallsyms.c | 392 +++++++++++ >> kallsyms.h | 41 ++ >> makedumpfile.8.in | 24 +- >> makedumpfile.c | 21 +- >> makedumpfile.h | 11 + >> print_info.c | 11 +- >> 17 files changed, 2550 insertions(+), 62 deletions(-) >> create mode 100644 btf.c >> create mode 100644 btf.h >> create mode 100644 eppic_scripts/filter_amdgpu_mm_pages.c >> create mode 100644 eppic_scripts/print_all_vma.c >> create mode 100644 extension_btf.c >> create mode 100644 kallsyms.c >> create mode 100644 kallsyms.h >> >> -- >> 2.47.0 >> From ltao at redhat.com Fri Nov 28 00:10:01 2025 From: ltao at redhat.com (Tao Liu) Date: Fri, 28 Nov 2025 21:10:01 +1300 Subject: [PATCH v2][makedumpfile 00/14] btf/kallsyms based eppic 
extension for mm page filtering
In-Reply-To: <8b5c5913-34bc-444f-8ffe-9457bde0649c@nec.com>
References: <20251020222410.8235-1-ltao@redhat.com>
 <8b5c5913-34bc-444f-8ffe-9457bde0649c@nec.com>
Message-ID: 

Hi Kazu,

On Fri, Nov 28, 2025 at 9:04 PM HAGIO KAZUHITO(萩尾　一仁) wrote:
>
> On 2025/11/24 13:46, Tao Liu wrote:
> > Kindly ping... Any comments on this?
>
> Hi Tao,
>
> I'm sorry for the delay. I think I can look into this next month.

No worries, please take your time :)

Thanks,
Tao Liu

>
> Thanks,
> Kazu
>
> >
> > Thanks,
> > Tao Liu
> >
> > On Tue, Oct 21, 2025 at 11:24 AM Tao Liu wrote:
> >>
> >> A) This patchset will introduce the following features to makedumpfile:
> >>
> >> 1) Enable eppic script for memory pages filtering.
> >> 2) Enable btf and kallsyms for symbol type and address resolving.
> >>
> >> B) The purpose of the features are:
> >>
> >> 1) Currently makedumpfile filters mm pages based on page flags, because flags
> >> can help to determine one page's usage. But this page-flag-checking method
> >> lacks of flexibility in certain cases, e.g. if we want to filter those mm
> >> pages occupied by GPU during vmcore dumping due to:
> >>
> >> a) GPU may be taking a large memory and contains sensitive data;
> >> b) GPU mm pages have no relations to kernel crash and useless for vmcore
> >> analysis.
> >>
> >> But there is no GPU mm page specific flags, and apparently we don't need
> >> to create one just for kdump use. A programmable filtering tool is more
> >> suitable for such cases. In addition, different GPU vendors may use
> >> different ways for mm pages allocating, programmable filtering is better
> >> than hard coding these GPU specific logics into makedumpfile in this case.
> >> b) need to access to debuginfo of both kernel and modules, which is not > >> applicable in the 2nd kernel. > >> c) Poor performance, making vmcore dumping time unacceptable (See > >> the following performance testing). > >> > >> makedumpfile need to resolve the dwarf data from debuginfo, to get symbols > >> types and addresses. In recent kernel there are dwarf alternatives such > >> as btf/kallsyms which can be used for this purpose. And btf/kallsyms info > >> are already packed within vmcore, so we can use it directly. > >> > >> With these, this patchset introduces an upgraded eppic, which is based on > >> btf/kallsyms symbol resolving, and is programmable for mm page filtering. > >> The following info shows its usage and performance, please note the tests > >> are performed in 1st kernel: > >> > >> $ time ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore > >> /tmp/dwarf.out -x /lib/debug/lib/modules/6.11.8-300.fc41.x86_64/vmlinux > >> --eppic eppic_scripts/filter_amdgpu_mm_pages.c > >> real 14m6.894s > >> user 4m16.900s > >> sys 9m44.695s > >> > >> $ time ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore > >> /tmp/btf.out --eppic eppic_scripts/filter_amdgpu_mm_pages.c > >> real 0m10.672s > >> user 0m9.270s > >> sys 0m1.130s > >> > >> -rw------- 1 root root 367475074 Jun 10 18:06 btf.out > >> -rw------- 1 root root 367475074 Jun 10 21:05 dwarf.out > >> -rw-rw-rw- 1 root root 387181418 Jun 10 18:03 /var/crash/127.0.0.1-2025-06-10-18:03:12/vmcore > >> > >> C) Discussion: > >> > >> 1) GPU types: Currently only tested with amdgpu's mm page filtering, others > >> are not tested. > >> 2) OS: The code can work on rhel-10+/rhel9.5+ on x86_64/arm64/s390/ppc64. > >> Others are not tested. > >> > >> D) Testing: > >> > >> 1) If you don't want to create your vmcore, you can find a vmcore which I > >> created with amdgpu mm pages unfiltered [1], the amdgpu mm pages are > >> allocated by program [2]. 
You can use the vmcore in 1st kernel to filter > >> the amdgpu mm pages by the previous performance testing cmdline. To > >> verify the pages are filtered in crash: > >> > >> Unfiltered: > >> crash> search -c "!QAZXSW@#EDC" > >> ffff96b7fa800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > >> ffff96b87c800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > >> crash> rd ffff96b7fa800000 > >> ffff96b7fa800000: 405753585a415121 !QAZXSW@ > >> crash> rd ffff96b87c800000 > >> ffff96b87c800000: 405753585a415121 !QAZXSW@ > >> > >> Filtered: > >> crash> search -c "!QAZXSW@#EDC" > >> crash> rd ffff96b7fa800000 > >> rd: page excluded: kernel virtual address: ffff96b7fa800000 type: "64-bit KVADDR" > >> crash> rd ffff96b87c800000 > >> rd: page excluded: kernel virtual address: ffff96b87c800000 type: "64-bit KVADDR" > >> > >> 2) You can use eppic_scripts/print_all_vma.c against an ordinary vmcore to > >> test only btf/kallsyms functions by output all VMAs if no amdgpu > >> vmcores/machine avaliable. > >> > >> [1]: https://people.redhat.com/~ltao/core/ > >> [2]: https://gist.github.com/liutgnu/a8cbce1c666452f1530e1410d1f352df > >> > >> v2 -> v1: > >> > >> 1) Moved maple tree related code(for VMA iteration) into eppic script, so we > >> don't need to port maple tree code to makedumpfile. 
> >> > >> 2) Reorganized the patchset as follows: > >> > >> --- --- > >> 1.Add page filtering function > >> 2.Supporting main() as the entry of eppic script > >> > >> --- --- > >> 3.dwarf_info: Support kernel address randomization > >> 4.dwarf_info: Fix a infinite recursion bug for rust > >> 5.eppic dwarf: support anonymous structs member resolving > >> 6.Enable page filtering for dwarf eppic > >> > >> --- --- > >> 7.Implement kernel kallsyms resolving > >> 8.Implement kernel btf resolving > >> 9.Implement kernel module's kallsyms resolving > >> 10.Implement kernel module's btf resolving > >> 11.Export necessary btf/kallsyms functions to eppic extension > >> 12.Enable page filtering for btf/kallsyms eppic > >> 13.Docs: Update eppic related entries > >> > >> --- --- > >> 14.Introducing 2 eppic scripts to test the dwarf/btf eppic extension > >> > >> The modification on dwarf is primary for comparision purpose, that > >> for the same eppic program, mm page filtering should get exact same > >> outputs for dwarf & kallsyms/btf based approaches. If outputs unmatch, > >> this indicates bugs. In fact, we will never take dwarf mm pages filtering > >> in real use, due to its poor performance as well as inaccessibility > >> of debuginfo during kdump in 2nd kernel. So patch 3/4/5 won't affect > >> the function of btf/kallsyms eppic mm page filtering, but there are > >> functions shared in patch 6, so it is a must-have one. Patch 14 is > >> only for test purpose, to demonstrate how to write eppic script for > >> mm page filtering, so it isn't a must-have patch. > >> > >> Please note, in patch 14, I have deliberately converted all array > >> operation into pointer operation, e.g. modified "node->slot[i]" into > >> "*((unsigned long *)&(node->slot) + i)". This is because there are > >> bugs for array operation support in extension_eppic.c. 
I didn't have > >> effort to test and fix them all because as I mentioned previously, > >> mm page filtering in dwarf side is only for comparision and will > >> never be used in real use. There is no such issue for kallsyms/btf > >> eppic side. > >> > >> 3) Since we ported maple tree code to eppic script, several bugs found > >> both for eppic library & eppic btf support. Please use master branch > >> of eppic library to co-compile with this patchset. > >> > >> Tao Liu (14): > >> Add page filtering function > >> Supporting main() as the entry of eppic script > >> dwarf_info: Support kernel address randomization > >> dwarf_info: Fix a infinite recursion bug for rust > >> eppic dwarf: support anonymous structs member resolving > >> Enable page filtering for dwarf eppic > >> Implement kernel kallsyms resolving > >> Implement kernel btf resolving > >> Implement kernel module's kallsyms resolving > >> Implement kernel module's btf resolving > >> Export necessary btf/kallsyms functions to eppic extension > >> Enable page filtering for btf/kallsyms eppic > >> Docs: Update eppic related entries > >> Introducing 2 eppic scripts to test the dwarf/btf eppic extension > >> > >> Makefile | 6 +- > >> btf.c | 919 +++++++++++++++++++++++++ > >> btf.h | 177 +++++ > >> dwarf_info.c | 7 + > >> eppic_scripts/filter_amdgpu_mm_pages.c | 255 +++++++ > >> eppic_scripts/print_all_vma.c | 239 +++++++ > >> erase_info.c | 120 +++- > >> erase_info.h | 19 + > >> extension_btf.c | 258 +++++++ > >> extension_eppic.c | 106 ++- > >> extension_eppic.h | 6 +- > >> kallsyms.c | 392 +++++++++++ > >> kallsyms.h | 41 ++ > >> makedumpfile.8.in | 24 +- > >> makedumpfile.c | 21 +- > >> makedumpfile.h | 11 + > >> print_info.c | 11 +- > >> 17 files changed, 2550 insertions(+), 62 deletions(-) > >> create mode 100644 btf.c > >> create mode 100644 btf.h > >> create mode 100644 eppic_scripts/filter_amdgpu_mm_pages.c > >> create mode 100644 eppic_scripts/print_all_vma.c > >> create mode 100644 extension_btf.c > 
>> create mode 100644 kallsyms.c > >> create mode 100644 kallsyms.h > >> > >> -- > >> 2.47.0 > >> From sourabhjain at linux.ibm.com Fri Nov 28 01:41:54 2025 From: sourabhjain at linux.ibm.com (Sourabh Jain) Date: Fri, 28 Nov 2025 15:11:54 +0530 Subject: [PATCH v3 0/3] kexec: print out debugging message if required for kexec_load In-Reply-To: References: <20251126084427.3222212-1-maqianga@uniontech.com> <7aadda55-d2a4-40f9-95ef-d284ec358646@linux.ibm.com> Message-ID: <77ce0329-1f82-49be-b18a-73c9e5c3e85e@linux.ibm.com> Hello Baoquan, On 27/11/25 21:00, Baoquan He wrote: > On 11/27/25 at 05:31pm, Sourabh Jain wrote: >> Hello All, >> >> Do we have plan to support KEXEC_DEBUG flag? >> >> Because upstream kexec-tools already added support for KEXEC_DEBUG flag >> and that breaks the kexec_load with -d option. >> >> - kexec: add kexec flag to support debug printing >> https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/commit/?id=71d6fd99af7e > I think we should revert that kexec-tools commit. Yeah, userspace changes shouldn't go in until the kernel patches are finalized. It seems that there are disagreements regarding the approach and usefulness of this patch series, so reverting the kexec-tools patch might be the right thing to avoid breaking anything for now. I have one question: should the kernel advertise KEXEC_DEBUG so that backward compatibility can be maintained between the kernel and kexec-tools? Or is that too much for a debugging flag? How was backward compatibility handled when we added the KEXEC_FILE_DEBUG flag? > This whole patchset is > non-sense. Because of my carelessness, that userspace patch was merged. > > Hi Sourabh, > > Could you go through this patchset and help check if they are really > needed? I can't find anything to convince myself. Thanks. Sure I will review this patch series. 
Thanks, Sourabh Jain From glider at google.com Fri Nov 28 07:50:51 2025 From: glider at google.com (Alexander Potapenko) Date: Fri, 28 Nov 2025 16:50:51 +0100 Subject: [PATCH v4 12/12] mm/kasan: make kasan=on|off take effect for all three modes In-Reply-To: <20251128033320.1349620-13-bhe@redhat.com> References: <20251128033320.1349620-1-bhe@redhat.com> <20251128033320.1349620-13-bhe@redhat.com> Message-ID: > @@ -30,7 +30,7 @@ static inline void kasan_enable(void) > /* For architectures that can enable KASAN early, use compile-time check. */ I think the behavior of kasan_enabled() is inconsistent with this comment now. > static __always_inline bool kasan_enabled(void) > { > - return IS_ENABLED(CONFIG_KASAN); > + return false; > } From bhe at redhat.com Sat Nov 29 18:49:18 2025 From: bhe at redhat.com (Baoquan He) Date: Sun, 30 Nov 2025 10:49:18 +0800 Subject: [PATCH v4 12/12] mm/kasan: make kasan=on|off take effect for all three modes In-Reply-To: References: <20251128033320.1349620-1-bhe@redhat.com> <20251128033320.1349620-13-bhe@redhat.com> Message-ID: On 11/28/25 at 04:50pm, Alexander Potapenko wrote: > > @@ -30,7 +30,7 @@ static inline void kasan_enable(void) > > /* For architectures that can enable KASAN early, use compile-time check. */ > I think the behavior of kasan_enabled() is inconsistent with this comment now. You are right, that line should be removed. Thanks for careful checking. 
> > static __always_inline bool kasan_enabled(void)
> > {
> > -	return IS_ENABLED(CONFIG_KASAN);
> > +	return false;
> > }
>

From bhe at redhat.com Sat Nov 29 18:56:33 2025
From: bhe at redhat.com (Baoquan He)
Date: Sun, 30 Nov 2025 10:56:33 +0800
Subject: [PATCH v3 0/3] kexec: print out debugging message if required for kexec_load
In-Reply-To: <77ce0329-1f82-49be-b18a-73c9e5c3e85e@linux.ibm.com>
References: <20251126084427.3222212-1-maqianga@uniontech.com>
 <7aadda55-d2a4-40f9-95ef-d284ec358646@linux.ibm.com>
 <77ce0329-1f82-49be-b18a-73c9e5c3e85e@linux.ibm.com>
Message-ID: 

On 11/28/25 at 03:11pm, Sourabh Jain wrote:
> Hello Baoquan,
>
> On 27/11/25 21:00, Baoquan He wrote:
> > On 11/27/25 at 05:31pm, Sourabh Jain wrote:
> > > Hello All,
> > >
> > > Do we have plan to support KEXEC_DEBUG flag?
> > >
> > > Because upstream kexec-tools already added support for KEXEC_DEBUG flag
> > > and that breaks the kexec_load with -d option.
> > >
> > > - kexec: add kexec flag to support debug printing
> > > https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/commit/?id=71d6fd99af7e
> > I think we should revert that kexec-tools commit.
>
> Yeah, userspace changes shouldn't go in until the kernel patches are
> finalized. It seems that there are disagreements regarding the approach
> and usefulness of this patch series, so reverting the kexec-tools patch
> might be the right thing to avoid breaking anything for now.

Patch 1 is an issue fix; that one is good. But patches 2 and 3 try to add
debug printing for the kexec_load interface, which I think is not needed.
I added debug printing for kexec_file_load because I had been using
'kexec -d' to debug kexec_load, while kexec_file_load didn't have it. So
I mimicked kexec_load's debug printing to add one for kexec_file_load.
Now the additions in patches 2 and 3 don't make sense, as he said he is
doing this for a future need.
>
> I have one question: should the kernel advertise KEXEC_DEBUG so that
> backward compatibility can be maintained between the kernel and
> kexec-tools? Or is that too much for a debugging flag? How was backward
> compatibility handled when we added the KEXEC_FILE_DEBUG flag?

When I added KEXEC_FILE_DEBUG, I didn't consider backward compatibility;
it simply made the then-latest kernel match the then-latest kexec-tools.

>
> > This whole patchset is
> > non-sense. Because of my carelessness, that userspace patch was merged.
> >
> > Hi Sourabh,
> >
> > Could you go through this patchset and help check if they are really
> > needed? I can't find anything to convince myself. Thanks.
>
> Sure I will review this patch series.

Thanks. Please check patches 2 and 3 to see whether we really need the
debug printing for kexec_load, whether adding it really brings enough
benefit compared with the mess it introduces, and whether my objection is
too subjective.

Thanks
Baoquan

From rientjes at google.com Sat Nov 29 19:13:11 2025
From: rientjes at google.com (David Rientjes)
Date: Sat, 29 Nov 2025 19:13:11 -0800 (PST)
Subject: [Hypervisor Live Update] Notes from November 17, 2025
Message-ID: 

Hi everybody,

Here are the notes from the last Hypervisor Live Update call that happened
on Monday, November 17. Thanks to everybody who was involved! These notes
are intended to bring people up to speed who could not attend the call as
well as keep the conversation going in between meetings.

----->o-----
Pasha updated on the status of the stateless KHO RFC: Jason Miu had sent
an update of the patches but they need to be rebased on top of the latest
KHO series. There were some simplification patches that had been sent for
KHO that changed how the FDT was used. Thus, the stateless KHO patches
need to be updated again, along with some splitting of the patches into
finer-grained patches.

LUO v6 was sent the previous weekend.
There were a number of comments received for LUO v5 in linux-next that
were addressed in v6. Mike Rapoport was going through v6 and provided the
most feedback. Pasha was planning on sending a v7 for the next merge
window.

----->o-----
David Matlack updated that he was going to be focused this week on the
VFIO v2 patch series. His goal was to have it on the mailing list by the
week of November 24. The goal was to be able to gather feedback prior to
LPC and then leverage that conference to discuss the open questions for
that series.

Sami and David had discussed a minimal patch series for VFIO preservation
as the next feature that could be merged on top of LUO, setting the stage
for IOMMU preservation to build on that.

----->o-----
Pratyush updated on his HugeTLB and 1GB page preservation series; he got
this working internally for v5. It is not ready to post as an RFC yet, so
the goal was to have this in a state ready to share over the next two
weeks. This will also enable LPC discussions.

----->o-----
Ackerley provided an update on guest memfd support for 1GB HugeTLB pages.
He has an internal version working. There is no preservation support for
it, just guest memfd with 1GB HugeTLB support. Pratyush had previously
discussed this with Ackerley and felt that the series were really
independent of each other. Ackerley was planning his next posting to the
mailing list after LPC.

----->o-----
Pratyush discussed an idea about versioning for LUO: there will be
different versions for different components like memfd, IOMMU, etc. He was
thinking of having a mechanism to define different versions. This would be
supported as an ELF header in the vmlinux. When you load the next kernel
in preparation for kexec, luod would read this next vmlinux, see what
version it supports and determine its compatibility with the currently
running kernel.

Jason suggested discussing the roadmap for FDT first; he wanted to ensure
that the dependencies were sorted out fully before doing optimization.
He wanted to see more infrastructure to support the versioning and wrote
some thoughts on this on the mailing list previously. The ELF versioning
could just be auto-generated out of the aligned design. Pratyush proposed
writing an RFC that could be used as the basis for further discussion.

----->o-----
Next meeting will be on Monday, December 1 at 8am PST (UTC-8), everybody
is welcome: https://meet.google.com/rjn-dmzu-hgq

Topics for the next meeting:

 - update on the status of the stateless KHO RFC patches that were being
   rebased on top of the KHO simplification
 - update on the status of LUO v7 and its potential for merge in the next
   merge window
 - update for the VFIO v2 patch series intended to solicit feedback prior
   to LPC
 - next steps for iommu persistence to build upon the VFIO patch series
   once that is merged
 - status update for HugeTLB + 1GB page preservation support that should
   be ready to send out by the next meeting
 - continued discussion on versioning support for various components for
   luod to negotiate
 - determine the plan for the December 15 instance of the meeting since
   it's immediately after LPC
 - later, after LPC: update on status of guest_memfd support for 1GB
   HugeTLB pages
 - later: testing methodology to allow downstream consumers to qualify
   that live update works from one version to another
 - later: reducing the blackout window during live update, including
   deferred struct page initialization

Please let me know if you'd like to propose additional topics for
discussion, thank you!

From rppt at kernel.org Sun Nov 30 22:54:37 2025
From: rppt at kernel.org (Mike Rapoport)
Date: Mon, 1 Dec 2025 08:54:37 +0200
Subject: [PATCH 2/2] kho: fix restoring of contiguous ranges of order-0 pages
In-Reply-To: 
References: <20251125110917.843744-1-rppt@kernel.org>
 <20251125110917.843744-3-rppt@kernel.org>
Message-ID: 

Hi Pratyush,

On Tue, Nov 25, 2025 at 02:45:59PM +0100, Pratyush Yadav wrote:
> On Tue, Nov 25 2025, Mike Rapoport wrote:
...
> > @@ -243,11 +243,16 @@ static struct page *kho_restore_page(phys_addr_t phys)
> >  	/* Head page gets refcount of 1. */
> >  	set_page_count(page, 1);
> >  
> > -	/* For higher order folios, tail pages get a page count of zero. */
> > +	/*
> > +	 * For higher order folios, tail pages get a page count of zero.
> > +	 * For physically contiguous order-0 pages every page gets a page
> > +	 * count of 1.
> > +	 */
> > +	ref_cnt = is_folio ? 0 : 1;
> >  	for (unsigned int i = 1; i < nr_pages; i++)
> > -		set_page_count(page + i, 0);
> > +		set_page_count(page + i, ref_cnt);
> >  
> > -	if (info.order > 0)
> > +	if (is_folio && info.order)
> 
> This is getting a bit difficult to parse. Let's separate out folio and
> page initialization to separate helpers:

Sorry, I missed this earlier and now the patches are in akpm's -stable
branch. Let's postpone these changes until the next cycle, maybe along
with support for deferred initialization of struct page.

> /* Initialize 0-order KHO pages */
> static void kho_init_page(struct page *page, unsigned int nr_pages)
> {
> 	for (unsigned int i = 0; i < nr_pages; i++)
> 		set_page_count(page + i, 1);
> }
> 
> static void kho_init_folio(struct page *page, unsigned int order)
> {
> 	unsigned int nr_pages = (1 << order);
> 
> 	/* Head page gets refcount of 1. */
> 	set_page_count(page, 1);
> 
> 	/* For higher order folios, tail pages get a page count of zero. */
> 	for (unsigned int i = 1; i < nr_pages; i++)
> 		set_page_count(page + i, 0);
> 
> 	if (order > 0)
> 		prep_compound_page(page, order);
> }

-- 
Sincerely yours,
Mike.
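[Editorial illustration] The helper split discussed in the thread above can be sketched outside the kernel. The following is a minimal userspace C sketch, not the kernel API: `struct fake_page`, `init_page_run`, and `init_folio` are hypothetical names standing in for `struct page`, `kho_init_page`, and `kho_init_folio`. It only demonstrates the refcount rule being debated: a contiguous run of order-0 pages gives every page a count of 1, while a folio gives the head a count of 1 and the tails a count of 0.

```c
#include <assert.h>

/* Stand-in for struct page: only the refcount matters for this sketch. */
struct fake_page {
	int refcount;
};

/*
 * Physically contiguous run of order-0 pages: every page is an
 * independent allocation, so each one gets a refcount of 1.
 */
static void init_page_run(struct fake_page *pages, unsigned int nr_pages)
{
	for (unsigned int i = 0; i < nr_pages; i++)
		pages[i].refcount = 1;
}

/*
 * Folio of the given order: the head page gets a refcount of 1 and the
 * tail pages get 0, since tails are not individually reference-counted.
 */
static void init_folio(struct fake_page *pages, unsigned int order)
{
	unsigned int nr_pages = 1u << order;

	pages[0].refcount = 1;
	for (unsigned int i = 1; i < nr_pages; i++)
		pages[i].refcount = 0;
}
```

With such a split, the caller picks one helper based on whether the preserved range was a folio, instead of threading a conditional refcount (the `ref_cnt = is_folio ? 0 : 1` in the patch) through a single shared loop.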