[PATCH v13 04/15] arm64: kexec_file: Fix potential buffer overflow in prepare_elf_headers()
Jinjie Ruan
ruanjinjie at huawei.com
Tue May 19 05:42:04 PDT 2026
On 5/11/2026 5:46 PM, Breno Leitao wrote:
> On Mon, May 11, 2026 at 11:04:43AM +0800, Jinjie Ruan wrote:
>> There is a race condition between the kexec_load() system call
>> (crash kernel loading path) and memory hotplug operations that can
>> lead to buffer overflow and potential kernel crash.
>>
>> During prepare_elf_headers(), the following steps occur:
>> 1. The first for_each_mem_range() queries current System RAM memory ranges
>> 2. Allocates buffer based on queried count
>> 3. The 2st for_each_mem_range() populates ranges from memblock
>>
>> If memory hotplug occurs between step 1 and step 3, the number of ranges
>> can increase, causing out-of-bounds write when populating cmem->ranges[].
>>
>> This happens because kexec_load() uses kexec_trylock (atomic_t) while
>> memory hotplug uses device_hotplug_lock (mutex), so they don't serialize
>> with each other.
>>
>> Add the explicit bounds checking to prevent out-of-bounds access.
>
> It seems you have a TOCTOU type of issue, and this seems to be shrinking
> the window, but not fully solving it?
I plan to fix this issue as follows, and would appreciate your feedback
on whether this is reasonable.
Sashiko AI code review pointed out there is a TOCTOU (Time-of-Check to
Time-of-Use) race condition in prepare_elf_headers() between the initial
pass that counts System RAM ranges and the second pass that populates them.
If a memory hotplug event occurs between these two steps, the number of
memory regions may increase, causing an out-of-bounds write to
the cmem->ranges[] array.
To resolve this and ensure data consistency, this patch:
1. Wraps the counting and population passes with get_online_mems() and
crash_hotplug_lock(). This serializes the kexec_file_load() path
with concurrent memory hotplug operations, ensuring the memory
map remains consistent throughout the header preparation.
2. Adds an explicit boundary check in prepare_elf64_ram_headers_callback().
If the number of ranges exceeds the allocated maximum, it now returns
-EAGAIN, which indicates a transient race, signaling userspace
kexec-tools to retry the syscall instead of leaving the system
without a loaded crash kernel.
index daf81a873bbd..546be6261177 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -15,6 +15,7 @@
#include <linux/kexec.h>
#include <linux/libfdt.h>
#include <linux/memblock.h>
+#include <linux/memory_hotplug.h>
#include <linux/of.h>
#include <linux/of_fdt.h>
#include <linux/slab.h>
@@ -40,7 +41,7 @@ int arch_kimage_file_post_load_cleanup(struct kimage
*image)
}
#ifdef CONFIG_CRASH_DUMP
-int prepare_elf_headers(void **addr, unsigned long *sz)
+static int __prepare_elf_headers(void **addr, unsigned long *sz)
{
struct crash_mem *cmem;
unsigned int nr_ranges;
@@ -59,6 +60,11 @@ int prepare_elf_headers(void **addr, unsigned long *sz)
cmem->max_nr_ranges = nr_ranges;
cmem->nr_ranges = 0;
for_each_mem_range(i, &start, &end) {
+ if (cmem->nr_ranges >= cmem->max_nr_ranges) {
+ ret = -EAGAIN;
+ goto out;
+ }
+
cmem->ranges[cmem->nr_ranges].start = start;
cmem->ranges[cmem->nr_ranges].end = end - 1;
cmem->nr_ranges++;
@@ -81,6 +87,21 @@ int prepare_elf_headers(void **addr, unsigned long *sz)
kfree(cmem);
return ret;
}
+
+int prepare_elf_headers(void **addr, unsigned long *sz)
+{
+ int ret;
+
+ crash_hotplug_lock();
+ get_online_mems();
+
+ ret = __prepare_elf_headers(addr, sz);
+
+ put_online_mems();
+ crash_hotplug_unlock();
+
+ return ret;
+}
#endif
>
>> Cc: Catalin Marinas <catalin.marinas at arm.com>
>> Cc: Will Deacon <will.deacon at arm.com>
>> Cc: Andrew Morton <akpm at linux-foundation.org>
>> Cc: Baoquan He <bhe at redhat.com>
>> Cc: Breno Leitao <leitao at debian.org>
>> Cc: stable at vger.kernel.org
>> Fixes: 3751e728cef2 ("arm64: kexec_file: add crash dump support")
>> Closes: https://sashiko.dev/#/patchset/20260323072745.2481719-1-ruanjinjie%40huawei.com
>> Signed-off-by: Jinjie Ruan <ruanjinjie at huawei.com>
>> ---
>> arch/arm64/kernel/machine_kexec_file.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
>> index e31fabed378a..a67e7b1abbab 100644
>> --- a/arch/arm64/kernel/machine_kexec_file.c
>> +++ b/arch/arm64/kernel/machine_kexec_file.c
>> @@ -59,6 +59,11 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
>> cmem->max_nr_ranges = nr_ranges;
>> cmem->nr_ranges = 0;
>> for_each_mem_range(i, &start, &end) {
>> + if (cmem->nr_ranges >= cmem->max_nr_ranges) {
>> + ret = -ENOMEM;
>
> -ENOMEM seems to be the the wrong errno. This isn't an allocation
> failure; it's a transient race. -EBUSY or -EAGAIN would be more honest
More information about the linux-riscv
mailing list