arm64: BUG: KASAN: invalid-access in arch_stack_walk
Breno Leitao
leitao at debian.org
Mon Jun 23 09:56:33 PDT 2025
On Mon, Jun 23, 2025 at 12:56:06PM +0100, Catalin Marinas wrote:
> On Sun, Jun 22, 2025 at 02:57:16PM +0200, Andrey Konovalov wrote:
> > On Fri, Jun 20, 2025 at 2:33 PM Breno Leitao <leitao at debian.org> wrote:
> > > I'm encountering a KASAN warning during aarch64 boot and I am struggling
> > > to determine the cause. I haven't come across any reports about this on
> > > the mailing list so far, so I'm sharing this early in case others are
> > > seeing it too.
> > >
> > > This issue occurs both on Linus's upstream branch and in the 6.15 final
> > > release. The stack trace below is from 6.15 final. I haven't started
> > > bisecting yet, but that's my next step.
> > >
> > > Here are a few details about the problem:
> > >
> > > 1) It happens when my kernel boots on an aarch64 host
> > > 2) The lines do not match the code very well, and I am not sure why. It
> > > seems it is offset by two lines. The stack is based on commit
> > > 0ff41df1cb26 ("Linux 6.15")
> > > 3) My config is at https://pastebin.com/ye46bEK9
> > >
> > >
> > > [ 235.831690] ==================================================================
> > > [ 235.861238] BUG: KASAN: invalid-access in arch_stack_walk (arch/arm64/kernel/stacktrace.c:346 arch/arm64/kernel/stacktrace.c:387)
> > > [ 235.887206] Write of size 96 at addr a5ff80008ae8fb80 by task kworker/u288:26/3666
> > > [ 235.918139] Pointer tag: [a5], memory tag: [00]
> > > [ 235.942722] Workqueue: efi_rts_wq efi_call_rts
> > > [ 235.942732] Call trace:
> > > [ 235.942734] show_stack (arch/arm64/kernel/stacktrace.c:468) (C)
> > > [ 235.942741] dump_stack_lvl (lib/dump_stack.c:123)
> > > [ 235.942748] print_report (mm/kasan/report.c:409 mm/kasan/report.c:521)
> > > [ 235.942755] kasan_report (mm/kasan/report.c:636)
> > > [ 235.942759] kasan_check_range (mm/kasan/sw_tags.c:85)
> > > [ 235.942764] memset (mm/kasan/shadow.c:53)
> > > [ 235.942769] arch_stack_walk (arch/arm64/kernel/stacktrace.c:346 arch/arm64/kernel/stacktrace.c:387)
> > > [ 235.942773] return_address (arch/arm64/kernel/return_address.c:44)
> > > [ 235.942778] trace_hardirqs_off.part.0 (kernel/trace/trace_preemptirq.c:95)
> > > [ 235.942784] trace_hardirqs_off_finish (kernel/trace/trace_preemptirq.c:98)
> > > [ 235.942789] enter_from_kernel_mode (arch/arm64/kernel/entry-common.c:62)
> > > [ 235.942794] el1_interrupt (arch/arm64/kernel/entry-common.c:559 arch/arm64/kernel/entry-common.c:575)
> > > [ 235.942799] el1h_64_irq_handler (arch/arm64/kernel/entry-common.c:581)
> > > [ 235.942804] el1h_64_irq (arch/arm64/kernel/entry.S:596)
> > > [ 235.942809] 0x3c52ff1ecc (P)
> > > [ 235.942825] 0x3c52ff0ed4
> > > [ 235.942829] 0x3c52f902d0
> > > [ 235.942833] 0x3c52f953e8
> > > [ 235.942837] __efi_rt_asm_wrapper (arch/arm64/kernel/efi-rt-wrapper.S:49)
> > > [ 235.942843] efi_call_rts (drivers/firmware/efi/runtime-wrappers.c:269)
> > > [ 235.942848] process_one_work (./arch/arm64/include/asm/jump_label.h:36 ./include/trace/events/workqueue.h:110 kernel/workqueue.c:3243)
> > > [ 235.942854] worker_thread (kernel/workqueue.c:3313 kernel/workqueue.c:3400)
> > > [ 235.942858] kthread (kernel/kthread.c:464)
> > > [ 235.942863] ret_from_fork (arch/arm64/kernel/entry.S:863)
> > >
> > > [ 236.436924] The buggy address belongs to the virtual mapping at
> > > [a5ff80008ae80000, a5ff80008aea0000) created by:
> > > arm64_efi_rt_init (arch/arm64/kernel/efi.c:219)
> > >
> > > [ 236.506959] The buggy address belongs to the physical page:
> > > [ 236.529724] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12682
> > > [ 236.562077] flags: 0x17fffd6c0000000(node=0|zone=2|lastcpupid=0x1ffff|kasantag=0x5b)
> > > [ 236.593722] raw: 017fffd6c0000000 0000000000000000 dead000000000122 0000000000000000
> > > [ 236.625365] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> > > [ 236.657004] page dumped because: kasan: bad access detected
> > >
> > > [ 236.685828] Memory state around the buggy address:
> > > [ 236.705390] ffff80008ae8f900: 00 00 00 00 00 a5 a5 a5 a5 00 00 00 00 00 a5 a5
> > > [ 236.734899] ffff80008ae8fa00: a5 a5 a5 00 00 00 00 00 00 a5 a5 a5 a5 a5 00 a5
> > > [ 236.764409] >ffff80008ae8fb00: 00 a5 a5 a5 00 a5 a5 a5 a5 a5 a5 00 a5 a5 a5 00
> > > [ 236.793918] ^
> > > [ 236.818810] ffff80008ae8fc00: a7 a5 a5 a5 a5 a5 a5 a5 a5 00 a5 00 a5 a5 a5 a5
> > > [ 236.848321] ffff80008ae8fd00: a5 a5 a5 a5 00 a5 00 a5 a5 a5 a5 a5 a5 a5 a5 a5
> > > [ 236.877828] ==================================================================
> >
> > Looks like the memory allocated/mapped in arm64_efi_rt_init() is
> > tagged by __vmalloc_node(). And this memory then gets used as a
> > (irq-related? EFI-related?) stack. And having the SP register tagged
> > breaks SW_TAGS instrumentation AFAIR [1], which is likely what
> > produces this report.
> >
> > Adding kasan_reset_tag() to arm64_efi_rt_init() should likely fix
> > this; similar to what we have in arch_alloc_vmap_stack(). Or should we
> > make arm64_efi_rt_init() just call arch_alloc_vmap_stack()?
>
> In theory, we can still disable the vmap stack, so we either fall back
> to something else or require that EFI runtime depends on VMAP_STACK.
> We can do like init_sdei_stacks(), just bail out if VMAP_STACK is
> disabled.

Thanks for the feedback and suggestions. Are we talking about a patch
that looks like the following?

Author: Breno Leitao <leitao at debian.org>
Date:   Mon Jun 23 09:46:54 2025 -0700

    arm64: Use arch_alloc_vmap_stack for EFI runtime stack allocation

    Refactor arch_alloc_vmap_stack() by replacing the BUILD_BUG_ON() on
    CONFIG_VMAP_STACK with a runtime check that returns NULL if the
    config is not set.

    A side effect is that _init_sdei_stack() no longer fails at build
    time when CONFIG_VMAP_STACK is disabled; it fails at runtime
    instead. This shifts error detection from compile time to runtime.

    Then, reuse arch_alloc_vmap_stack() to allocate the EFI runtime
    stack memory in arm64_efi_rt_init().

    Suggested-by: Andrey Konovalov <andreyknvl at gmail.com>
    Suggested-by: Catalin Marinas <catalin.marinas at arm.com>
    Signed-off-by: Breno Leitao <leitao at debian.org>

diff --git a/arch/arm64/include/asm/vmap_stack.h b/arch/arm64/include/asm/vmap_stack.h
index 20873099c035c..8380af4507d01 100644
--- a/arch/arm64/include/asm/vmap_stack.h
+++ b/arch/arm64/include/asm/vmap_stack.h
@@ -19,7 +19,8 @@ static inline unsigned long *arch_alloc_vmap_stack(size_t stack_size, int node)
 {
 	void *p;
 
-	BUILD_BUG_ON(!IS_ENABLED(CONFIG_VMAP_STACK));
+	if (!IS_ENABLED(CONFIG_VMAP_STACK))
+		return NULL;
 
 	p = __vmalloc_node(stack_size, THREAD_ALIGN, THREADINFO_GFP, node,
 			__builtin_return_address(0));
diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index 3857fd7ee8d46..6c371b158b99f 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -15,6 +15,7 @@
 
 #include <asm/efi.h>
 #include <asm/stacktrace.h>
+#include <asm/vmap_stack.h>
 
 static bool region_is_misaligned(const efi_memory_desc_t *md)
 {
@@ -214,9 +215,8 @@ static int __init arm64_efi_rt_init(void)
 	if (!efi_enabled(EFI_RUNTIME_SERVICES))
 		return 0;
 
-	p = __vmalloc_node(THREAD_SIZE, THREAD_ALIGN, GFP_KERNEL,
-			   NUMA_NO_NODE, &&l);
-l:	if (!p) {
+	p = arch_alloc_vmap_stack(THREAD_SIZE, NUMA_NO_NODE);
+	if (!p) {
 		pr_warn("Failed to allocate EFI runtime stack\n");
 		clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
 		return -ENOMEM;
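
For anyone decoding the report quoted above, here is a small user-space
sketch of the tag comparison that seems to be going on. It is not kernel
code: ptr_tag(), reset_tag() and first_mismatch() are hypothetical helper
names, and the sketch assumes the usual SW_TAGS layout (tag in the top
byte of the pointer, one shadow byte per 16 bytes of memory, 0xff as the
match-all kernel tag). reset_tag() only models the effect of
kasan_reset_tag() on a kernel address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAG_SHIFT     56      /* assumed: tag lives in the top byte */
#define TAG_KERNEL    0xffu   /* assumed: match-all tag of an untagged pointer */
#define GRANULE_SIZE  16u     /* assumed: bytes covered by one shadow byte */

/* Extract the tag from the top byte of a tagged address. */
static uint8_t ptr_tag(uint64_t addr)
{
	return (uint8_t)(addr >> TAG_SHIFT);
}

/* Force the top byte to 0xff, modelling kasan_reset_tag() on a kernel address. */
static uint64_t reset_tag(uint64_t addr)
{
	return addr | ((uint64_t)TAG_KERNEL << TAG_SHIFT);
}

/*
 * Return the index, within one 16-granule dump row, of the first granule
 * touched by the access whose memory tag differs from the pointer tag.
 * Return -1 if every granule matches or the pointer carries the
 * match-all tag.
 */
static int first_mismatch(const uint8_t row[16], uint64_t row_base,
			  uint64_t addr, size_t size)
{
	uint64_t untagged = reset_tag(addr);
	size_t first, last, i;

	assert(size > 0 && untagged >= row_base);
	first = (untagged - row_base) / GRANULE_SIZE;
	last = (untagged + size - 1 - row_base) / GRANULE_SIZE;
	assert(last < 16);

	if (ptr_tag(addr) == TAG_KERNEL)
		return -1;

	for (i = first; i <= last; i++)
		if (row[i] != ptr_tag(addr))
			return (int)i;
	return -1;
}
```

Feeding it the row marked with '>' in the report (base ffff80008ae8fb00),
the address a5ff80008ae8fb80 and size 96, the access spans granules 8 to
13 and the first mismatch is granule 11, whose tag is 00. That is
consistent with "Pointer tag: [a5], memory tag: [00]". With the tag reset
to ff the same access passes, which is why going through
arch_alloc_vmap_stack(), which returns kasan_reset_tag(p), should make
the report go away.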