[PATCH] arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack()
Ard Biesheuvel
ardb at kernel.org
Mon Jul 21 22:31:44 PDT 2025
(cc Sami)
On Tue, 22 Jul 2025 at 00:18, Will Deacon <will at kernel.org> wrote:
>
> On Fri, Jul 18, 2025 at 03:28:14PM +0100, Ada Couprie Diaz wrote:
> > `cpu_switch_to()` and `call_on_irq_stack()` manipulate SP to switch
> > to different stacks, along with the Shadow Call Stack if it is enabled.
> > Those two stack changes cannot be done atomically and both functions
> > can be interrupted by SErrors or Debug Exceptions which, though unlikely,
> > is very much broken: if interrupted, we can end up with mismatched stacks
> > and Shadow Call Stack, leading to clobbered stacks.
> >
> > In `cpu_switch_to()`, it can happen when SP_EL0 points to the new task,
> > but x18 still points to the old task's SCS. When the interrupt handler
> > tries to save the task's SCS pointer, it will save the old task's
> > SCS pointer (x18) into the new task struct (pointed to by SP_EL0),
> > clobbering it.
> >
In this case, the 'interrupt handler' is not the entry code, but
call_on_irq_stack(), right?
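
For illustration, the tail of cpu_switch_to() as it stands today is
roughly the following (abridged sketch with my annotations, not part
of the quoted patch):

	mov	sp, x9			// switch to the new task's stack
	msr	sp_el0, x1		// 'current' now points to the new task
	// <-- window: from here until scs_load_current below, an exception
	//     handler sees the *new* 'current' but x18 still holds the
	//     *old* task's SCS pointer, so its scs_save clobbers the new
	//     task's saved SCS pointer
	scs_save x0			// record the old task's SCS pointer
	scs_load_current		// load the new task's SCS pointer
	ret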
> > In `call_on_irq_stack()`, it can happen when switching from the task stack
> > to the IRQ stack and when switching back. In both cases, we can be
> > interrupted when the SCS pointer points to the IRQ SCS, but SP points to
> > the task stack. The nested interrupt handler pushes its return addresses
> > on the IRQ SCS. It then detects that SP points to the task stack,
> > calls `call_on_irq_stack()` and clobbers the task SCS pointer with
> > the IRQ SCS pointer, which it will also use!
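
For reference, the entry-side window is in the old prologue (the lines
removed in the diff further down), which runs while SP still points at
the task stack; annotations mine:

#ifdef CONFIG_SHADOW_CALL_STACK
	get_current_task x16
	scs_save x16			// save the task's SCS pointer (x18)
	ldr_this_cpu scs_sp, irq_shadow_call_stack_ptr, x17
	// <-- scs_sp (x18) now points at the IRQ SCS while SP is still the
	//     task stack; a nested exception re-enters this sequence and
	//     saves the IRQ SCS pointer over the task's saved SCS pointer
#endif

The epilogue has the mirror-image window: scs_load_current only runs
after SP has been restored to the task stack.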
> >
> > This leads to tasks returning to addresses on the wrong SCS,
> > or even on the IRQ SCS, triggering kernel panics via CONFIG_VMAP_STACK
> > or FPAC if enabled.
> >
> > This is possible on a default config, but unlikely.
> > However, when CONFIG_ARM64_PSEUDO_NMI is enabled, DAIF is left unmasked
> > and the GIC is instead responsible for filtering which interrupts the
> > CPU should receive, based on priority.
> > Given the goal of emulating NMIs, pseudo-NMIs can be received by the CPU
> > even in `cpu_switch_to()` and `call_on_irq_stack()`, possibly *very*
> > frequently depending on the system configuration and workload, leading
> > to unpredictable kernel panics.
> >
> > Completely mask DAIF in `cpu_switch_to()` and restore it when returning.
> > Do the same in `call_on_irq_stack()`, but restore DAIF before branching
> > to the handler and mask it again after the branch returns.
> > Mask DAIF even if CONFIG_SHADOW_CALL_STACK is not enabled for consistency
> > of behaviour between all configurations.
> >
> > Introduce and use an assembly macro for saving and masking DAIF,
> > as the existing one saves but only masks IF.
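
(Sketch, not part of the patch: the existing save_and_disable_irq macro
in asm/assembler.h is `mrs \flags, daif` followed by `msr daifset, #3`,
so the new macro would presumably look something like

	.macro	save_and_disable_daif, flags
	mrs	\flags, daif		// stash the current DAIF bits
	msr	daifset, #0xf		// mask D, A, I and F
	.endm

paired with the existing restore_daif, which does `msr daif, \flags`.)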
> >
> > Signed-off-by: Ada Couprie Diaz <ada.coupriediaz at arm.com>
> > Reported-by: Cristian Prundeanu <cpru at amazon.com>
> > Fixes: 59b37fe52f49 ("arm64: Stash shadow stack pointer in the task struct on interrupt")
> > ---
> > Hi,
> > I spent some time evaluating the performance impact of this change
> > to make sure that it would be OK to mask in those functions.
> > They have very few instructions, so there are few chances for them to
> > be interrupted to begin with, and the impact should be minimal.
>
> It's definitely worthwhile doing the benchmarking, so thanks for that,
> but realistically I don't think we have an awful lot of choice but to
> fix this, do we?
>
So AIUI, the problem is that both cpu_switch_to() and
call_on_irq_stack() itself can be interrupted by the latter, resulting
in the shadow call stack pointer recorded in the task struct being
incorrect: either the task struct pointer points to the wrong task, or
the shadow stack pointer points to the wrong stack.
Commit 59b37fe52f49955 changed this code to ensure that the shadow
stack pointer is not preserved to/restored from the ordinary stack
(because that would turn call_on_irq_stack() into a rather useful
gadget), and to use the task struct instead.
What we could do instead is record the caller's SCS pointer on the IRQ
shadow stack, avoiding the task struct altogether. This should be the
only occurrence of scs_save that may interrupt cpu_switch_to(), so it
should address both issues AFAICT.
The only minor issue is that loading the IRQ shadow stack pointer
before assigning SP is problematic: if call_on_irq_stack() is
re-entered from an exception taken at that point, the clobbering of
scs_sp will interfere with the return from that exception. So scs_sp
should be left untouched until after SP is assigned (which is what
prevents call_on_irq_stack() from re-entering). This creates a tiny
window where we may end up servicing a (nested) IRQ while running on
the IRQ stack but on the task shadow stack, but I don't think that is
a thing to obsess about tbh. (Sami?) If it is, we'll need your DAIF
changes to turn that into a proper critical section, but I doubt that
is necessary.

--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -874,12 +874,6 @@ NOKPROBE(ret_from_fork)
  * Calls func(regs) using this CPU's irq stack and shadow irq stack.
  */
 SYM_FUNC_START(call_on_irq_stack)
-#ifdef CONFIG_SHADOW_CALL_STACK
-	get_current_task x16
-	scs_save x16
-	ldr_this_cpu scs_sp, irq_shadow_call_stack_ptr, x17
-#endif
-
 	/* Create a frame record to save our LR and SP (implicit in FP) */
 	stp	x29, x30, [sp, #-16]!
 	mov	x29, sp
@@ -888,7 +882,23 @@ SYM_FUNC_START(call_on_irq_stack)
 
 	/* Move to the new stack and call the function there */
 	add	sp, x16, #IRQ_STACK_SIZE
+#ifdef CONFIG_SHADOW_CALL_STACK
+	mov	x16, scs_sp
+	ldr_this_cpu scs_sp, irq_shadow_call_stack_ptr, x17
+#ifdef CONFIG_DYNAMIC_SCS
+	cbz	scs_sp, 0f
+#endif
+	str	x16, [scs_sp], #8
+0:
+#endif
 	blr	x1
+#ifdef CONFIG_SHADOW_CALL_STACK
+#ifdef CONFIG_DYNAMIC_SCS
+	cbz	scs_sp, 1f
+#endif
+	ldr	scs_sp, [scs_sp, #-8]
+1:
+#endif
 
 	/*
 	 * Restore the SP from the FP, and restore the FP and LR from the
@@ -896,7 +906,6 @@ SYM_FUNC_START(call_on_irq_stack)
 	 */
 	mov	sp, x29
 	ldp	x29, x30, [sp], #16
-	scs_load_current
 	ret
 SYM_FUNC_END(call_on_irq_stack)
 NOKPROBE(call_on_irq_stack)
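
To spell out the push/pop pair (annotated copy of the added lines, for
clarity):

	str	x16, [scs_sp], #8	// push the caller's SCS pointer onto
					// the IRQ SCS; post-index leaves
					// scs_sp pointing past the saved slot
	...
	ldr	scs_sp, [scs_sp, #-8]	// pop: load the saved pointer straight
					// back into scs_sp

so the task struct is never touched on this path.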