[PATCH] arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack()
Ard Biesheuvel
ardb at kernel.org
Tue Jul 22 18:20:59 PDT 2025
(cc Sami again)
On Wed, 23 Jul 2025 at 01:20, Will Deacon <will at kernel.org> wrote:
>
> On Mon, Jul 21, 2025 at 09:42:23PM +0000, Prundeanu, Cristian wrote:
> > On Fri, Jul 18, 2025 at 03:28:14PM +0100, Ada Couprie Diaz wrote:
> >
> > > Completely mask DAIF in `cpu_switch_to()` and restore it when returning.
> > > Do the same in `call_on_irq_stack()`, but restore and mask around
> > > the branch.
> > > Mask DAIF even if CONFIG_SHADOW_CALL_STACK is not enabled for consistency
> > > of behaviour between all configurations.
> > >
> > > Introduce and use an assembly macro for saving and masking DAIF,
> > > as the existing one saves but only masks IF.
> > >
> > > Signed-off-by: Ada Couprie Diaz <ada.coupriediaz at arm.com>
> > > Reported-by: Cristian Prundeanu <cpru at amazon.com>
> > > Fixes: 59b37fe52f49955791a460752c37145f1afdcad1 ("arm64: Stash shadow stack pointer in the task struct on interrupt")
> >
> > Confirming this fixes the spontaneous reboot previously observed when
> > enabling both pseudo-NMI (irqchip.gicv3_pseudo_nmi=1) and shadow call
> > stack (CONFIG_SHADOW_CALL_STACK=y). Tested both on kernel 6.16-rc7 and
> > legacy kernel 6.8 where the issue was initially observed.
> >
> > Tested-by: Cristian Prundeanu <cpru at amazon.com>
>
> Ah, I hadn't appreciated from the cover letter that this was fixing a
> real issue seen in the field. It all sounded a bit theoretical (but more
> likely with pNMI).
>
> I'll pick it up as a fix so we can land it in v6.16.
>
> Ard -- maybe we can rework this in future along the lines that you
> suggest but, from what Mark was saying offline, there may be problems
> beyond the SCS that need addressing too if we decide to leave IRQs
> enabled.
>
That is fine with me - the change looks alright, it's just that the
wall of text implicates the shadow call stack (and the commit in
question) in particular, even though (per Mark), the actual issues
being addressed here are more intricate and not limited to the shadow
call stack at all. E.g., the fact that call_on_irq_stack() may
re-enter up until the point where SP is assigned could have other
implications, which were present before that particular commit made
things even worse.
But if masking DAIF is needed in any case, it make sense of course to
rely on it to fix those issues too - I just think it might be better
to get rid of scs_save in call_on_irq_stack() anyway.
More information about the linux-arm-kernel
mailing list