[PATCH] arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack()
Ada Couprie Diaz
ada.coupriediaz at arm.com
Thu Jul 24 10:35:30 PDT 2025
On 23/07/2025 02:20, Ard Biesheuvel wrote:
> (cc Sami again)
>
> On Wed, 23 Jul 2025 at 01:20, Will Deacon <will at kernel.org> wrote:
>> On Mon, Jul 21, 2025 at 09:42:23PM +0000, Prundeanu, Cristian wrote:
>>> On Fri, Jul 18, 2025 at 03:28:14PM +0100, Ada Couprie Diaz wrote:
>>>
>>>> Completely mask DAIF in `cpu_switch_to()` and restore it when returning.
>>>> Do the same in `call_on_irq_stack()`, but restore and mask around
>>>> the branch.
>>>> Mask DAIF even if CONFIG_SHADOW_CALL_STACK is not enabled for consistency
>>>> of behaviour between all configurations.
>>>>
>>>> Introduce and use an assembly macro for saving and masking DAIF,
>>>> as the existing one saves but only masks IF.
>>>>
>>>> Signed-off-by: Ada Couprie Diaz <ada.coupriediaz at arm.com>
>>>> Reported-by: Cristian Prundeanu <cpru at amazon.com>
>>>> Fixes: 59b37fe52f49955791a460752c37145f1afdcad1 ("arm64: Stash shadow stack pointer in the task struct on interrupt")
>>> Confirming this fixes the spontaneous reboot previously observed when
>>> enabling both pseudo-NMI (irqchip.gicv3_pseudo_nmi=1) and shadow call
>>> stack (CONFIG_SHADOW_CALL_STACK=y). Tested both on kernel 6.16-rc7 and
>>> legacy kernel 6.8 where the issue was initially observed.
>>>
>>> Tested-by: Cristian Prundeanu <cpru at amazon.com>
>> Ah, I hadn't appreciated from the cover letter that this was fixing a
>> real issue seen in the field. It all sounded a bit theoretical (but more
>> likely with pNMI).
I can see how : I detailed the mechanism but not really mentioned either
a reproducer or the originally observed issues, on top of mentioning pNMI
after all the "possible but unlikely" explanation. That's my bad !
I'll probably lead with the observed issues rather than the explanation
of their reason in the future.
>> I'll pick it up as a fix so we can land it in v6.16.
>>
>> Ard -- maybe we can rework this in future along the lines that you
>> suggest but, from what Mark was saying offline, there may be problems
>> beyond the SCS that need addressing too if we decide to leave IRQs
>> enabled.
>>
> That is fine with me - the change looks alright, it's just that the
> wall of text implicates the shadow call stack (and the commit in
> question) in particular, even though (per Mark), the actual issues
> being addressed here are more intricate and not limited to the shadow
> call stack at all. E.g., the fact that call_on_irq_stack() may
> re-enter up until the point where SP is assigned could have other
> implications, which were present before that particular commit made
> things even worse.
To be honest, that was indeed my understanding of the issue
and how I approached it. Given offline discussions, there's other things
that having interrupts masked seems to help/fix. Knowing that now
I agree that the commit message is not great.
> But if masking DAIF is needed in any case, it make sense of course to
> rely on it to fix those issues too - I just think it might be better
> to get rid of scs_save in call_on_irq_stack() anyway.
I think that makes sense, and with interrupts masked would probably
prevent the issue you mentioned in the other thread if I understand
correctly.
Thanks all, sorry for the confusion of the commit message.
Ada
More information about the linux-arm-kernel
mailing list