[PATCH v3 3/3] arm64: escalate smp_send_stop() to an SDEI NMI as a last resort

Tue Jun 16 14:38:16 PDT 2026

Hi,

On Sun, Jun 14, 2026 at 7:36 PM Kiryl Shutsemau <kirill at shutemov.name> wrote:
>
> +/*
> + * Bring the local CPU to a stop, saving its register state into the vmcore
> + * on the kdump crash path first. The single point every arm64 stop path
> + * funnels through, so the bookkeeping (mask interrupts, mark offline, mask
> + * SDEI, optionally power off) lives in one place:
> + *
> + *   - the regular IPI_CPU_STOP and pseudo-NMI IPI_CPU_STOP_NMI handlers;
> + *   - panic_smp_self_stop(), a CPU parking itself on a parallel panic();
> + *   - the SDEI cross-CPU NMI handler (drivers/firmware/arm_sdei_nmi.c),
> + *     which reaches CPUs the stop IPIs could not.
> + *
> + * @regs is the register state to record in the vmcore on a crash stop; NULL
> + * means "capture the current context". @die_on_crash decides the kdump crash
> + * path: the IPI stop handlers pass true and power the CPU off (PSCI CPU_OFF,
> + * via __cpu_try_die()) so a capture kernel can reclaim it. The SDEI handler
> + * and panic_smp_self_stop() pass false and only park. For SDEI that is
> + * required, not just conservative: it runs inside an SDEI event that is
> + * deliberately never completed (completing it has firmware resume the wedged
> + * context), and a CPU_OFF from that not-yet-completed context wedges EL3 on
> + * some firmware -- a documented follow-up. Parking also matches this path's
> + * own fallback when CPU_OFF is unavailable.

Nice to have all the details in the function comment. Any reason why
you didn't use kernel-doc format? Nothing else in this file does, I
guess, but it doesn't seem like it would be a problem to start the
trend... ;-)

> @@ -59,8 +64,51 @@ static bool sdei_nmi_available;
>
>  #define SDEI_NMI_EVENT                 0
>
> +/*
> + * Backtrace and stop both ride SDEI event 0. That is not a chosen economy:
> + * event 0 is the only architecturally software-signalled event -- the sole
> + * event SDEI_EVENT_SIGNAL can target at an arbitrary PE. Every other event
> + * number is a firmware/platform interrupt-bound event, not something the
> + * kernel can raise cross-CPU, so a dedicated "stop" event would need
> + * firmware to define and bind it -- exactly the firmware dependency this
> + * driver sets out to avoid.
> + *
> + * Sharing one event means the handler must tell a stop apart from a
> + * backtrace. A stop is terminal and system-wide -- sdei_nmi_stop_cpus() is
> + * only reached from smp_send_stop() (reboot/halt/panic/kdump), which never
> + * returns -- so once a stop is requested, every later event-0 fire is a
> + * stop too. A single write-once flag therefore carries as much as a
> + * per-CPU mask would: sdei_nmi_stop_cpus() sets it before signalling, and
> + * the handler reads a set flag as "stop this CPU" and a clear flag as
> + * "backtrace" (handled by nmi_cpu_backtrace(), which self-gates on the
> + * framework's backtrace mask). A backtrace fire that races in after a stop
> + * has begun just stops that CPU instead -- harmless, it is going down.
> + */
> +static bool sdei_nmi_stopping;
> +
>  static int sdei_nmi_handler(u32 event, struct pt_regs *regs, void *arg)
>  {
> +       if (READ_ONCE(sdei_nmi_stopping)) {

Don't you need an smp_rmb() before that read, to pair with the
smp_wmb() on the writer side?
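
To make the question concrete, here is a rough userspace C11 sketch of
the pairing I have in mind. It is only a model under assumptions: the
names are made up, "event_pending" stands in for SDEI event delivery,
and I am assuming the writer does "set flag, smp_wmb(), signal" since
that part of the patch is not quoted here. The release store plays the
role of smp_wmb() plus the signal, and the acquire on the reader side
plays the role of an smp_rmb() ahead of the flag read.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not names from the patch. */
enum nmi_action { ACT_NONE, ACT_BACKTRACE, ACT_STOP };

static atomic_bool stopping;       /* models sdei_nmi_stopping (write-once) */
static atomic_bool event_pending;  /* models "event 0 was signalled" */

/* Backtrace request: signal the event without touching the flag. */
static void signal_backtrace(void)
{
	atomic_store_explicit(&event_pending, true, memory_order_release);
}

/* Stop request: set the flag first, then signal with release ordering. */
static void signal_stop(void)
{
	atomic_store_explicit(&stopping, true, memory_order_relaxed);
	atomic_store_explicit(&event_pending, true, memory_order_release);
}

/*
 * Handler: consume the event with acquire ordering, then read the flag.
 * The acquire guarantees a flag write made before the signal is visible.
 */
static enum nmi_action handle_event(void)
{
	if (!atomic_exchange_explicit(&event_pending, false,
				      memory_order_acquire))
		return ACT_NONE;

	if (atomic_load_explicit(&stopping, memory_order_relaxed))
		return ACT_STOP;

	return ACT_BACKTRACE;
}
```

Since the flag is never cleared, a backtrace signalled after
signal_stop() also comes back as ACT_STOP in this model, which matches
the "harmless, it is going down" note in the comment. If the firmware
call that delivers the event already gives you the ordering, then the
explicit barrier may be unnecessary, but that would be worth a comment.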

-Doug