[PATCH v3 3/3] arm64: escalate smp_send_stop() to an SDEI NMI as a last resort

Kiryl Shutsemau kirill at shutemov.name
Wed Jun 17 08:07:08 PDT 2026


On Tue, Jun 16, 2026 at 02:38:16PM -0700, Doug Anderson wrote:
> Hi,
> 
> On Sun, Jun 14, 2026 at 7:36 PM Kiryl Shutsemau <kirill at shutemov.name> wrote:
> >
> > +/*
> > + * Bring the local CPU to a stop, saving its register state into the vmcore
> > + * on the kdump crash path first. The single point every arm64 stop path
> > + * funnels through, so the bookkeeping (mask interrupts, mark offline, mask
> > + * SDEI, optionally power off) lives in one place:
> > + *
> > + *   - the regular IPI_CPU_STOP and pseudo-NMI IPI_CPU_STOP_NMI handlers;
> > + *   - panic_smp_self_stop(), a CPU parking itself on a parallel panic();
> > + *   - the SDEI cross-CPU NMI handler (drivers/firmware/arm_sdei_nmi.c),
> > + *     which reaches CPUs the stop IPIs could not.
> > + *
> > + * @regs is the register state to record in the vmcore on a crash stop; NULL
> > + * means "capture the current context". @die_on_crash decides the kdump crash
> > + * path: the IPI stop handlers pass true and power the CPU off (PSCI CPU_OFF,
> > + * via __cpu_try_die()) so a capture kernel can reclaim it. The SDEI handler
> > + * and panic_smp_self_stop() pass false and only park. For SDEI that is
> > + * required, not just conservative: it runs inside an SDEI event that is
> > + * deliberately never completed (completing it has firmware resume the wedged
> > + * context), and a CPU_OFF from that not-yet-completed context wedges EL3 on
> > + * some firmware -- a documented follow-up. Parking also matches this path's
> > + * own fallback when CPU_OFF is unavailable.
> 
> Nice to have all the details in the function comment. Any reason why
> you didn't use kernel-doc format? Nothing else in this file does, I
> guess, but it doesn't seem like it would be a problem to start the
> trend... ;-)

No reason -- switched it to kernel-doc in v4.

> > +       if (READ_ONCE(sdei_nmi_stopping)) {
>
> Don't you need a smp_rmb() before that, to match with the smp_wmb()?

No -- there's nothing for it to pair against. An smp_rmb() orders a
flag-load before a later *data*-load (the message-passing pattern), but
the flag is the only value shared here: the handler reads
sdei_nmi_stopping and nothing else published through it. crash_stop,
which the stop path reads afterwards, was written before the flag, so
reordering the reads can't expose a stale value either.

And the handler isn't spinning on the flag: it runs only because firmware
delivered the SDEI event, which is a full round-trip (SMC -> EL3 -> GIC ->
exception entry) strictly after sdei_nmi_stop_cpus()'s
WRITE_ONCE() + smp_wmb() + signal. By the time the read executes the store
is already globally visible -- there's no window for a stale read. The
publish-before-signal barrier is the half that matters.

The IPI core does exactly this. gic_ipi_send_mask()
(drivers/irqchip/irq-gic-v3.c) issues dsb(ishst) -- "stores to Normal
memory ... visible to the other CPUs before issuing the IPI" -- before
sending the SGI, and no IPI handler takes a matching read-side barrier. I
use an explicit smp_wmb() only because the SDEI signal goes through an
SMC, which carries no such barrier, where gic_ipi_send_mask() bakes it
into the SGI send. The backtrace framework this driver plugs into is the
same shape: nmi_cpu_backtrace() reads backtrace_mask -- set by
nmi_trigger_cpumask_backtrace() before raise() -- with no smp_rmb().

A bare smp_rmb() with no second load to order would just be confusing, so
v4 adds a comment at the READ_ONCE() explaining why there isn't one.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov



More information about the kexec mailing list