[External] Re: [PATCH v3 5/8] riscv: smp: use NMI for CPU stop
Radim Krčmář
rkrcmar at ventanamicro.com
Wed Dec 10 06:22:29 PST 2025
2025-12-08T19:40:39+08:00, yunhui cui <cuiyunhui at bytedance.com>:
> Hi Radim,
>
> On Thu, Dec 4, 2025 at 9:16 PM Radim Krčmář <rkrcmar at ventanamicro.com> wrote:
>>
>> 2025-12-04T13:28:45+08:00, yunhui cui <cuiyunhui at bytedance.com>:
>> > Hi Radim,
>> >
>> > On Thu, Dec 4, 2025 at 12:07 PM Radim Krčmář <rkrcmar at ventanamicro.com> wrote:
>> >>
>> >> 2025-11-27T20:53:02+08:00, Yunhui Cui <cuiyunhui at bytedance.com>:
>> >> > Use NMI instead of IPI for CPU stop if RISC-V SSE NMI is supported.
>> >> >
>> >> > Signed-off-by: Yunhui Cui <cuiyunhui at bytedance.com>
>> >> > ---
>> >> > diff --git a/drivers/firmware/riscv/riscv_sse_nmi.c b/drivers/firmware/riscv/riscv_sse_nmi.c
>> >> > @@ -58,6 +58,7 @@ static int local_nmi_handler(u32 evt, void *arg, struct pt_regs *regs)
>> >> > type = atomic_read(this_cpu_ptr(&local_nmi));
>> >> >
>> >> > NMI_HANDLE(LOCAL_NMI_CRASH, cpu_crash_stop, cpu, regs);
>> >> > + NMI_HANDLE(LOCAL_NMI_STOP, cpu_stop);
>> >>
>> >> Please document the intended preemption design for all SSE events,
>> >> because it will be a nightmare if we forget some assumptions in the
>> >> coming years. (That includes the relative priorities of RAS/PMU/...)
>> >
>> > Actually, LOCAL_NMI_CRASH, LOCAL_NMI_STOP, LOCAL_NMI_BACKTRACE,
>> > LOCAL_NMI_KGDB, ... are all implemented via the single SSE event
>> > SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED. Per the SSE design, no
>> > preemption will occur among CRASH, STOP, BACKTRACE, and KGDB events.
>>
>> That is how it is. I don't understand why it must be like that.
>>
>> For example: PMU_OVERFLOW has lower event_id than SOFTWARE_INJECTED, so
>> it will currently interrupt NMI_CRASH as they both have priority 0,
>> although NMI_CRASH probably shouldn't be masked by anything, and should
>> preempt everything.
>> NMI_BACKTRACE, on the other hand, probably shouldn't have that high
>> priority as there seem to be more important events (e.g. RAS and NMI_CRASH).
>>
>> The issues can be avoided by event priorities, masking, or deemed as
>> non-issue, but I think it would be beneficial to provide some reasoning
>> behind the design, as the choices don't seem obvious to me.
>
> Indeed, it is necessary to consider the priority among different
> events. Should different priorities also be assigned to NMI_CRASH,
> NMI_BACKTRACE, NMI_STOP, and NMI_KGDB?

I think it would be beneficial to document the desired behavior even if
we can't (currently?) implement it, because, as you said, SSE can't
directly express the priority when multiplexing SOFTWARE_INJECTED.

> Do these operations need to be
> visible to the BIOS?

The BIOS shouldn't care what the lower privilege level wants to do.
SBI could define more events for software use, though.

> Could you kindly provide some good suggestions?

I think it would be good practice to explicitly set a unique priority
when registering SSE events, maybe through a global priority enum, and
make sure that all event registrations pass a value from that enum.
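
A minimal sketch of what such a global priority enum could look like, with a registration wrapper that refuses duplicates. Everything here is hypothetical: `enum sse_prio`, `sse_register_with_prio()` and `sse_unregister_prio()` are not existing kernel APIs, the ordering (SOFTWARE_INJECTED first) is just one possible choice, and a lower value is assumed to mean a more urgent event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical system-wide SSE priority table. Names, ordering and values
 * are illustrative, not from the SBI spec or the posted series; a lower
 * value is assumed to mean a more urgent event.
 */
enum sse_prio {
	SSE_PRIO_SOFTWARE_INJECTED = 0,	/* CRASH/STOP/KGDB/BACKTRACE share it */
	SSE_PRIO_RAS_HIGH          = 1,
	SSE_PRIO_PMU_OVERFLOW      = 2,
	SSE_PRIO_RAS_LOW           = 3,
	SSE_PRIO_NR			/* number of slots, not a priority */
};

/* Which priority slots are already claimed by a registered event. */
static bool sse_prio_used[SSE_PRIO_NR];

/*
 * Hypothetical registration wrapper: callers must pass a value from
 * enum sse_prio, and each value can be claimed only once, so no two
 * events ever have to be ordered by the event_id tie-break.
 * Returns 0 on success, -1 if prio is out of range or already taken.
 */
static int sse_register_with_prio(uint32_t event_id, enum sse_prio prio)
{
	(void)event_id;	/* a real wrapper would pass this to the SSE layer */

	if ((unsigned int)prio >= SSE_PRIO_NR || sse_prio_used[prio])
		return -1;
	sse_prio_used[prio] = true;
	return 0;
}

/* Frees the slot on unregistration; freeing an unclaimed slot is a bug. */
static void sse_unregister_prio(enum sse_prio prio)
{
	assert((unsigned int)prio < SSE_PRIO_NR && sse_prio_used[prio]);
	sse_prio_used[prio] = false;
}
```

With a table like this, two registrations that accidentally pick the same level fail at registration time instead of silently depending on event_id order.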

That would make sure that different events interact as we expect them
to, but it doesn't solve the multiplexing issue of SOFTWARE_INJECTED.
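
To illustrate why unique priorities make the interaction explicit, here is a sketch of the ordering rule as described earlier in the thread: priorities are compared first, and only on a tie does the lower event_id win. The lower-value-is-more-urgent convention is an assumption, `struct sse_evt` and `sse_more_urgent()` are hypothetical, and the event IDs in any usage are illustrative (PMU lower than SOFTWARE_INJECTED, as in the thread), not the real SBI values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal model of a registered SSE event (illustrative only). */
struct sse_evt {
	uint32_t event_id;
	uint32_t priority;	/* assumed: lower value = more urgent */
};

/*
 * Returns true if event a is ordered before event b under the assumed
 * rules: compare priorities first, and only on a tie fall back to the
 * lower event_id. With unique priorities the fallback is never reached.
 */
static bool sse_more_urgent(const struct sse_evt *a, const struct sse_evt *b)
{
	assert(a->event_id != b->event_id);	/* two distinct events */

	if (a->priority != b->priority)
		return a->priority < b->priority;
	return a->event_id < b->event_id;
}
```

When both events sit at priority 0, the result depends only on event_id numbering, which is the implicit PMU-over-CRASH ordering described above; once the priorities differ, the table decides.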

If we're fine with all SOFTWARE_INJECTED sub-handlers having the maximal
priority (higher than RAS/PMU/UNKNOWN_NMI/...), then we could hope that
lower-importance handlers (e.g. BACKTRACE) won't hang, so the
higher-importance handlers (e.g. CRASH) would eventually run.
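
A sketch of how the multiplexed sub-handlers could at least be dispatched in a fixed importance order. The posted patch reads a single type with atomic_read(); this sketch instead assumes a hypothetical per-CPU pending bitmask (modelled as one global here), and `local_nmi_request()`, `local_nmi_take_next()` and `local_nmi_dispatch()` are illustrative names, not kernel functions. The LOCAL_NMI_* names come from the patch, but their order below is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Sub-handlers multiplexed on SOFTWARE_INJECTED, listed from most to
 * least important. The order is an assumption for illustration.
 */
enum local_nmi_type {
	LOCAL_NMI_CRASH,
	LOCAL_NMI_STOP,
	LOCAL_NMI_KGDB,
	LOCAL_NMI_BACKTRACE,
	LOCAL_NMI_NR
};

typedef void (*local_nmi_fn)(void);

/* One pending bit per type; per-CPU in a kernel, a single global here. */
static atomic_uint local_nmi_pending;

/* Sender side: mark the request pending before injecting the event. */
static void local_nmi_request(enum local_nmi_type type)
{
	assert((unsigned int)type < LOCAL_NMI_NR);
	atomic_fetch_or(&local_nmi_pending, 1u << type);
}

/* Atomically takes the most important pending type, or LOCAL_NMI_NR. */
static enum local_nmi_type local_nmi_take_next(void)
{
	unsigned int old = atomic_load(&local_nmi_pending);

	while (old != 0u) {
		unsigned int t = 0;

		while (!(old & (1u << t)))	/* lowest set bit wins */
			t++;
		/* Clear only that bit; on failure, old is reloaded and we retry. */
		if (atomic_compare_exchange_weak(&local_nmi_pending, &old,
						 old & ~(1u << t)))
			return (enum local_nmi_type)t;
	}
	return LOCAL_NMI_NR;
}

/*
 * Handler body: re-checks after every sub-handler, so a CRASH raised
 * while BACKTRACE runs is picked next, but only once BACKTRACE returns.
 * A hung low-importance handler therefore still starves CRASH.
 */
static void local_nmi_dispatch(const local_nmi_fn handlers[LOCAL_NMI_NR])
{
	enum local_nmi_type t;

	while ((t = local_nmi_take_next()) != LOCAL_NMI_NR) {
		if (handlers[t] != NULL)
			handlers[t]();
	}
}
```

The ordering only helps between handler invocations; it cannot preempt a sub-handler that is already running, which is why this relies on the hope that BACKTRACE and friends don't hang.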

We're dealing with low-occurrence scenarios, so this might be "good
enough for now"...

The situation would get simpler if we could avoid some sub-handlers;
alternatively, it would get more complicated if SOFTWARE_INJECTED had
lower priority than some other event -- we'd make CRASH partially
recover its high-priority status by masking other SSE events during its
execution (and we'd need warding amulets against hangs and starvation).
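
A sketch of the masking idea, showing why it is only a partial recovery. The mask state is simulated with plain variables; in a real system masking would be a firmware operation reached through SBI, which is assumed and not shown. `sse_mask_others_save()`, `sse_mask_others_restore()` and `crash_with_others_masked()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simulated hart-local state. In a real system masking would be a
 * firmware operation reached through SBI; that call is assumed, not shown.
 */
static bool sse_others_masked;	/* other SSE events held off? */
static bool crash_active;	/* CRASH handler currently running? */

/* Hypothetical: mask other SSE events, returning the previous state. */
static bool sse_mask_others_save(void)
{
	bool was_masked = sse_others_masked;

	sse_others_masked = true;
	return was_masked;
}

/* Hypothetical: restore the mask state saved by sse_mask_others_save(). */
static void sse_mask_others_restore(bool was_masked)
{
	sse_others_masked = was_masked;
}

/*
 * Runs the CRASH work with other events (e.g. PMU_OVERFLOW) masked.
 * Only a partial fix: an event arriving between CRASH entry and the
 * mask call can still preempt it, and a hung crash_work now blocks
 * every other SSE event on this hart.
 */
static void crash_with_others_masked(void (*crash_work)(void))
{
	bool was_masked;

	assert(!crash_active);	/* nested CRASH would clobber saved state */
	crash_active = true;
	was_masked = sse_mask_others_save();
	if (crash_work != NULL)
		crash_work();
	sse_mask_others_restore(was_masked);
	crash_active = false;
}
```

The window before the mask takes effect and the starvation of everything else while it is held are exactly the hang/starvation risks that would need guarding against.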

Thanks.

More information about the linux-arm-kernel mailing list