[PATCH] arm64: smp: smp_send_stop() and crash_smp_send_stop() should try non-NMI first

Doug Anderson dianders at chromium.org
Tue Jun 25 16:08:13 PDT 2024


Hi,

On Mon, Jun 24, 2024 at 6:55 AM Will Deacon <will at kernel.org> wrote:
>
> On Fri, May 17, 2024 at 01:01:58PM -0700, Doug Anderson wrote:
> > On Thu, Dec 7, 2023 at 5:03 PM Douglas Anderson <dianders at chromium.org> wrote:
> > >         local_irq_disable();
> >
> > The above local_irq_disable() is not new for my patch but it seems
> > wonky for two reasons:
> >
> > 1. It feels like it should have been the first thing in the function.
> >
> > 2. It feels like it should be local_daif_mask() instead.
>
> Is that to ensure we don't take a pNMI? I think that makes sense, but
> let's please add a comment to say why local_irq_disable() is not
> sufficient.

Right, that was my thought. I mostly noticed it because the normal
(non-crash) stop path calls local_cpu_stop(), which calls
local_daif_mask(). I was comparing the two paths to figure out whether
the difference was intentional or an oversight. It looks like an
oversight to me.

Sure, I'll add a comment.


Ironically, looking at the code again, I found _yet another_ corner
case I missed: panic_smp_self_stop(). If a CPU hits that case, we
could end up waiting for a CPU that has already stopped itself. I
tried to figure out how to solve that properly, and it dawned on me
that I should rethink part of my patch. Specifically, I had added a
new `stop_mask` in this patch because the panic case didn't update
`cpu_online_mask`. That's easy enough to fix, though: just add a call
to `set_cpu_online(cpu, false)` in ipi_cpu_crash_stop(). So I'll do
that and avoid adding a new mask. If there's some reason why crash
stop shouldn't mark a CPU offline, let me know and I'll go back to the
separate mask.
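Below is a toy userspace model, not kernel code, showing why clearing the online bit in the crash-stop path makes a separate mask unnecessary. It assumes that the panic_smp_self_stop() path already clears the CPU's own bit in `cpu_online_mask`, as arm64's local_cpu_stop() appears to do. All `model_*` names and the 4-CPU setup are hypothetical. If every stop path clears its own bit, the crashing CPU only has to wait for the count of other online CPUs to reach zero. A CPU that stopped itself earlier is then never waited on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model only: NR_MODEL_CPUS and all model_* functions are
 * hypothetical stand-ins, not the kernel's cpumask API. */
#define NR_MODEL_CPUS 4u

/* Stand-in for cpu_online_mask: bit N set means CPU N is online. */
static atomic_uint model_online_mask;

static void model_set_cpu_online(unsigned int cpu, bool online)
{
	assert(cpu < NR_MODEL_CPUS);
	if (online)
		atomic_fetch_or(&model_online_mask, 1u << cpu);
	else
		atomic_fetch_and(&model_online_mask, ~(1u << cpu));
}

/* Count online CPUs other than @self: what the crashing CPU waits on. */
static unsigned int model_other_online_cpus(unsigned int self)
{
	unsigned int mask = atomic_load(&model_online_mask) & ~(1u << self);
	unsigned int count = 0;

	for (; mask; mask &= mask - 1)	/* clear lowest set bit each pass */
		count++;
	return count;
}

/* Assumption: both the self-stop path and the crash-stop IPI path
 * clear their own online bit before parking. */
static void model_stop_this_cpu(unsigned int cpu)
{
	model_set_cpu_online(cpu, false);
}

/* CPU 0 crashes. CPU 3 already stopped itself via the panic path, and
 * CPUs 1 and 2 stop when the crash IPI arrives. Returns how many CPUs
 * CPU 0 would still be waiting for. */
static unsigned int model_crash_scenario(void)
{
	for (unsigned int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
		model_set_cpu_online(cpu, true);

	model_stop_this_cpu(3);	/* panic_smp_self_stop() path */
	model_stop_this_cpu(1);	/* crash-stop IPI path */
	model_stop_this_cpu(2);	/* crash-stop IPI path */

	return model_other_online_cpus(0);
}
```

In this model, model_crash_scenario() returns 0. If any stopped CPU had skipped clearing its bit, the count would stay nonzero and the crashing CPU would keep waiting for a CPU that is already parked.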

-Doug



More information about the linux-arm-kernel mailing list