[PATCH v3] arm64: smp: smp_send_stop() and crash_smp_send_stop() should try non-NMI first
Doug Anderson
dianders at chromium.org
Fri Aug 23 08:29:38 PDT 2024
Hi Will,

On Fri, Aug 23, 2024 at 3:46 AM Will Deacon <will at kernel.org> wrote:
>
> Hi Doug,
>
> On Wed, Aug 21, 2024 at 02:53:57PM -0700, Douglas Anderson wrote:
> > When testing hard lockup handling on my sc7180-trogdor-lazor device
> > with pseudo-NMI enabled, with serial console enabled and with kgdb
> > disabled, I found that the stack crawls printed to the serial console
> > ended up as a jumbled mess. After rebooting, the pstore-based console
> > looked fine though. Also, enabling kgdb to trap the panic made the
> > console look fine and avoided the mess.
>
> Just a small nit:
>
> >  	while (num_other_online_cpus() && timeout--)
> >  		udelay(1);
> >
> > -	if (num_other_online_cpus())
> > +	/*
> > +	 * If CPUs are still online, try an NMI. There's no excuse for this to
> > +	 * be slow, so we only give them an extra 10 ms to respond.
> > +	 */
> > +	if (num_other_online_cpus() && ipi_should_be_nmi(IPI_CPU_STOP_NMI)) {
>
> We probably want an smp_rmb() here...
>
> > +		cpumask_copy(&mask, cpu_online_mask);
> > +		cpumask_clear_cpu(smp_processor_id(), &mask);
> > +
> > +		pr_info("SMP: retry stop with NMI for CPUs %*pbl\n",
> > +			cpumask_pr_args(&mask));
> > +
> > +		smp_cross_call(&mask, IPI_CPU_STOP_NMI);
> > +		timeout = USEC_PER_MSEC * 10;
> > +		while (num_other_online_cpus() && timeout--)
> > +			udelay(1);
> > +	}
> > +
> > +	if (num_other_online_cpus()) {
>
>
> ... and again here, just to make sure that the re-read of cpu_online_mask
> is ordered after the read of __num_online_cpus in num_other_online_cpus().
>
> I can add those when applying.

Sounds like a plan to me. Thanks!
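
For anyone reading along, here is roughly how I picture the barrier
placement. This is only a userspace sketch under my own assumptions, not
the kernel code: every model_* name is made up for illustration, a plain
bitmask stands in for cpu_online_mask, and a C11 acquire fence stands in
for smp_rmb() (the fence is somewhat stronger than a pure read barrier).

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
static atomic_int model_num_online = 1;      /* plays __num_online_cpus */
static atomic_ulong model_online_mask = 1UL; /* plays cpu_online_mask, bit n = CPU n */
#define MODEL_THIS_CPU 0

/* Stand-in for smp_rmb(): keep earlier loads ordered before later loads. */
static inline void model_smp_rmb(void)
{
	atomic_thread_fence(memory_order_acquire);
}

/* Test helper: set the modelled mask and count to a known state. */
static void model_set_online(unsigned long mask, int count)
{
	atomic_store_explicit(&model_online_mask, mask, memory_order_relaxed);
	atomic_store_explicit(&model_num_online, count, memory_order_relaxed);
}

/* Modelled num_other_online_cpus(): reads only the counter. */
static int model_num_other_online_cpus(void)
{
	return atomic_load_explicit(&model_num_online, memory_order_relaxed) - 1;
}

/* Returns the mask of other CPUs that would get the retry, or 0 if none. */
static unsigned long model_cpus_to_retry(void)
{
	unsigned long mask;

	if (!model_num_other_online_cpus())
		return 0;

	/* Order the counter read above before the mask re-read below. */
	model_smp_rmb();

	mask = atomic_load_explicit(&model_online_mask, memory_order_relaxed);
	mask &= ~(1UL << MODEL_THIS_CPU);
	return mask;
}
```

With CPUs 0, 1 and 3 marked online in the model, model_cpus_to_retry()
returns a mask with bits 1 and 3 set. The only point is the ordering:
the counter is read first, then the barrier, then the mask.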
-Doug