[PATCH v2] arm64: smp: smp_send_stop() and crash_smp_send_stop() should try non-NMI first
Will Deacon
will at kernel.org
Mon Aug 5 09:53:37 PDT 2024
Hi Doug,
On Tue, Jun 25, 2024 at 04:07:22PM -0700, Douglas Anderson wrote:
> @@ -1084,79 +1088,87 @@ static inline unsigned int num_other_online_cpus(void)
>
> void smp_send_stop(void)
> {
> + static unsigned long stop_in_progress;
> + cpumask_t mask;
> unsigned long timeout;
>
> - if (num_other_online_cpus()) {
> - cpumask_t mask;
> + /*
> + * If this cpu is the only one alive at this point in time, online or
> + * not, there are no stop messages to be sent around, so just back out.
> + */
> + if (num_other_online_cpus() == 0)
> + goto skip_ipi;
>
> - cpumask_copy(&mask, cpu_online_mask);
> - cpumask_clear_cpu(smp_processor_id(), &mask);
> + /* Only proceed if this is the first CPU to reach this code */
> + if (test_and_set_bit(0, &stop_in_progress))
> + return;
>
> - if (system_state <= SYSTEM_RUNNING)
> - pr_crit("SMP: stopping secondary CPUs\n");
> - smp_cross_call(&mask, IPI_CPU_STOP);
> - }
> + cpumask_copy(&mask, cpu_online_mask);
> + cpumask_clear_cpu(smp_processor_id(), &mask);
>
> - /* Wait up to one second for other CPUs to stop */
> + if (system_state <= SYSTEM_RUNNING)
> + pr_crit("SMP: stopping secondary CPUs\n");
> +
> + /*
> + * Start with a normal IPI and wait up to one second for other CPUs to
> + * stop. We do this first because it gives other processors a chance
> + * to exit critical sections / drop locks and makes the rest of the
> + * stop process (especially console flush) more robust.
> + */
> + smp_cross_call(&mask, IPI_CPU_STOP);
I realise you've moved this out of crash_smp_send_stop() and it looks
like we inherited the code from x86, but do you know how this serialise
against CPU hotplug operations? I've spent the last 20m looking at the
code and I can't see what prevents other CPUs from coming and going
while we're trying to IPI a non-atomic copy of 'cpu_online_mask'.
Will
More information about the linux-arm-kernel
mailing list