[RFC PATCH] kernel/cpu: in freeze_secondary_cpus() ensure primary cpu is of domain type
Peter Zijlstra
peterz at infradead.org
Mon Jun 30 01:48:08 PDT 2025
On Mon, Jun 30, 2025 at 05:20:59PM +0900, Shashank Balaji wrote:
> On an x86 machine, when cpu 0 is isolated with "isolcpus=", on initiating
> suspend to memory, a warning is triggered, followed by a kernel crash. This is
> on a defconfig + CONFIG_ENERGY_MODEL kernel:
> This happens because in order to offline the last secondary cpu, i.e. cpu 1,
> build_sched_domains() ends up being passed an empty cpumask, since the only remaining
> cpu (cpu 0) is isolated. It warns and fails, after which perf domains are
> are attempted to be built, which crashes the kernel. The same problem occurs
> during cpu hotplug, but that was fixed by
> commit 38685e2a0476127d ("cpu/hotplug: Don't offline the last non-isolated CPU").
>
> Fix this by ensuring that the primary cpu, the last standing cpu, is of domain
> type, so that build_sched_domains() is not passed an empty cpumask.
>
> Co-developed-by: Rahul Bukte <rahul.bukte at sony.com>
> Signed-off-by: Rahul Bukte <rahul.bukte at sony.com>
> Signed-off-by: Shashank Balaji <shashank.mahadasyam at sony.com>
> ---
> kernel/cpu.c | 27 ++++++++++++++++++++++-----
> 1 file changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index a59e009e0be4..d9167b0559a5 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1902,12 +1902,28 @@ int freeze_secondary_cpus(int primary)
>
> cpu_maps_update_begin();
> if (primary == -1) {
> - primary = cpumask_first(cpu_online_mask);
> - if (!housekeeping_cpu(primary, HK_TYPE_TIMER))
> - primary = housekeeping_any_cpu(HK_TYPE_TIMER);
> + primary = cpumask_first_and_and(cpu_online_mask,
> + housekeeping_cpumask(HK_TYPE_TIMER),
> + housekeeping_cpumask(HK_TYPE_DOMAIN));
That's terrible indenting, please align after the opening bracket like:
primary = cpumask_first_and_and(cpu_online_mask,
housekeeping_cpumask(HK_TYPE_TIMER),
housekeeping_cpumask(HK_TYPE_DOMAIN));
Also, IIRC HK_TYPE_HRTIMER is deprecated and should be something like
HK_TYPE_NOISE or somesuch. Frederic?
> + if (primary >= nr_cpu_ids) {
> + error = -ENODEV;
> + pr_err("No suitable primary CPU found. Ensure at least one non-isolated, non-nohz_full CPU is online\n");
> + goto abort;
> + }
> } else {
> - if (!cpu_online(primary))
> - primary = cpumask_first(cpu_online_mask);
> + if (!cpu_online(primary)) {
> + primary = cpumask_first_and(cpu_online_mask,
> + housekeeping_cpumask(HK_TYPE_DOMAIN));
Indenting again.
> + if (primary >= nr_cpu_ids) {
> + error = -ENODEV;
> + pr_err("No suitable primary CPU found. Ensure at least one non-isolated CPU is online\n");
> + goto abort;
> + }
> + } else if (!housekeeping_cpu(primary, HK_TYPE_DOMAIN)) {
> + error = -ENODEV;
> + pr_err("Primary CPU %d should not be isolated\n", primary);
> + goto abort;
> + }
> }
>
> /*
> @@ -1943,6 +1959,7 @@ int freeze_secondary_cpus(int primary)
> else
> pr_err("Non-boot CPUs are not disabled\n");
>
> +abort:
> /*
> * Make sure the CPUs won't be enabled by someone else. We need to do
> * this even in case of failure as all freeze_secondary_cpus() users are
Also; doesn't the above boil down to something like:
if (primary == -1) {
primary = cpumask_first_and_and(cpu_online_mask,
housekeeping_cpumask(HK_TYPE_TIMER),
housekeeping_cpumask(HK_TYPE_DOMAIN));
} if (!cpu_online(primary)) {
primary = cpumask_first_and(cpu_online_mask,
housekeeping_cpumask(HK_TYPE_DOMAIN));
}
if (primary >= nr_cpu_ids || !housekeeping_cpu(primary, HK_TYPE_DOMAIN)) {
error = -ENODEV;
pr_err("Primary CPU %d should not be isolated\n", primary);
goto abort;
}
Yes, this has less error string variation, but the code is simpler.
More information about the linux-riscv
mailing list