[RFC PATCH] kernel/cpu: in freeze_secondary_cpus() ensure primary cpu is of domain type

Peter Zijlstra peterz at infradead.org
Mon Jun 30 01:48:08 PDT 2025


On Mon, Jun 30, 2025 at 05:20:59PM +0900, Shashank Balaji wrote:
> On an x86 machine, when cpu 0 is isolated with "isolcpus=", on initiating
> suspend to memory, a warning is triggered, followed by a kernel crash. This is
> on a defconfig + CONFIG_ENERGY_MODEL kernel:

> This happens because in order to offline the last secondary cpu, i.e. cpu 1,
> build_sched_domains() ends up being passed an empty cpumask, since the only remaining
> cpu (cpu 0) is isolated. It warns and fails, after which perf domains are
> are attempted to be built, which crashes the kernel. The same problem occurs
> during cpu hotplug, but that was fixed by
> commit 38685e2a0476127d ("cpu/hotplug: Don't offline the last non-isolated CPU").
> 
> Fix this by ensuring that the primary cpu, the last standing cpu, is of domain
> type, so that build_sched_domains() is not passed an empty cpumask.
> 

> Co-developed-by: Rahul Bukte <rahul.bukte at sony.com>
> Signed-off-by: Rahul Bukte <rahul.bukte at sony.com>
> Signed-off-by: Shashank Balaji <shashank.mahadasyam at sony.com>
> ---
>  kernel/cpu.c | 27 ++++++++++++++++++++++-----
>  1 file changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index a59e009e0be4..d9167b0559a5 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1902,12 +1902,28 @@ int freeze_secondary_cpus(int primary)
>  
>  	cpu_maps_update_begin();
>  	if (primary == -1) {
> -		primary = cpumask_first(cpu_online_mask);
> -		if (!housekeeping_cpu(primary, HK_TYPE_TIMER))
> -			primary = housekeeping_any_cpu(HK_TYPE_TIMER);
> +		primary = cpumask_first_and_and(cpu_online_mask,
> +								housekeeping_cpumask(HK_TYPE_TIMER),
> +								housekeeping_cpumask(HK_TYPE_DOMAIN));

That's terrible indenting, please align after the opening bracket like:

		primary = cpumask_first_and_and(cpu_online_mask,
						housekeeping_cpumask(HK_TYPE_TIMER),
						housekeeping_cpumask(HK_TYPE_DOMAIN));

Also, IIRC HK_TYPE_HRTIMER is deprecated and should be something like
HK_TYPE_NOISE or somesuch. Frederic?

> +		if (primary >= nr_cpu_ids) {
> +			error = -ENODEV;
> +			pr_err("No suitable primary CPU found. Ensure at least one non-isolated, non-nohz_full CPU is online\n");
> +			goto abort;
> +		}
>  	} else {
> -		if (!cpu_online(primary))
> -			primary = cpumask_first(cpu_online_mask);
> +		if (!cpu_online(primary)) {
> +			primary = cpumask_first_and(cpu_online_mask,
> +								housekeeping_cpumask(HK_TYPE_DOMAIN));

Indenting again.

> +			if (primary >= nr_cpu_ids) {
> +				error = -ENODEV;
> +				pr_err("No suitable primary CPU found. Ensure at least one non-isolated CPU is online\n");
> +				goto abort;
> +			}
> +		} else if (!housekeeping_cpu(primary, HK_TYPE_DOMAIN)) {
> +			error = -ENODEV;
> +			pr_err("Primary CPU %d should not be isolated\n", primary);
> +			goto abort;
> +		}
>  	}
>  
>  	/*
> @@ -1943,6 +1959,7 @@ int freeze_secondary_cpus(int primary)
>  	else
>  		pr_err("Non-boot CPUs are not disabled\n");
>  
> +abort:
>  	/*
>  	 * Make sure the CPUs won't be enabled by someone else. We need to do
>  	 * this even in case of failure as all freeze_secondary_cpus() users are


Also; doesn't the above boil down to something like:

	if (primary == -1) {
		primary = cpumask_first_and_and(cpu_online_mask,
						housekeeping_cpumask(HK_TYPE_TIMER),
						housekeeping_cpumask(HK_TYPE_DOMAIN));
	} if (!cpu_online(primary)) {
		primary = cpumask_first_and(cpu_online_mask,
					    housekeeping_cpumask(HK_TYPE_DOMAIN));
	} 

	if (primary >= nr_cpu_ids || !housekeeping_cpu(primary, HK_TYPE_DOMAIN)) {
		error = -ENODEV;
		pr_err("Primary CPU %d should not be isolated\n", primary);
		goto abort;
	}

Yes, this has less error string variation, but the code is simpler.



More information about the linux-riscv mailing list