[PATCH v3] cpu/hotplug: Fix NULL kobject warning in cpuhp_smt_enable()
Catalin Marinas
catalin.marinas at arm.com
Tue Jun 9 10:58:22 PDT 2026
Hi Jinjie,
On Wed, Jun 03, 2026 at 02:38:11PM +0800, Jinjie Ruan wrote:
> On 6/2/2026 7:09 PM, Will Deacon wrote:
> > On Wed, May 20, 2026 at 10:20:23AM +0800, Jinjie Ruan wrote:
> >> When booting with ACPI, arm64 smp_prepare_cpus() currently sets all
> >> enumerated CPUs as "present" regardless of their status in the MADT. This
> >> causes issues with SMT hotplug control. For instance, with QEMU's
> >> "-smp 4,maxcpus=8" configuration, the MADT GICC entries are populated as
> >> follows: the first four CPUs are marked Enabled while the remaining four
> >> are marked Online Capable to support potential hot-plugging.
> >>
> >> Fix this by:
> >>
> >> 1. When booting with ACPI, checking the ACPI_MADT_ENABLED flag in the GICC
> >> entry before calling set_cpu_present() during SMP initialization.
> >>
> >> 2. Properly managing the present mask in acpi_map_cpu() and
> >> acpi_unmap_cpu() to support actual CPU hotplug events, This aligns with
> >> other architectures like x86 and LoongArch.
> >>
> >> 3. Update the arm64 CPU hotplug documentation to no longer state that all
> >> online-capable vCPUs are marked as present by the kernel at boot time.
> >>
> >> This ensures that only physically available or explicitly enabled CPUs
> >> are in the present mask, keeping the SMT control logic consistent with
> >> the actual hardware state.
> >
> > Please can you check the Sashiko review comment?
> >
> > https://sashiko.dev/#/patchset/20260520022023.126670-1-ruanjinjie@huawei.com
>
> I think commit eba4675008a6 ("arm64: arch_register_cpu() variant to
> check if an ACPI handle is now available.") introduced this bug.
>
> It introduced an architectural safety block inside
> arch_unregister_cpu(). If a hot-unplug operation is determined to be a
> physical hardware removal (where _STA evaluates to
> !ACPI_STA_DEVICE_PRESENT), it aborts the unregistration transaction
> early to protect unreadied arm64 infrastructure, thereby skipping
> unregister_cpu().
>
> However, the generic ACPI processor driver path in
> acpi_processor_post_eject() currently treats arch_unregister_cpu() as
> an unconditional void operation. When arch_unregister_cpu() bails out
> early, the subsequent cleanup flow blindly proceeds to call
> acpi_unmap_cpu(), clears global per-cpu processor arrays, and
> unconditionally free the 'struct acpi_processor' object.
>
> I think we can fix this by:
>
> 1. Refactoring arch_unregister_cpu() to return an integer
> transaction status. It returns -EOPNOTSUPP when aborting due to physical
> hot-remove blocking, -EINVAL/-EIO on firmware failures, and 0 only upon
> successful unregistration.
>
> 2. Guarding the downstream execution flow in
> acpi_processor_post_eject(). If arch_unregister_cpu() returns a error
> code, the hot-unplug transaction is considered aborted.
I wonder whether we need all this guarding. In the worst case, we could
rewrite the function, something like below, to always unregister and
only warn:
void arch_unregister_cpu(int cpu)
{
acpi_handle acpi_handle = acpi_get_processor_handle(cpu);
struct cpu *c = &per_cpu(cpu_devices, cpu);
acpi_status status;
unsigned long long sta;
if (!acpi_handle) {
pr_err_once("Removing a CPU without associated ACPI handle\n");
} else {
status = acpi_evaluate_integer(acpi_handle, "_STA", NULL, &sta);
if (!ACPI_FAILURE(status) &&
cpu_present(cpu) && !(sta & ACPI_STA_DEVICE_PRESENT))
pr_err_once("Changing CPU present bit is not supported\n");
}
unregister_cpu(c);
}
However, on the first condition, can we actually trigger !acpi_handle?
If not, we could just drop it. I tried to look up the paths and I don't
think we'd ever end up in this function with !acpi_handle. So this
leaves us with the next checks.
On the second/third conditions, it's more about preventing physical CPU
hotplug as we haven't properly defined it for arm yet but we could just
add a WARN_ONCE() to make it more visible and still proceed with the
unregistering. I think with your proposal, we don't fully unroll the
state anyway just by returning an error in arch_unregister_cpu(), so I'd
rather continue here.
What does firmware do for virtual CPU hotplug w.r.t. _STA? I noticed a
slight change in wording in the cpu-hotplug.rst doc with your patch from
On virtual systems the _STA method must always report the CPU as
``present``
to
On virtual systems the _STA method must report the CPU as ``present``
when it is activated by the firmware
Was your intention that _STA.PRESENT can become 0 when hot-unplugging
virtual CPUs?
--
Catalin
More information about the linux-arm-kernel
mailing list