[PATCH RFC 3/3] arm64: Add HOTPLUG_PARALLEL support for secondary CPUs

Will Deacon will at kernel.org
Thu Jun 18 05:21:08 PDT 2026


Hi Jinjie,

On Mon, Jun 15, 2026 at 04:51:48PM +0800, Jinjie Ruan wrote:
> On 6/12/2026 11:45 PM, Michael Kelley wrote:
> > From: Jinjie Ruan <ruanjinjie at huawei.com> Sent: Thursday, June 11, 2026 6:38 AM
> >>
> >> Support for parallel secondary CPU bringup is already utilized by x86,
> >> MIPS, and RISC-V. This patch brings this capability to the arm64
> >> architecture.
> >>
> >> Rework the global `secondary_data` accessed during early boot into
> >> a per-CPU array. This array maps logical CPU IDs to MPIDR_EL1 values,
> >> enabling the early boot code in head.S to resolve each secondary CPU's
> >> logical ID concurrently.
> >>
> >> To fully enable HOTPLUG_PARALLEL, this patch implements:
> >> 1) An arm64-specific arch_cpuhp_kick_ap_alive() handler.
> >> 2) Callbacks to cpuhp_ap_sync_alive() inside secondary_start_kernel().
> >>
> >> Successfully tested on QEMU ARM64 virt machine (KVM on, 128 vCPUs).
> >>
> > >> | test kernel        | secondary CPUs boot time |
> > >> | ------------------ | ------------------------ |
> > >> | Without this patch | 155.672                  |
> > >> | cpuhp.parallel=0   | 62.897                   |
> > >> | cpuhp.parallel=1   | 166.703                  |
> > 
> > The last two rows seem mixed up. I would expect parallel=0 to
> > result in a longer boot time.
> 
> Hi, Michael,
> 
> The results are correct and not mixed up.
> 
> Compared to the original non-HOTPLUG_PARALLEL approach, the advantage of
> cpuhp.parallel=0 lies in its use of cpu_relax() (`yield` on arm64) instead
> of the wait_for_completion_timeout() mechanism (which may cause sleep
> and context switching). This significantly reduces the overhead of VM
> exits and context switches in a KVM guest, thereby cutting the secondary
> CPU boot time by more than half.

I don't think that's a particularly compelling reason to enable this for
arm64, in all honesty. The yield instruction typically doesn't do
anything on actual arm64 silicon, so this probably means that you're
introducing busy-loops which tend to be bad for power and scalability.
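
To make the concern concrete, here is a minimal userspace sketch of the
kind of polling loop in question. It is not the cpuhp code: relax_hint()
and spin_wait_alive() are hypothetical names, and relax_hint() merely
stands in for cpu_relax(). Where `yield` executes as a NOP, every
iteration counted in *spins is work the waiting CPU does instead of
sleeping.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's cpu_relax(). */
static inline void relax_hint(void)
{
#if defined(__aarch64__)
	__asm__ volatile("yield" ::: "memory");
#elif defined(__x86_64__) || defined(__i386__)
	__asm__ volatile("pause" ::: "memory");
#else
	__asm__ volatile("" ::: "memory");	/* compiler barrier only */
#endif
}

/*
 * Poll *alive until it becomes non-zero or max_spins iterations have
 * elapsed. Returns true if the flag was observed. *spins reports how
 * many iterations were spent polling rather than sleeping.
 */
static bool spin_wait_alive(atomic_int *alive, unsigned long max_spins,
			    unsigned long *spins)
{
	unsigned long n = 0;

	while (!atomic_load_explicit(alive, memory_order_acquire)) {
		if (n == max_spins) {
			*spins = n;
			return false;	/* gave up: flag never set */
		}
		relax_hint();
		n++;
	}

	*spins = n;
	return true;
}
```

A sleeping wait such as wait_for_completion_timeout() gives the CPU back
to the scheduler instead, which is where the extra context switches (and
VM exits in a guest) come from.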

I implemented this a while ago [1] but didn't manage to see much in terms
of performance improvement and so I didn't bother to send the patches out
after talking about it at KVM Forum [2]. However, as mentioned at the end
of that talk, it _is_ still useful for confidential VMs using PSCI so
let me dust off my old series and send it out to see what you think.

It relies on PSCI v0.2, which means we don't need the NR_CPUS-sized array
for secondary_data and I also have some support for error handling (it
doesn't look like you handle __early_cpu_boot_status properly).
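
For reference, the array lookup described in the patch can be sketched
in C roughly as below. This is an illustration under assumptions, not
the code from the patch (which does the lookup in head.S):
cpu_mpidr_map, NR_CPUS_SKETCH, INVALID_MPIDR and both helpers are
hypothetical names.

```c
#include <stdint.h>

#define NR_CPUS_SKETCH	8		/* hypothetical; stands in for NR_CPUS */
#define INVALID_MPIDR	UINT64_MAX	/* hypothetical "slot unused" marker */

/* Index is the logical CPU ID, value is that CPU's MPIDR_EL1 affinity. */
static uint64_t cpu_mpidr_map[NR_CPUS_SKETCH];

/* Mark every slot unused before the boot CPU fills in real entries. */
static void mpidr_map_init(void)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		cpu_mpidr_map[cpu] = INVALID_MPIDR;
}

/*
 * Resolve a secondary's own MPIDR to its logical CPU ID. The scan only
 * reads the array, so several secondaries can run it at the same time.
 * Returns -1 if the MPIDR is not present.
 */
static int mpidr_to_logical_cpu(uint64_t mpidr)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		if (cpu_mpidr_map[cpu] == mpidr)
			return cpu;
	}
	return -1;
}
```

The cost is a statically sized table plus a linear scan on every
secondary's boot path.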

It looks like I could include your first patch, though!

Will

[1] https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=cpu-hotplug
[2] https://www.youtube.com/watch?v=Q6kOshnnQuE