[PATCH RFC 3/3] arm64: Add HOTPLUG_PARALLEL support for secondary CPUs
Jinjie Ruan
ruanjinjie at huawei.com
Mon Jun 15 02:57:22 PDT 2026
On 6/12/2026 11:45 PM, Michael Kelley wrote:
> From: Jinjie Ruan <ruanjinjie at huawei.com> Sent: Thursday, June 11, 2026 6:38 AM
>>
>> Support for parallel secondary CPU bringup is already utilized by x86,
>> MIPS, and RISC-V. This patch brings this capability to the arm64
>> architecture.
>>
>> Rework the global `secondary_data` accessed during early boot into
>> a per-CPU array. This array maps logical CPU IDs to MPIDR_EL1 values,
>> enabling the early boot code in head.S to resolve each secondary CPU's
>> logical ID concurrently.
>>
>> To fully enable HOTPLUG_PARALLEL, this patch implements:
>> 1) An arm64-specific arch_cpuhp_kick_ap_alive() handler.
>> 2) Callbacks to cpuhp_ap_sync_alive() inside secondary_start_kernel().
>>
>> Successfully tested on QEMU ARM64 virt machine (KVM on, 128 vCPUs).
>>
>> | test kernel | secondary CPUs boot time |
>> | --------------------- | -------------------- |
>> | Without this patch | 155.672 |
>> | cpuhp.parallel=0 | 62.897 |
>> | cpuhp.parallel=1 | 166.703 |
>
> The last two rows seem mixed up. I would expect parallel=0 to
> result in a longer boot time.
Without this patch:
KVM event statistics (6 entries)
Event name  Samples  Sample%    Time (ns)   Time%  Mean Time (ns)
DABT_LOW     323112   75.00%   1669148000  17.00%            5165
WFx           85817   19.00%    723215800   7.00%            8427
SYS64         14914    3.00%    419934530   4.00%           28157
IRQ            5643    1.00%   6732439250  70.00%         1193060
HVC64           282    0.00%     35543970   0.00%          126042
IABT_LOW          1    0.00%         6130   0.00%            6130
cpuhp.parallel=0:
Event name  Samples  Sample%    Time (ns)   Time%  Mean Time (ns)
DABT_LOW     308175   80.00%    643628050   6.00%            2088
WFx           55208   14.00%    261925270   2.00%            4744
SYS64         14975    3.00%    155727880   1.00%           10399
IRQ            4755    1.00%   8496162210  88.00%         1786784
HVC64           280    0.00%     19429900   0.00%           69392
IABT_LOW          1    0.00%         5850   0.00%            5850
cpuhp.parallel=1:
Event name  Samples  Sample%    Time (ns)   Time%  Mean Time (ns)
DABT_LOW     307923   77.00%    692965050   2.00%            2250
WFx           59549   15.00%    287888960   0.00%            4834
SYS64         15127    3.00%    334366230   1.00%           22103
IRQ           12861    3.00%  29784004970  95.00%         2315838
HVC64           280    0.00%     21869940   0.00%           78106
IABT_LOW          1    0.00%         9320   0.00%            9320
- Default (no patch): Slowest HVC64 handling (126 μs mean), highest WFx
count (85.8k), and the most total VM exits.
- cpuhp.parallel=1: Mean HVC64 latency improved to 78 μs (close to
cpuhp.parallel=0), but IRQ exits increased dramatically (12.9k, 2.7×
that of `cpuhp.parallel=0`), accounting for 95% of event time and
becoming the new bottleneck.
- cpuhp.parallel=0: Fastest HVC64 (69 μs mean), fewest IRQ exits (4.8k),
and fewest total samples, delivering the best overall boot performance.
So the rows are not mixed up: `cpuhp.parallel=1` reduces HVC cost compared
with the unpatched kernel but suffers from a massive increase in IRQ exits,
while `cpuhp.parallel=0` avoids this interrupt storm and therefore performs
best in a KVM guest.
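For anyone who wants to re-check the derived figures above (total exits, the
IRQ share of event time, and the 2.7× IRQ ratio), here is a minimal Python
sketch. The sample counts and times are copied from the three tables; the
names `STATS`, `total_exits` and `irq_time_share` are only illustrative and
are not part of the patch or of any tool.

```python
# Figures copied from the three event tables: (samples, total time in ns).
# All names here are illustrative helpers, not part of the patch.
STATS = {
    "no patch": {
        "DABT_LOW": (323112, 1669148000),
        "WFx": (85817, 723215800),
        "SYS64": (14914, 419934530),
        "IRQ": (5643, 6732439250),
        "HVC64": (282, 35543970),
        "IABT_LOW": (1, 6130),
    },
    "cpuhp.parallel=0": {
        "DABT_LOW": (308175, 643628050),
        "WFx": (55208, 261925270),
        "SYS64": (14975, 155727880),
        "IRQ": (4755, 8496162210),
        "HVC64": (280, 19429900),
        "IABT_LOW": (1, 5850),
    },
    "cpuhp.parallel=1": {
        "DABT_LOW": (307923, 692965050),
        "WFx": (59549, 287888960),
        "SYS64": (15127, 334366230),
        "IRQ": (12861, 29784004970),
        "HVC64": (280, 21869940),
        "IABT_LOW": (1, 9320),
    },
}


def total_exits(config):
    """Sum of the sample counts over all exit reasons for one configuration."""
    return sum(samples for samples, _ in STATS[config].values())


def irq_time_share(config):
    """Fraction of the total event time spent handling IRQ exits."""
    total_ns = sum(time_ns for _, time_ns in STATS[config].values())
    return STATS[config]["IRQ"][1] / total_ns


if __name__ == "__main__":
    for config in STATS:
        print(f"{config}: {total_exits(config)} exits, "
              f"IRQ share of time {irq_time_share(config):.1%}")
    # IRQ exit count with parallel bringup relative to serial bringup.
    ratio = STATS["cpuhp.parallel=1"]["IRQ"][0] / STATS["cpuhp.parallel=0"]["IRQ"][0]
    print(f"IRQ exits, parallel=1 vs parallel=0: {ratio:.2f}x")
```

This only re-derives numbers from the pasted statistics; it does not explain
where the extra IRQ exits come from.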
>
> Michael
>
>>
>> Signed-off-by: Jinjie Ruan <ruanjinjie at huawei.com>
>> ---
>> arch/arm64/Kconfig | 1 +
>> arch/arm64/include/asm/smp.h | 8 ++++++++
>> arch/arm64/kernel/head.S | 23 +++++++++++++++++++++++
>> arch/arm64/kernel/smp.c | 27 +++++++++++++++++++++++++++
>> 4 files changed, 59 insertions(+)
>>
>
>