[PATCH RFC 3/3] arm64: Add HOTPLUG_PARALLEL support for secondary CPUs

Jinjie Ruan ruanjinjie at huawei.com
Mon Jun 15 02:57:22 PDT 2026



On 6/12/2026 11:45 PM, Michael Kelley wrote:
> From: Jinjie Ruan <ruanjinjie at huawei.com> Sent: Thursday, June 11, 2026 6:38 AM
>>
>> Support for parallel secondary CPU bringup is already utilized by x86,
>> MIPS, and RISC-V. This patch brings this capability to the arm64
>> architecture.
>>
>> Rework the global `secondary_data` accessed during early boot into
>> a per-CPU array. This array maps logical CPU IDs to MPIDR_EL1 values,
>> enabling the early boot code in head.S to resolve each secondary CPU's
>> logical ID concurrently.
>>
>> To fully enable HOTPLUG_PARALLEL, this patch implements:
>> 1) An arm64-specific arch_cpuhp_kick_ap_alive() handler.
>> 2) Callbacks to cpuhp_ap_sync_alive() inside secondary_start_kernel().
>>
>> Successfully tested on QEMU ARM64 virt machine (KVM on, 128 vCPUs).
>>
>> | test kernel        | secondary CPUs boot time |
>> | ------------------ | ------------------------ |
>> | Without this patch | 155.672                  |
>> | cpuhp.parallel=0   | 62.897                   |
>> | cpuhp.parallel=1   | 166.703                  |
> 
> The last two rows seem mixed up. I would expect parallel=0 to
> result in a longer boot time.

Without this patch:

KVM event statistics (6 entries)
Event name   Samples    Sample%      Time (ns)     Time%    Mean Time (ns)
DABT_LOW      323112     75.00%     1669148000    17.00%              5165
WFx            85817     19.00%      723215800     7.00%              8427
SYS64          14914      3.00%      419934530     4.00%             28157
IRQ             5643      1.00%     6732439250    70.00%           1193060
HVC64            282      0.00%       35543970     0.00%            126042
IABT_LOW           1      0.00%           6130     0.00%              6130

cpuhp.parallel=0:

Event name   Samples    Sample%      Time (ns)     Time%    Mean Time (ns)
DABT_LOW      308175     80.00%      643628050     6.00%              2088
WFx            55208     14.00%      261925270     2.00%              4744
SYS64          14975      3.00%      155727880     1.00%             10399
IRQ             4755      1.00%     8496162210    88.00%           1786784
HVC64            280      0.00%       19429900     0.00%             69392
IABT_LOW           1      0.00%           5850     0.00%              5850

cpuhp.parallel=1:

Event name   Samples    Sample%      Time (ns)     Time%    Mean Time (ns)
DABT_LOW      307923     77.00%      692965050     2.00%              2250
WFx            59549     15.00%      287888960     0.00%              4834
SYS64          15127      3.00%      334366230     1.00%             22103
IRQ            12861      3.00%    29784004970    95.00%           2315838
HVC64            280      0.00%       21869940     0.00%             78106
IABT_LOW           1      0.00%           9320     0.00%              9320

- Without this patch: slowest HVC64 handling (mean 126 μs), highest WFx
count (85.8k), and the most VM exits in total.

- cpuhp.parallel=1: mean HVC64 latency improves to 78 μs (close to
cpuhp.parallel=0), but IRQ exits rise sharply to 12.9k, 2.7× the count
with `cpuhp.parallel=0`. They account for 95% of event time and become
the new bottleneck.

- cpuhp.parallel=0: fastest HVC64 handling (mean 69 μs), fewest IRQ exits
(4.8k), and fewest total samples, giving the best overall boot
performance.

Therefore, `cpuhp.parallel=1` reduces HVC cost but suffers from a large
increase in IRQ exits, while `cpuhp.parallel=0` avoids this interrupt
storm and performs best in a KVM guest.
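
The derived figures quoted above (total exits, the 2.7× IRQ ratio, the
95% IRQ time share, the mean HVC64 latency) can be re-checked from the
raw columns. This is only a minimal sketch: it assumes the Samples and
Time (ns) values are copied verbatim from the three tables, and the
names STATS, summarize and SUMMARY are made up for illustration. They
are not part of the patch or of any tool.

```python
# Samples and total time (ns) per exit reason, copied from the tables above.
# The dictionary layout and all names here are illustrative assumptions.
STATS = {
    "without patch": {
        "DABT_LOW": (323112, 1669148000),
        "WFx": (85817, 723215800),
        "SYS64": (14914, 419934530),
        "IRQ": (5643, 6732439250),
        "HVC64": (282, 35543970),
        "IABT_LOW": (1, 6130),
    },
    "cpuhp.parallel=0": {
        "DABT_LOW": (308175, 643628050),
        "WFx": (55208, 261925270),
        "SYS64": (14975, 155727880),
        "IRQ": (4755, 8496162210),
        "HVC64": (280, 19429900),
        "IABT_LOW": (1, 5850),
    },
    "cpuhp.parallel=1": {
        "DABT_LOW": (307923, 692965050),
        "WFx": (59549, 287888960),
        "SYS64": (15127, 334366230),
        "IRQ": (12861, 29784004970),
        "HVC64": (280, 21869940),
        "IABT_LOW": (1, 9320),
    },
}


def summarize(events):
    """Derive the per-configuration figures used in the comparison."""
    total_exits = sum(samples for samples, _ in events.values())
    total_time_ns = sum(time_ns for _, time_ns in events.values())
    irq_samples, irq_time_ns = events["IRQ"]
    hvc_samples, hvc_time_ns = events["HVC64"]
    return {
        "total_exits": total_exits,
        "irq_exits": irq_samples,
        "irq_time_share": irq_time_ns / total_time_ns,
        "hvc_mean_us": hvc_time_ns / hvc_samples / 1000.0,
    }


SUMMARY = {name: summarize(events) for name, events in STATS.items()}

# Ratio of IRQ exits between the parallel and the serial bringup runs.
IRQ_RATIO = (SUMMARY["cpuhp.parallel=1"]["irq_exits"]
             / SUMMARY["cpuhp.parallel=0"]["irq_exits"])

if __name__ == "__main__":
    for name, s in SUMMARY.items():
        print(f"{name}: exits={s['total_exits']} "
              f"irq_exits={s['irq_exits']} "
              f"irq_time={s['irq_time_share']:.1%} "
              f"hvc_mean={s['hvc_mean_us']:.0f}us")
    print(f"IRQ exits, parallel=1 vs parallel=0: {IRQ_RATIO:.1f}x")
```

Keeping the raw counts in one place makes it straightforward to add a
further run (another vCPU count, say) and compare it the same way.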

> 
> Michael
> 
>>
>> Signed-off-by: Jinjie Ruan <ruanjinjie at huawei.com>
>> ---
>>  arch/arm64/Kconfig           |  1 +
>>  arch/arm64/include/asm/smp.h |  8 ++++++++
>>  arch/arm64/kernel/head.S     | 23 +++++++++++++++++++++++
>>  arch/arm64/kernel/smp.c      | 27 +++++++++++++++++++++++++++
>>  4 files changed, 59 insertions(+)
>>
> 
> 



