SMP boot issue during system resume

Pavan Kondeti pavan.kondeti at oss.qualcomm.com
Mon Jan 5 05:19:18 PST 2026


On Mon, Jan 05, 2026 at 01:03:20PM +0000, Will Deacon wrote:
> On Mon, Jan 05, 2026 at 04:24:44PM +0530, Pavan Kondeti wrote:
> > On Fri, Jan 02, 2026 at 03:17:25PM +0000, Will Deacon wrote:
> > > On Mon, Dec 22, 2025 at 11:30:19AM +0530, Pavan Kondeti wrote:
> > > > We are seeing a SMP boot issue during system resume when CPUs are brought 
> > > > online via pm_sleep_enable_secondary_cpus()->thaw_secondary_cpus()->_cpu_up() 
> > > > on ARM64.
> > > > 
> > > > The _cpu_up() sets a global variable
> > > > 
> > > > secondary_data.task = idle;
> > > > 
> > > > and waits for the secondary CPU to come online. A 5 second timeout is
> > > > used here. If the secondary CPU comes online after this timeout, we
> > > > expect it to loop in the kernel via __secondary_too_slow(). However,
> > > > this depends on the secondary_data.task value. Since we are bringing up
> > > > all disabled cores, after the timeout we set this global variable to
> > > > the next CPU's idle task, and the late secondary CPU thinks the value
> > > > is its own idle task and does not enter __secondary_too_slow().
> > > > 
> > > > An earlier attempt [1] to fix a similar issue increased the timeout to
> > > > 5 seconds. We could reproduce this issue in a Linux guest, where vCPU
> > > > scheduling latency can be higher under heavy load on the host.
> > > > 
> > > > I would like to seek your input on how we can improve the current
> > > > situation. We would like to avoid the __secondary_too_slow() spin even
> > > > when the CPU comes up late. This is probably not the desired behavior
> > > > for other cases, like Linux running on bare metal or some guests.
> > > > Having a Kconfig option or kernel parameter might help here.
> > > 
> > > You probably want to use the parallel hotplug machinery (or one of the
> > > interim steps) for this, as it avoids the global state entirely. I spoke
> > > about it at KVM forum [1] and I have some old hacks at [2]. I can dust
> > > those off and post them to the list if you like?
> > 
> > Thanks, Will, for pointing to your informative talk. I see that your patch
> > depends on the PSCI v0.2 extension to CPU_ON (context argument) [1]. I am
> > not sure if this suits our immediate needs, but it is good to know that we
> > have a plan for parallel vCPU hotplug.
> > 
> > I am happy to test if you have any other patches that address or work
> > around this problem without depending on the backend/firmware.
> 
> Surely you're not using PSCI v0.1 in 2026?
> 

I am not sure what I was confused about. We are good with respect to PSCI v0.2.

Thanks,
Pavan
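
For readers following the thread, the race described in the original
report can be replayed with a small user-space model. This is only a
sketch under stated assumptions: `global_slot` stands in for
secondary_data.task, and `percpu_slot`, `publish_idle_task()`,
`withdraw_idle_task()`, `replay_cpu1_late()` and the `late_entry_*()`
helpers are hypothetical names made up for illustration. It is not the
kernel code, and the per-CPU slot is not the parallel hotplug patches
mentioned above; it only shows why a late CPU cannot be detected through
one shared pointer once the next bring-up has started.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4

/* Stand-in for struct task_struct; only identity matters here. */
struct task { int cpu; };

static struct task idle_tasks[NR_CPUS] = { {0}, {1}, {2}, {3} };

/* Models the single global secondary_data.task handoff slot. */
static struct task *global_slot;

/* Hypothetical alternative for comparison: one slot per CPU. */
static struct task *percpu_slot[NR_CPUS];

/* Boot CPU: publish the idle task for `cpu` before kicking it. */
static void publish_idle_task(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    global_slot = &idle_tasks[cpu];
    percpu_slot[cpu] = &idle_tasks[cpu];
}

/* Boot CPU: the wait for `cpu` timed out, so withdraw the handoff. */
static void withdraw_idle_task(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    global_slot = NULL;
    percpu_slot[cpu] = NULL;
}

/* Replay: CPU1 misses its timeout, then bring-up of CPU2 begins. */
static void replay_cpu1_late(void)
{
    publish_idle_task(1);
    withdraw_idle_task(1);
    publish_idle_task(2);
}

/* Late secondary reading the global slot; NULL means "too slow, park". */
static struct task *late_entry_global(void)
{
    return global_slot;
}

/* Late secondary reading its own slot; NULL means "too slow, park". */
static struct task *late_entry_percpu(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return percpu_slot[cpu];
}
```

After replay_cpu1_late(), a late CPU1 calling late_entry_global() gets
CPU2's idle task instead of NULL, so it would not park. With the
per-CPU slot, late_entry_percpu(1) returns NULL while
late_entry_percpu(2) still returns CPU2's idle task.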


