[RFC] Fixing CPU Hotplug for RealView Platforms

Russell King - ARM Linux linux at arm.linux.org.uk
Sat Dec 18 14:22:13 EST 2010


On Sat, Dec 18, 2010 at 05:44:47PM +0000, Will Deacon wrote:
> > Hotplug bringup:
> > 
> > Booting:         1000       -> 0ns          0ns             (1us per print)
> > Restarting:      3976375    -> 3.976375ms
> > cross call:      3976625    -> 3.976625ms
> > Up:              4003125    -> 4.003125ms
> > CPU1: Booted secondary processor
> > secondary_init:  4022583    -> 4.022583ms
> > writing release: 4040750    -> 4.04075ms
> > release done:    4051083    -> 4.051083ms
> > released:        46509000   -> 4.6509ms
> > Boot returned:   51745708   -> 5.1745708ms
> > sync'd:          51745875   -> 5.1745875ms
> > CPU1: Unknown IPI message 0x1
> > Switched to NOHz mode on CPU #1
> > Online: 281251041               ->              281.251041ms
> > 
> > So, it appears to take 4ms to get from just before the call to
> > boot_secondary() in __cpu_up() to writing pen_release.
> > 
> > The secondary CPU appears to run from being woken up to writing the
> > pen release in about 40us - and then spends about 1ms spinning on
> > its lock waiting for the requesting CPU to catch up.
> > 
> > This can be repeated every time without exception when you bring a
> > CPU back online.
> > 
> Hmm, this sounds needlessly expensive.

Actually, I'm starting to get concerned about doing timing measurements
on Versatile Express - I'm seeing some inexplicable issues with the
platform.

I occasionally see the kernel get stuck when initializing the CLCD - and
I think this is a hardware lockup - pressing the red 'reset/power on'
button is ignored, and the only way to recover it is to press the
black 'power off' button first.

Also, I keep running into some weird stuff which causes the MMC to
underflow, serial output to be corrupted, and rootfs not to be mounted.
With some kernels this is 100% reproducible (iow, the built kernel just
will not boot no matter how many times you attempt to do so.)  I sent
Catalin & Philippe a copy of one such kernel a few days ago (but I
think they're on holiday.)

Anyway, I decided to implement a slightly different method of measuring
the time taken, and the apparent long delays have gone - I suspect they
were something to do with printk.  I'm now logging the times into an
array, and printing out the values later.
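
For illustration, here is a minimal sketch of that log-now, print-later
approach.  It is userspace C, not the kernel code actually used for
these measurements (which isn't shown in this mail); clock_gettime()
stands in for whatever timer was read, and the names stamp_record(),
stamp_dump() and MAX_STAMPS are made up for the example.

```c
#define _POSIX_C_SOURCE 199309L

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define MAX_STAMPS 32		/* hypothetical log size */

struct stamp {
	const char *label;	/* must outlive the log, e.g. a string literal */
	uint64_t ns;		/* monotonic time in nanoseconds */
};

static struct stamp stamp_log[MAX_STAMPS];
static size_t stamp_count;

/* Read a monotonic clock; an assumed stand-in for the real timer. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hot path: one clock read and one array store, no I/O.
 * Returns 0 on success, -1 if the log is full.
 */
static int stamp_record(const char *label)
{
	assert(label != NULL);
	if (stamp_count >= MAX_STAMPS)
		return -1;
	stamp_log[stamp_count].label = label;
	stamp_log[stamp_count].ns = now_ns();
	stamp_count++;
	return 0;
}

/* Run after the timed section; prints times relative to the first stamp. */
static void stamp_dump(FILE *out)
{
	for (size_t i = 0; i < stamp_count; i++)
		fprintf(out, "SMP: %s: %llu\n", stamp_log[i].label,
			(unsigned long long)(stamp_log[i].ns - stamp_log[0].ns));
}
```

The point of the split is that nothing slow happens between two stamps,
so the act of printing can't inflate the intervals being measured.  The
sketch uses a single unlocked array, so each CPU being timed would need
its own log.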

So, CPU1 boot:

SMP: Start: 0
SMP: Booting: 916
SMP: Cross call: 3083
SMP: Pen released: 278416
SMP: Unlock: 279583
SMP: Boot returned: 280333

SMP: Sec: up: 238666
SMP: Sec: enter: 264333
SMP: Sec: pen write: 267083
SMP: Sec: pen done: 268916
SMP: Sec: exit: 279916
SMP: Sec: calibrate: 328416
SMP: Sec: online: 218380875

CPU1 hotplug:
SMP: Start: 0
SMP: Booting: 833
SMP: Cross call: 4250
SMP: Pen released: 51500
SMP: Unlock: 52667
SMP: Boot returned: 53500

SMP: Sec: restart: 4667
SMP: Sec: up: 7167
SMP: Sec: enter: 31000
SMP: Sec: pen write: 39667
SMP: Sec: pen done: 42167
SMP: Sec: exit: 53000
SMP: Sec: calibrate: 104583
SMP: Sec: online: 221423333

This looks far saner.

Anyway, for a secondary CPU using the existing code, we're looking at
about 110us to reach the delay loop calibration, and about 221ms to
come online with the calibration included.  I don't think that will go
up significantly if we re-vector offlined CPUs back through the reset
vector.
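
To see where the time goes, the per-stage differences can be worked out
from the secondary CPU's values in the "CPU1 hotplug" log above.  A
small sketch follows; the values are copied from that log, they are
assumed to be nanoseconds (which matches the 110us and 221ms figures),
and the struct and function names are only illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct stage {
	const char *label;
	uint64_t t;		/* logged value, assumed to be in ns */
};

/* "SMP: Sec:" values from the CPU1 hotplug log. */
static const struct stage hotplug_sec[] = {
	{ "restart",   4667 },
	{ "up",        7167 },
	{ "enter",     31000 },
	{ "pen write", 39667 },
	{ "pen done",  42167 },
	{ "exit",      53000 },
	{ "calibrate", 104583 },
	{ "online",    221423333 },
};

#define N_STAGES (sizeof(hotplug_sec) / sizeof(hotplug_sec[0]))

/* Time taken to get from stage i-1 to stage i; i must be >= 1. */
static uint64_t stage_delta(const struct stage *s, size_t i)
{
	return s[i].t - s[i - 1].t;
}

/* Print one line per transition between consecutive stages. */
static void print_deltas(const struct stage *s, size_t n)
{
	for (size_t i = 1; i < n; i++)
		printf("%-10s -> %-10s %12llu\n", s[i - 1].label, s[i].label,
		       (unsigned long long)stage_delta(s, i));
}
```

Calling print_deltas(hotplug_sec, N_STAGES) shows the calibrate to
online step as 221318750, i.e. nearly all of the 221ms is spent after
the calibration point is reached.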


