[PATCH] ARM: tegra: cpuidle: use CPUIDLE_FLAG_TIMER_STOP flag

Daniel Lezcano daniel.lezcano at linaro.org
Wed Jul 17 17:45:47 EDT 2013


On 07/17/2013 10:31 PM, Stephen Warren wrote:
> On 07/17/2013 04:15 AM, Joseph Lo wrote:
>> On Wed, 2013-07-17 at 03:51 +0800, Stephen Warren wrote:
>>> On 07/16/2013 05:17 AM, Joseph Lo wrote:
>>>> On Tue, 2013-07-16 at 02:04 +0800, Stephen Warren wrote:
>>>>> On 06/25/2013 03:23 AM, Joseph Lo wrote:
>>>>>> Use the CPUIDLE_FLAG_TIMER_STOP flag and let the cpuidle framework
>>>>>> handle the CLOCK_EVT_NOTIFY_BROADCAST_ENTER/EXIT when entering
>>>>>> this state.
> ... [ discussion of issues with Joseph's patches applied]
>>
>> OK. I did more stress tests last night and today. I found it is caused by
>> the patch "ARM: tegra: cpuidle: use CPUIDLE_FLAG_TIMER_STOP flag" and
>> only impacts the Tegra20 platform. The hot plug regression seems to be due
>> to this patch. After dropping this patch on top of v3.11-rc1, Tegra20
>> is back to normal.
>>
>> And the hot plug and suspend stress tests pass on Tegra30/114 too.
>>
>> Can the other two patch series for Tegra114, which add support for CPU
>> idle power-down mode and system suspend, still move forward without being
>> blocked by this patch?
>>
>> It looks like the CPUIDLE_FLAG_TIMER_STOP flag still causes some other
>> issue for hot plug on Tegra20; I will continue to check this. You can just
>> drop this patch.
> 
> OK, if I drop that patch, then everything on Tegra20 and Tegra30 seems
> fine again.
> 
> However, I've found some new and exciting issues on Tegra114!
> 
> With unmodified v3.11-rc1, I can do the following without issue:
> 
> * Unplug/replug CPUs, so that I had all combinations of CPU 1, 2, 3
> plugged/unplugged (with CPU 0 always plugged).
> 
> * Unplug/replug CPUs, so that I had all combinations of CPU 0, 1, 2, 3
> plugged/unplugged (with the obvious exception of never having all CPUs
> unplugged).
> 
> However, if I try this with your Tegra114 cpuidle and suspend patches
> applied, I see the following issues:
> 
> 1) If I boot, unplug CPU 0, then replug CPU 0, the system immediately
> hard-hangs.
> 
> 2) If I run the hotplug test script, leaving CPU 0 always present, I
> sometimes see:
> 
>> root@localhost:~# for i in `seq 1 50`; do echo ITERATION $i; ./cpuonline.py; done
>> ITERATION 1
>> echo 0 > /sys/devices/system/cpu/cpu2/online
>> [  458.910054] CPU2: shutdown
>> echo 0 > /sys/devices/system/cpu/cpu1/online
>> [  461.004371] CPU1: shutdown
>> echo 0 > /sys/devices/system/cpu/cpu3/online
>> [  463.027341] CPU3: shutdown
>> echo 1 > /sys/devices/system/cpu/cpu1/online
>> [  465.061412] CPU1: Booted secondary processor
>> echo 1 > /sys/devices/system/cpu/cpu2/online
>> [  467.095313] CPU2: Booted secondary processor
>> [  467.113243] ------------[ cut here ]------------
>> [  467.117948] WARNING: CPU: 2 PID: 0 at kernel/time/tick-broadcast.c:667 tick_broadcast_oneshot_control+0x19c/0x1c4()
>> [  467.128352] Modules linked in:
>> [  467.131455] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.11.0-rc1-00022-g7487363-dirty #49
>> [  467.139678] [<c0015620>] (unwind_backtrace+0x0/0xf8) from [<c001154c>] (show_stack+0x10/0x14)
>> [  467.148228] [<c001154c>] (show_stack+0x10/0x14) from [<c05135a8>] (dump_stack+0x80/0xc4)
>> [  467.156336] [<c05135a8>] (dump_stack+0x80/0xc4) from [<c0024590>] (warn_slowpath_common+0x64/0x88)
>> [  467.165300] [<c0024590>] (warn_slowpath_common+0x64/0x88) from [<c00245d0>] (warn_slowpath_null+0x1c/0x24)
>> [  467.174959] [<c00245d0>] (warn_slowpath_null+0x1c/0x24) from [<c00695e4>] (tick_broadcast_oneshot_control+0x19c/0x1c4)
>> [  467.185659] [<c00695e4>] (tick_broadcast_oneshot_control+0x19c/0x1c4) from [<c0067cdc>] (clockevents_notify+0x1b0/0x1dc)
>> [  467.196538] [<c0067cdc>] (clockevents_notify+0x1b0/0x1dc) from [<c034f348>] (cpuidle_idle_call+0x11c/0x168)
>> [  467.206292] [<c034f348>] (cpuidle_idle_call+0x11c/0x168) from [<c000f134>] (arch_cpu_idle+0x8/0x38)
>> [  467.215359] [<c000f134>] (arch_cpu_idle+0x8/0x38) from [<c0061038>] (cpu_startup_entry+0x60/0x134)
>> [  467.224325] [<c0061038>] (cpu_startup_entry+0x60/0x134) from [<800083d8>] (0x800083d8)
>> [  467.232227] ---[ end trace ea579be22a00e7fb ]---
>> echo 0 > /sys/devices/system/cpu/cpu1/online
>> [  469.126682] CPU1: shutdown
> 
> I have found no solution for (1) (although I didn't look hard!).
> 
> (2) can be solved with the following (at least 50 iterations of my test
> script worked with this patch applied):

Actually, this warning results from a bug in the tick broadcast code,
which has been fixed by commit:

https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=timers/urgent&id=ea8deb8dfa6b0e8d1b3d1051585706739b46656c

That commit has been merged into the timers/urgent branch but still needs
to be merged into timers/core.

The patch below does not fix the underlying bug; it only keeps the tick
broadcast warning from triggering. Applying the commit above will fix your
problem.


>> diff --git a/arch/arm/mach-tegra/cpuidle-tegra114.c b/arch/arm/mach-tegra/cpuidle-tegra114.c
>> index 658b205..896408d 100644
>> --- a/arch/arm/mach-tegra/cpuidle-tegra114.c
>> +++ b/arch/arm/mach-tegra/cpuidle-tegra114.c
>> @@ -66,8 +66,7 @@ static struct cpuidle_driver tegra_idle_driver = {
>>                         .exit_latency           = 500,
>>                         .target_residency       = 1000,
>>                         .power_usage            = 0,
>> -                       .flags                  = CPUIDLE_FLAG_TIME_VALID |
>> -                                                 CPUIDLE_FLAG_TIMER_STOP,
>> +                       .flags                  = CPUIDLE_FLAG_TIME_VALID,
>>                         .name                   = "powered-down",
>>                         .desc                   = "CPU power gated",
>>                 },
> 
> Here's my test script for reference:
> 
> #!/usr/bin/env python
> 
> import multiprocessing
> import os
> import sys
> import time
> 
> cpus = multiprocessing.cpu_count()
> if cpus == 4:
>   socf = file('/sys/devices/soc0/soc_id')
>   soc = socf.readline().strip()
>   socf.close()
>   if True: #soc == '48':
>     gc = (11, 9, 1, 3, 7, 5, 13, 15)
>   else:
>     gc = (14, 10, 11, 9, 8, 1, 3, 2, 6, 7, 5, 4, 12, 13, 15)
> elif cpus == 2:
>   gc = (1, 3)
> else:
>   raise Exception("Invalid CPU count %d" % cpus)
> 
> oldidx = len(gc) - 1
> oldmask = gc[oldidx]
> 
> for newidx in range(len(gc)):
>   newmask = gc[newidx]
>   for cpu in range(cpus):
>     oldon = oldmask & (1 << cpu)
>     newon = newmask & (1 << cpu)
>     if oldon != newon:
>       if newon:
>         newonval = 1
>       else:
>         newonval = 0
>       cmd = "echo %d > /sys/devices/system/cpu/cpu%d/online" \
> % (newonval, cpu)
>       print cmd
>       os.system(cmd)
>   time.sleep(2)
>   oldidx = newidx
>   oldmask = newmask
> 
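
As a side note, the mask walk in the script above can be dry-run without
touching sysfs, which helps when matching a log against the expected
sequence of hotplug operations. The following is only a minimal Python 3
sketch of that idea, not a replacement for the script: the helper names
toggles() and plan() are introduced here for illustration and do not
appear in the original. It only builds the command strings and never
executes them.

```python
def toggles(old_mask, new_mask, cpus):
    """Return (cpu, online) pairs needed to go from old_mask to new_mask.

    Hypothetical helper: bit N of a mask means "CPU N is online".
    """
    changes = []
    for cpu in range(cpus):
        bit = 1 << cpu
        if (old_mask ^ new_mask) & bit:
            changes.append((cpu, 1 if new_mask & bit else 0))
    return changes


def plan(masks, cpus):
    """Dry-run the walk, starting from the last mask as the script does."""
    old = masks[-1]
    commands = []
    for new in masks:
        for cpu, online in toggles(old, new, cpus):
            commands.append(
                "echo %d > /sys/devices/system/cpu/cpu%d/online"
                % (online, cpu))
        old = new
    return commands


if __name__ == "__main__":
    # Mask sequence taken from the 4-CPU branch of the script above
    # (CPU 0 always online); consecutive masks differ by a single bit.
    for line in plan((11, 9, 1, 3, 7, 5, 13, 15), 4):
        print(line)
```

With the 4-CPU sequence where CPU 0 stays online, each step toggles
exactly one CPU, and the first commands (cpu2 off, cpu1 off, cpu3 off,
cpu1 on, cpu2 on) line up with the log quoted earlier.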


-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs




