[PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done

Mon Feb 24 03:39:08 EST 2014

Srivatsa S. Bhat wrote:
> On 02/24/2014 12:27 PM, Saravana Kannan wrote:
>> The existing code sets the per CPU policy to a non-NULL value before all
>> the steps performed during the hotplug online path is done.
>> Specifically,
>> this is done before the policy min/max, governors, etc are initialized
>> for
>> the policy.  This in turn means that calls to cpufreq_cpu_get() return a
>> non-NULL policy before the policy/CPU is ready to be used.
>>
>> To fix this, move the update of per CPU policy to a valid value after
>> all
>> the initialization steps for the policy are completed.
>>
>> Example kernel panic without this fix:
>> [  512.146185] Unable to handle kernel NULL pointer dereference at
>> virtual address 00000020
>> [  512.146195] pgd = c0003000
>> [  512.146213] [00000020] *pgd=80000000004003, *pmd=00000000
>> [  512.146228] Internal error: Oops: 206 [#1] PREEMPT SMP ARM
>> <snip>
>> [  512.146297] PC is at __cpufreq_governor+0x10/0x1ac
>> [  512.146312] LR is at cpufreq_update_policy+0x114/0x150
>> <snip>
>> [  512.149740] ---[ end trace f23a8defea6cd706 ]---
>> [  512.149761] Kernel panic - not syncing: Fatal exception
>> [  513.152016] CPU0: stopping
>> [  513.154710] CPU: 0 PID: 7136 Comm: mpdecision Tainted: G      D W
>> 3.10.0-gd727407-00074-g979ede8 #396
>> <snip>
>> [  513.317224] [<c0afe180>] (notifier_call_chain+0x40/0x68) from
>> [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58)
>> [  513.327809] [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58)
>> from [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c)
>> [  513.339182] [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c)
>> from [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8)
>> [  513.349594] [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) from
>> [<c0803e7c>] (cpufreq_init_policy+0x30/0x98)
>> [  513.359231] [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) from
>> [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4)
>> [  513.369560] [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) from
>> [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84)
>> [  513.379978] [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) from
>> [<c0afe180>] (notifier_call_chain+0x40/0x68)
>> [  513.389704] [<c0afe180>] (notifier_call_chain+0x40/0x68) from
>> [<c02812dc>] (__cpu_notify+0x28/0x44)
>> [  513.398728] [<c02812dc>] (__cpu_notify+0x28/0x44) from [<c0aeed90>]
>> (_cpu_up+0xf4/0x1dc)
>> [  513.406797] [<c0aeed90>] (_cpu_up+0xf4/0x1dc) from [<c0aeeed4>]
>> (cpu_up+0x5c/0x78)
>> [  513.414357] [<c0aeeed4>] (cpu_up+0x5c/0x78) from [<c0aec808>]
>> (store_online+0x44/0x74)
>> [  513.422253] [<c0aec808>] (store_online+0x44/0x74) from [<c03a40f4>]
>> (sysfs_write_file+0x108/0x14c)
>> [  513.431195] [<c03a40f4>] (sysfs_write_file+0x108/0x14c) from
>> [<c03517d4>] (vfs_write+0xd0/0x180)
>> [  513.439958] [<c03517d4>] (vfs_write+0xd0/0x180) from [<c0351ca8>]
>> (SyS_write+0x38/0x68)
>> [  513.447947] [<c0351ca8>] (SyS_write+0x38/0x68) from [<c0205de0>]
>> (ret_fast_syscall+0x0/0x30)
>>
>> In this specific case, CPU0 set's CPU1's policy->governor in
>> cpufreq_init_policy() to NULL while CPU1 is using the policy->governor
>> in
>> __cpufreq_governor().
>>
>
> Whoa! That's horrible! Can you please explain how exactly you
> triggered this? I'm curious...

Just call cpufreq_update_policy(cpu) on a CPU and have another thread
online/offline it.

>
>> Signed-off-by: Saravana Kannan <skannan at codeaurora.org>
>> ---
>>  drivers/cpufreq/cpufreq.c | 10 +++++-----
>>  1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>> index cb003a6..d5ceb43 100644
>> --- a/drivers/cpufreq/cpufreq.c
>> +++ b/drivers/cpufreq/cpufreq.c
>> @@ -1109,11 +1109,6 @@ static int __cpufreq_add_dev(struct device *dev,
>> struct subsys_interface *sif,
>>  		goto err_set_policy_cpu;
>>  	}
>>
>> -	write_lock_irqsave(&cpufreq_driver_lock, flags);
>> -	for_each_cpu(j, policy->cpus)
>> -		per_cpu(cpufreq_cpu_data, j) = policy;
>> -	write_unlock_irqrestore(&cpufreq_driver_lock, flags);
>> -
>>  	if (cpufreq_driver->get) {
>>  		policy->cur = cpufreq_driver->get(policy->cpu);
>
> If you move the per-cpu init further down, then what happens to the
> cpufreq_generic_get() that gets invoked here by some of the drivers?

While cpufreq_generic_get() was a good refactor, I think it's causing
unnecessary cyclic dependency. You need that function to not fail for a
policy to get added properly and you need a proper policy for the function
to work. I care more about fixing the panic than trying to keep
cpufreq_generic_get().

> It will almost always fail (because policy will be NULL) and hence
> CPU online will be unsuccessful (though you wont observe it because
> the error code is not bubbled up to the CPU hotplug core; perhaps we
> should).

Good catch. I actually hit this issue, fixed and test it on a 3.12 cpufreq
code base. Since this new function isn't there at that point, I missed it.
Even if I did use the latest kernel, I wouldn't have hit this issue,
because the MSM cpufreq driver doesn't use this function.

>
> IMHO, we should fix the synchronization in cpufreq_init_policy().
> I notice cpufreq_update_policy() as well as __cpufreq_governor() in
> your stack trace, which means cpufreq_set_policy() was involved.
> cpufreq_update_policy() takes the policy->rwsem for write, whereas
> cpufreq_init_policy() doesn't take that lock at all, which is clearly
> wrong.
>
> My guess is that, if we fix that locking, everything will be fine and
> you won't hit the bug. Would you like to give that a shot?

While locking might need fixing, this is not about the locking though.
Plenty of drivers and the framework use cpufreq_cpu_get() to get a policy
that they consider valid and also use it as a way to check and reject API
calls trying to manipulate an offline CPU.

Without this fix, the framework is advertising an incomplete policy object
as available. I think that breaks the CPUfreq framework APIs in a very
fundamental sense. This is a "no-no" in programming.

It's like trying to register a CPUfreq driver before getting the clocks to
control the CPU.

As for the ->get() issue, I think the framework should just call
clk_get_rate() instead of calling .get() if policy->clk is not NULL. No
point in doing framework -> driver -> framework.

Thanks,
Saravana

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation