3.18: lockdep problems in cpufreq

Mon Dec 15 06:54:05 PST 2014

> -----Original Message-----
> From: Russell King - ARM Linux [mailto:linux at arm.linux.org.uk]
> Sent: Monday, December 15, 2014 7:17 PM
> To: Yadwinder Singh Brar
> Cc: 'Viresh Kumar'; 'Rafael J. Wysocki'; linux-arm-
> kernel at lists.infradead.org; linux-pm at vger.kernel.org; 'Eduardo
> Valentin'
> Subject: Re: 3.18: lockdep problems in cpufreq
> 
> On Mon, Dec 15, 2014 at 06:58:41PM +0530, Yadwinder Singh Brar wrote:
> > Unfortunately, I didn’t get any such warning though I tested patch
> > enabling CONFIG_PROVE_LOCKING before posting. It seems Russell is
> > trying to load imx_thermal as module and parallely Changing cpufreq
> > governor from userspace, which was not my test case.
> 
> No.  Yes, imx_thermal is a module, which is loaded by udev very early
> in the userspace boot.  Later on, in the rc.local, I poke the governor
> from userspace.
> 
> This is evidenced by the bluetooth modules being loaded after
> imx_thermal:
> 
> Module                  Size  Used by
> bnep                   10574  2
> rfcomm                 33720  0
> bluetooth             293598  10 bnep,rfcomm
> nfsd                   90264  0
> exportfs                3988  1 nfsd
> hid_cypress             1763  0
> snd_soc_fsl_spdif       9753  2
> imx_pcm_dma             1137  1 snd_soc_fsl_spdif
> imx_sdma               12885  2
> imx2_wdt                3248  0
> imx_thermal             5386  0
> snd_soc_imx_spdif       1877  0
> 
> and the timestamp on the bluetooth message:
> 
> [   25.503739] Bluetooth: Core ver 2.19
> 
> vs the timestamp on the lockdep message:
> 
> [   29.499195] [ INFO: possible circular locking dependency detected ]
> 
> My rc.local does this:
> 
>     # Configure cpufreq
>     echo 1 > /sys/devices/system/cpu/cpufreq/ondemand/io_is_busy
>     echo performance >
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
> 
> > Anyways, after analyzing log and code,I think problem is not in
> > cpufreq_thermal_notifier which was modified in patch as stated above.
> > Actual problem is in __cpufreq_cooling_register which is
> unnecessarily
> > calling cpufreq_register_notifier() inside section protected by
> > cooling_cpufreq_lock.
> > Because cpufreq_policy_notifier_list).rwsem is already held by
> > store_scaling_governor when __cpufreq_cooling_register is trying to
> > cpufreq_policy_notifier_list while holding cooling_cpufreq_lock.
> >
> > So I think following can fix the problem:
> >
> > diff --git a/drivers/thermal/cpu_cooling.c
> > b/drivers/thermal/cpu_cooling.c index ad09e51..622ea40 100644
> > --- a/drivers/thermal/cpu_cooling.c
> > +++ b/drivers/thermal/cpu_cooling.c
> > @@ -484,15 +484,15 @@ __cpufreq_cooling_register(struct device_node
> *np,
> >         cpufreq_dev->cpufreq_state = 0;
> >         mutex_lock(&cooling_cpufreq_lock);
> >
> > -       /* Register the notifier for first cpufreq cooling device */
> > -       if (cpufreq_dev_count == 0)
> > -
> cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> > -                                         CPUFREQ_POLICY_NOTIFIER);
> >         cpufreq_dev_count++;
> >         list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> >
> >         mutex_unlock(&cooling_cpufreq_lock);
> >
> > +       /* Register the notifier for first cpufreq cooling device */
> > +       if (cpufreq_dev_count == 0)
> > +
> cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> > +                                         CPUFREQ_POLICY_NOTIFIER);
> 
> No, sorry, this patch is worse.
> 

Actually it was not patch :P , just moved cpufreq_register_notifier()
out of locking, as it is(C&P) to explain my point.

> 1. cpufreq_register_notifier() will never be called, even on the first
>    caller to this code: at the point where it's tested for zero, it
> will
>    be incremented to one by the previous code.
> 
> 2. What happens if two threads come here?
> 
>    The first thread succeeds in grabbing cooling_cpufreq_lock.  The
> second
>    thread stalls waiting for cooling_cpufreq_lock to be released.
> 
>    The first thread increments cpufreq_dev_count, adds to the list, and
> then
>    releases the lock.  At this point, control may be passed to the
> second
>    thread (since mutex_unlock() will wake it up.)  The second thread
> gets
>    into the critical region, and increments cpufreq_dev_count again.
> 
>    By the time the first thread runs, cpufreq_dev_count may be two at
> this
>    point.

Yes, may be.

> 
> Unfortunately, you do need some kind of synchronisation here.  If it's
> not important when cpufreq_register_notifier() gets called, maybe this
> can work:
> 
> 	bool register;
> 
> 	mutex_lock(&cooling_cpufreq_lock);
> 	register = cpufreq_dev_count++ == 0;
> 	list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> 	mutex_unlock(&cooling_cpufreq_lock);
> 
> 	if (register)

register may be 0 in scenario you stated above in second point.
So this approach will not work.

> 		cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> 					  CPUFREQ_POLICY_NOTIFIER);
> 
> However, I suspect that may well be buggy if we have a cleanup path
> which unregisters the notifier when cpufreq_dev_count is decremented to
> zero...
> which we seem to have in cpufreq_cooling_unregister().
> 
> The answer may well be to have finer grained locking here:
> 
> - one lock to protect cpufreq_dev_list, which is only ever taken when
>   adding or removing entries from it
> 
> - a second lock to protect cpufreq_dev_count and the calls to
>   cpufreq_register_notifier() and cpufreq_unregister_notifier()
> 
> and you would /never/ take either of those two locks inside each other.
> In other words:
> 
> 	mutex_lock(&cooling_list_lock);
> 	list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> 	mutex_unlock(&cooling_list_lock);
> 
> 	mutex_lock(&cooling_cpufreq_lock);
> 	if (cpufreq_dev_count++ == 0)
> 		cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> 					  CPUFREQ_POLICY_NOTIFIER);
> 	mutex_unlock(&cooling_cpufreq_lock);
> 
> and similar in the cleanup path.  The notifier itself would only ever
> take the cooling_list_lock.
> 

I agree with this approach, if its fine for others also, I can implement
and post patch.

Thanks,
Yadwinder

> --
> FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
> according to speedtest.net.