3.18: lockdep problems in cpufreq
Yadwinder Singh Brar
yadi.brar at samsung.com
Mon Dec 15 06:54:05 PST 2014
> -----Original Message-----
> From: Russell King - ARM Linux [mailto:linux at arm.linux.org.uk]
> Sent: Monday, December 15, 2014 7:17 PM
> To: Yadwinder Singh Brar
> Cc: 'Viresh Kumar'; 'Rafael J. Wysocki'; linux-arm-
> kernel at lists.infradead.org; linux-pm at vger.kernel.org; 'Eduardo
> Valentin'
> Subject: Re: 3.18: lockdep problems in cpufreq
>
> On Mon, Dec 15, 2014 at 06:58:41PM +0530, Yadwinder Singh Brar wrote:
> > Unfortunately, I didn’t get any such warning though I tested patch
> > enabling CONFIG_PROVE_LOCKING before posting. It seems Russell is
> > trying to load imx_thermal as module and parallely Changing cpufreq
> > governor from userspace, which was not my test case.
>
> No. Yes, imx_thermal is a module, which is loaded by udev very early
> in the userspace boot. Later on, in the rc.local, I poke the governor
> from userspace.
>
> This is evidenced by the bluetooth modules being loaded after
> imx_thermal:
>
> Module Size Used by
> bnep 10574 2
> rfcomm 33720 0
> bluetooth 293598 10 bnep,rfcomm
> nfsd 90264 0
> exportfs 3988 1 nfsd
> hid_cypress 1763 0
> snd_soc_fsl_spdif 9753 2
> imx_pcm_dma 1137 1 snd_soc_fsl_spdif
> imx_sdma 12885 2
> imx2_wdt 3248 0
> imx_thermal 5386 0
> snd_soc_imx_spdif 1877 0
>
> and the timestamp on the bluetooth message:
>
> [ 25.503739] Bluetooth: Core ver 2.19
>
> vs the timestamp on the lockdep message:
>
> [ 29.499195] [ INFO: possible circular locking dependency detected ]
>
> My rc.local does this:
>
> # Configure cpufreq
> echo 1 > /sys/devices/system/cpu/cpufreq/ondemand/io_is_busy
> echo performance >
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
>
> > Anyways, after analyzing log and code,I think problem is not in
> > cpufreq_thermal_notifier which was modified in patch as stated above.
> > Actual problem is in __cpufreq_cooling_register which is
> unnecessarily
> > calling cpufreq_register_notifier() inside section protected by
> > cooling_cpufreq_lock.
> > Because cpufreq_policy_notifier_list).rwsem is already held by
> > store_scaling_governor when __cpufreq_cooling_register is trying to
> > cpufreq_policy_notifier_list while holding cooling_cpufreq_lock.
> >
> > So I think following can fix the problem:
> >
> > diff --git a/drivers/thermal/cpu_cooling.c
> > b/drivers/thermal/cpu_cooling.c index ad09e51..622ea40 100644
> > --- a/drivers/thermal/cpu_cooling.c
> > +++ b/drivers/thermal/cpu_cooling.c
> > @@ -484,15 +484,15 @@ __cpufreq_cooling_register(struct device_node
> *np,
> > cpufreq_dev->cpufreq_state = 0;
> > mutex_lock(&cooling_cpufreq_lock);
> >
> > - /* Register the notifier for first cpufreq cooling device */
> > - if (cpufreq_dev_count == 0)
> > -
> cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> > - CPUFREQ_POLICY_NOTIFIER);
> > cpufreq_dev_count++;
> > list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> >
> > mutex_unlock(&cooling_cpufreq_lock);
> >
> > + /* Register the notifier for first cpufreq cooling device */
> > + if (cpufreq_dev_count == 0)
> > +
> cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> > + CPUFREQ_POLICY_NOTIFIER);
>
> No, sorry, this patch is worse.
>
Actually it was not patch :P , just moved cpufreq_register_notifier()
out of locking, as it is(C&P) to explain my point.
> 1. cpufreq_register_notifier() will never be called, even on the first
> caller to this code: at the point where it's tested for zero, it
> will
> be incremented to one by the previous code.
>
> 2. What happens if two threads come here?
>
> The first thread succeeds in grabbing cooling_cpufreq_lock. The
> second
> thread stalls waiting for cooling_cpufreq_lock to be released.
>
> The first thread increments cpufreq_dev_count, adds to the list, and
> then
> releases the lock. At this point, control may be passed to the
> second
> thread (since mutex_unlock() will wake it up.) The second thread
> gets
> into the critical region, and increments cpufreq_dev_count again.
>
> By the time the first thread runs, cpufreq_dev_count may be two at
> this
> point.
Yes, may be.
>
> Unfortunately, you do need some kind of synchronisation here. If it's
> not important when cpufreq_register_notifier() gets called, maybe this
> can work:
>
> bool register;
>
> mutex_lock(&cooling_cpufreq_lock);
> register = cpufreq_dev_count++ == 0;
> list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> mutex_unlock(&cooling_cpufreq_lock);
>
> if (register)
register may be 0 in scenario you stated above in second point.
So this approach will not work.
> cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> CPUFREQ_POLICY_NOTIFIER);
>
> However, I suspect that may well be buggy if we have a cleanup path
> which unregisters the notifier when cpufreq_dev_count is decremented to
> zero...
> which we seem to have in cpufreq_cooling_unregister().
>
> The answer may well be to have finer grained locking here:
>
> - one lock to protect cpufreq_dev_list, which is only ever taken when
> adding or removing entries from it
>
> - a second lock to protect cpufreq_dev_count and the calls to
> cpufreq_register_notifier() and cpufreq_unregister_notifier()
>
> and you would /never/ take either of those two locks inside each other.
> In other words:
>
> mutex_lock(&cooling_list_lock);
> list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> mutex_unlock(&cooling_list_lock);
>
> mutex_lock(&cooling_cpufreq_lock);
> if (cpufreq_dev_count++ == 0)
> cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> CPUFREQ_POLICY_NOTIFIER);
> mutex_unlock(&cooling_cpufreq_lock);
>
> and similar in the cleanup path. The notifier itself would only ever
> take the cooling_list_lock.
>
I agree with this approach, if its fine for others also, I can implement
and post patch.
Thanks,
Yadwinder
> --
> FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
> according to speedtest.net.
More information about the linux-arm-kernel
mailing list