[PATCH v8 00/29] Rework the trip points creation

Daniel Lezcano daniel.lezcano at linaro.org
Mon Oct 3 14:18:07 PDT 2022


On 03/10/2022 16:10, Marek Szyprowski wrote:
> Hi Daniel,
> 
> On 03.10.2022 11:25, Daniel Lezcano wrote:
>> This work is a prerequisite for correctly handling trip point
>> crossings. For that we need to rework how the trip points are
>> declared and assigned to a thermal zone.
>>
>> Even if it appears to be common sense to have the trip points
>> ordered, there is neither a guarantee nor documentation stating that
>> this is the case.
>>
>> One solution could have been to create an ordered array of trips built
>> when registering the thermal zone by calling the different get_trip*
>> ops. However, those ops receive a thermal zone pointer, which is not
>> yet known because the thermal zone is still being created.
>>
>> This cyclic dependency shows we have to rework how we manage the trip
>> points.
>>
>> Actually, the trip point definitions can be common to all the backend
>> sensor drivers, and we can factor the thermal trip structure out of
>> all of them.
>>
>> Then, as we register the thermal trips array, they will be available
>> in the thermal zone structure and a core function can return the trip
>> given its id.
>>
>> The get_trip_* ops won't be needed anymore and could be removed. The
>> resulting code will be another step towards a self-encapsulated
>> generic thermal framework.
>>
>> Most of the drivers can be converted more or less easily. This series
>> does a first round with most of the drivers. Some remain and will be
>> converted but with a smaller set of changes as the conversion is a bit
>> more complex.
>>
>> Changelog:
>> v8:
>> - Pretty oneline change and parenthesis removal (Rafael)
>> - Collected tags
>> v7:
>> - Added missing return 0 in the x86_pkg_temp driver
>> v6:
>> - Improved the code for the get_crit_temp() function as suggested by
>>   Rafael
>> - Removed inner parenthesis in the set_trip_temp() function and invert the
>>   conditions. Check the type of the trip point is unchanged
>> - Folded patch 4 with 1
>> - Add per thermal zone info message in the bang-bang governor
>> - Folded the fix for an uninitialized variable in
>>   int340x_thermal_zone_add()
>> v5:
>> - Fixed a deadlock when calling thermal_zone_get_trip() while
>>   holding the thermal zone lock
>> - Remove an extra line in the sysfs change
>> - Collected tags
>> v4:
>> - Remove extra lines on exynos changes as reported by Krzysztof Kozlowski
>> - Collected tags
>> v3:
>> - Reorg the series to be git-bisect safe
>> - Added the set_trip generic function
>> - Added the get_crit_temp generic function
>> - Removed more dead code in the thermal-of
>> - Fixed the exynos changelog
>> - Fixed the error check for the exynos drivers
>> - Collected tags
>> v2:
>> - Added missing EXPORT_SYMBOL_GPL() for thermal_zone_get_trip()
>> - Removed tab whitespace in the acerhdf driver
>> - Collected tags
>>
>> Cc: Raju Rangoju <rajur at chelsio.com>
>> Cc: "David S. Miller" <davem at davemloft.net>
>> Cc: Eric Dumazet <edumazet at google.com>
>> Cc: Jakub Kicinski <kuba at kernel.org>
>> Cc: Paolo Abeni <pabeni at redhat.com>
>> Cc: Peter Kaestle <peter at piie.net>
>> Cc: Hans de Goede <hdegoede at redhat.com>
>> Cc: Mark Gross <markgross at kernel.org>
>> Cc: Miquel Raynal <miquel.raynal at bootlin.com>
>> Cc: "Rafael J. Wysocki" <rafael at kernel.org>
>> Cc: Daniel Lezcano <daniel.lezcano at linaro.org>
>> Cc: Amit Kucheria <amitk at kernel.org>
>> Cc: Zhang Rui <rui.zhang at intel.com>
>> Cc: Nicolas Saenz Julienne <nsaenz at kernel.org>
>> Cc: Broadcom Kernel Team <bcm-kernel-feedback-list at broadcom.com>
>> Cc: Florian Fainelli <f.fainelli at gmail.com>
>> Cc: Ray Jui <rjui at broadcom.com>
>> Cc: Scott Branden <sbranden at broadcom.com>
>> Cc: Support Opensource <support.opensource at diasemi.com>
>> Cc: Lukasz Luba <lukasz.luba at arm.com>
>> Cc: Shawn Guo <shawnguo at kernel.org>
>> Cc: Sascha Hauer <s.hauer at pengutronix.de>
>> Cc: Pengutronix Kernel Team <kernel at pengutronix.de>
>> Cc: Fabio Estevam <festevam at gmail.com>
>> Cc: NXP Linux Team <linux-imx at nxp.com>
>> Cc: Thara Gopinath <thara.gopinath at linaro.org>
>> Cc: Andy Gross <agross at kernel.org>
>> Cc: Bjorn Andersson <bjorn.andersson at linaro.org>
>> Cc: "Niklas Söderlund" <niklas.soderlund at ragnatech.se>
>> Cc: Bartlomiej Zolnierkiewicz <bzolnier at gmail.com>
>> Cc: Krzysztof Kozlowski <krzysztof.kozlowski at linaro.org>
>> Cc: Alim Akhtar <alim.akhtar at samsung.com>
>> Cc: Thierry Reding <thierry.reding at gmail.com>
>> Cc: Jonathan Hunter <jonathanh at nvidia.com>
>> Cc: Eduardo Valentin <edubezval at gmail.com>
>> Cc: Keerthy <j-keerthy at ti.com>
>> Cc: Kunihiko Hayashi <hayashi.kunihiko at socionext.com>
>> Cc: Masami Hiramatsu <mhiramat at kernel.org>
>> Cc: Antoine Tenart <atenart at kernel.org>
>> Cc: Srinivas Pandruvada <srinivas.pandruvada at linux.intel.com>
>> Cc: Dmitry Osipenko <digetx at gmail.com>
>> Cc: netdev at vger.kernel.org
>> Cc: linux-kernel at vger.kernel.org
>> Cc: platform-driver-x86 at vger.kernel.org
>> Cc: linux-pm at vger.kernel.org
>> Cc: linux-rpi-kernel at lists.infradead.org
>> Cc: linux-arm-kernel at lists.infradead.org
>> Cc: linux-arm-msm at vger.kernel.org
>> Cc: linux-renesas-soc at vger.kernel.org
>> Cc: linux-samsung-soc at vger.kernel.org
>> Cc: linux-tegra at vger.kernel.org
>> Cc: linux-omap at vger.kernel.org
>>
>> Daniel Lezcano (29):
>> thermal/core: Add a generic thermal_zone_get_trip() function
>> thermal/sysfs: Always expose hysteresis attributes
>> thermal/core: Add a generic thermal_zone_set_trip() function
>> thermal/core/governors: Use thermal_zone_get_trip() instead of ops
>> functions
>> thermal/of: Use generic thermal_zone_get_trip() function
>> thermal/of: Remove unused functions
>> thermal/drivers/exynos: Use generic thermal_zone_get_trip() function
>> thermal/drivers/exynos: of_thermal_get_ntrips()
>> thermal/drivers/exynos: Replace of_thermal_is_trip_valid() by
>> thermal_zone_get_trip()
>> thermal/drivers/tegra: Use generic thermal_zone_get_trip() function
>> thermal/drivers/uniphier: Use generic thermal_zone_get_trip() function
>> thermal/drivers/hisi: Use generic thermal_zone_get_trip() function
>> thermal/drivers/qcom: Use generic thermal_zone_get_trip() function
>> thermal/drivers/armada: Use generic thermal_zone_get_trip() function
>> thermal/drivers/rcar_gen3: Use the generic function to get the number
>> of trips
>> thermal/of: Remove of_thermal_get_ntrips()
>> thermal/of: Remove of_thermal_is_trip_valid()
>> thermal/of: Remove of_thermal_set_trip_hyst()
>> thermal/of: Remove of_thermal_get_crit_temp()
>> thermal/drivers/st: Use generic trip points
>> thermal/drivers/imx: Use generic thermal_zone_get_trip() function
>> thermal/drivers/rcar: Use generic thermal_zone_get_trip() function
>> thermal/drivers/broadcom: Use generic thermal_zone_get_trip() function
>> thermal/drivers/da9062: Use generic thermal_zone_get_trip() function
>> thermal/drivers/ti: Remove unused macros ti_thermal_get_trip_value() /
>> ti_thermal_trip_is_valid()
>> thermal/drivers/acerhdf: Use generic thermal_zone_get_trip() function
>> thermal/drivers/cxgb4: Use generic thermal_zone_get_trip() function
>> thermal/intel/int340x: Replace parameter to simplify
>> thermal/drivers/intel: Use generic thermal_zone_get_trip() function
> 
> I've tested this v8 patchset after fixing the issue with Exynos TMU with
> https://lore.kernel.org/all/20221003132943.1383065-1-daniel.lezcano@linaro.org/
> patch and I got the following lockdep warning on all Exynos-based boards:
> 
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.0.0-rc1-00083-ge5c9d117223e #12945 Not tainted
> ------------------------------------------------------
> swapper/0/1 is trying to acquire lock:
> c1ce66b0 (&data->lock#2){+.+.}-{3:3}, at: exynos_get_temp+0x3c/0xc8
> 
> but task is already holding lock:
> c2979b94 (&tz->lock){+.+.}-{3:3}, at: thermal_zone_device_update.part.0+0x3c/0x528
> 
> which lock already depends on the new lock.

I'm wondering whether the problem was already there before this series
and is related to data->lock ...

Doesn't the thermal zone lock already prevent racy access to the data 
structure?
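
To make the inversion concrete, here is a minimal userspace sketch of
the two orderings as I read them from the report. It is not the driver
code: the lock ids, the sketch_lock()/sketch_unlock() helpers and the
probe_path()/update_path() functions are hypothetical stand-ins, and the
bookkeeping is single-threaded on purpose so nothing can actually
deadlock. It only records which lock was taken while another one was
held, which is roughly the kind of dependency lockdep tracks.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for tz->lock and data->lock. */
enum { TZ_LOCK, DATA_LOCK, NR_LOCKS };

static pthread_mutex_t locks[NR_LOCKS] = {
	PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER,
};

/* Single-threaded bookkeeping, enough for this sequential sketch. */
static bool held[NR_LOCKS];

/* order[a][b] is set once lock b has been taken while lock a was held. */
static bool order[NR_LOCKS][NR_LOCKS];

static void sketch_lock(int id)
{
	/* Record a dependency from every lock currently held to this one. */
	for (int i = 0; i < NR_LOCKS; i++)
		if (held[i])
			order[i][id] = true;

	pthread_mutex_lock(&locks[id]);
	held[id] = true;
}

static void sketch_unlock(int id)
{
	held[id] = false;
	pthread_mutex_unlock(&locks[id]);
}

/* Probe path as the report reads: data->lock first, then tz->lock. */
static void probe_path(void)
{
	sketch_lock(DATA_LOCK);
	sketch_lock(TZ_LOCK);		/* stands for thermal_zone_get_trip() */
	sketch_unlock(TZ_LOCK);
	sketch_unlock(DATA_LOCK);
}

/* Update path: tz->lock first, then data->lock in the get_temp callback. */
static void update_path(void)
{
	sketch_lock(TZ_LOCK);
	sketch_lock(DATA_LOCK);		/* stands for exynos_get_temp() */
	sketch_unlock(DATA_LOCK);
	sketch_unlock(TZ_LOCK);
}

/* True once both orders have been observed, i.e. an AB/BA inversion. */
static bool inversion_seen(void)
{
	return order[TZ_LOCK][DATA_LOCK] && order[DATA_LOCK][TZ_LOCK];
}
```

Calling probe_path() and then update_path() makes inversion_seen()
return true, even though the two paths never ran concurrently. If the
thermal zone lock is enough to serialize the accesses, dropping the
inner lock from one of the two paths would remove the cycle, but that
is an assumption to be checked against the driver.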

Another question: if the sensor clock is disabled after reading it, how
does the hardware update the temperature and detect that the programmed
threshold is crossed?

> the existing dependency chain (in reverse order) is:
> 
> -> #1 (&tz->lock){+.+.}-{3:3}:
>          mutex_lock_nested+0x1c/0x24
>          thermal_zone_get_trip+0x20/0x44
>          exynos_tmu_initialize+0x144/0x1e0
>          exynos_tmu_probe+0x2b0/0x728
>          platform_probe+0x5c/0xb8
>          really_probe+0xe0/0x414
>          __driver_probe_device+0xa0/0x208
>          driver_probe_device+0x30/0xc0
>          __driver_attach+0xf0/0x1f0
>          bus_for_each_dev+0x70/0xb0
>          bus_add_driver+0x174/0x218
>          driver_register+0x88/0x11c
>          do_one_initcall+0x64/0x380
>          kernel_init_freeable+0x1c0/0x224
>          kernel_init+0x18/0x12c
>          ret_from_fork+0x14/0x2c
>          0x0
> 
> -> #0 (&data->lock#2){+.+.}-{3:3}:
>          lock_acquire+0x124/0x3e4
>          __mutex_lock+0x90/0x948
>          mutex_lock_nested+0x1c/0x24
>          exynos_get_temp+0x3c/0xc8
>          __thermal_zone_get_temp+0x5c/0x12c
>          thermal_zone_device_update.part.0+0x78/0x528
>          __thermal_cooling_device_register.part.0+0x298/0x354
>          __cpufreq_cooling_register.constprop.0+0x138/0x218
>          of_cpufreq_cooling_register+0x48/0x8c
>          cpufreq_online+0x8d0/0xb2c
>          cpufreq_add_dev+0xb0/0xec
>          subsys_interface_register+0x108/0x118
>          cpufreq_register_driver+0x15c/0x380
>          dt_cpufreq_probe+0x2e4/0x434
>          platform_probe+0x5c/0xb8
>          really_probe+0xe0/0x414
>          __driver_probe_device+0xa0/0x208
>          driver_probe_device+0x30/0xc0
>          __driver_attach+0xf0/0x1f0
>          bus_for_each_dev+0x70/0xb0
>          bus_add_driver+0x174/0x218
>          driver_register+0x88/0x11c
>          do_one_initcall+0x64/0x380
>          kernel_init_freeable+0x1c0/0x224
>          kernel_init+0x18/0x12c
>          ret_from_fork+0x14/0x2c
>          0x0
> 
> other info that might help us debug this:
> 
>    Possible unsafe locking scenario:
> 
>          CPU0                    CPU1
>          ----                    ----
>     lock(&tz->lock);
>                                  lock(&data->lock#2);
>                                  lock(&tz->lock);
>     lock(&data->lock#2);
> 
>    *** DEADLOCK ***
> 
> 5 locks held by swapper/0/1:
>    #0: c1c8648c (&dev->mutex){....}-{3:3}, at: __driver_attach+0xe4/0x1f0
>    #1: c1210434 (cpu_hotplug_lock){++++}-{0:0}, at: cpufreq_register_driver+0xc4/0x380
>    #2: c1ed8298 (subsys mutex#8){+.+.}-{3:3}, at: subsys_interface_register+0x4c/0x118
>    #3: c131f944 (thermal_list_lock){+.+.}-{3:3}, at: __thermal_cooling_device_register.part.0+0x238/0x354
>    #4: c2979b94 (&tz->lock){+.+.}-{3:3}, at: thermal_zone_device_update.part.0+0x3c/0x528
> 
> stack backtrace:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-rc1-00083-ge5c9d117223e #12945
> Hardware name: Samsung Exynos (Flattened Device Tree)
>    unwind_backtrace from show_stack+0x10/0x14
>    show_stack from dump_stack_lvl+0x58/0x70
>    dump_stack_lvl from check_noncircular+0xf0/0x158
>    check_noncircular from __lock_acquire+0x15e8/0x2a7c
>    __lock_acquire from lock_acquire+0x124/0x3e4
>    lock_acquire from __mutex_lock+0x90/0x948
>    __mutex_lock from mutex_lock_nested+0x1c/0x24
>    mutex_lock_nested from exynos_get_temp+0x3c/0xc8
>    exynos_get_temp from __thermal_zone_get_temp+0x5c/0x12c
>    __thermal_zone_get_temp from thermal_zone_device_update.part.0+0x78/0x528
>    thermal_zone_device_update.part.0 from __thermal_cooling_device_register.part.0+0x298/0x354
>    __thermal_cooling_device_register.part.0 from __cpufreq_cooling_register.constprop.0+0x138/0x218
>    __cpufreq_cooling_register.constprop.0 from of_cpufreq_cooling_register+0x48/0x8c
>    of_cpufreq_cooling_register from cpufreq_online+0x8d0/0xb2c
>    cpufreq_online from cpufreq_add_dev+0xb0/0xec
>    cpufreq_add_dev from subsys_interface_register+0x108/0x118
>    subsys_interface_register from cpufreq_register_driver+0x15c/0x380
>    cpufreq_register_driver from dt_cpufreq_probe+0x2e4/0x434
>    dt_cpufreq_probe from platform_probe+0x5c/0xb8
>    platform_probe from really_probe+0xe0/0x414
>    really_probe from __driver_probe_device+0xa0/0x208
>    __driver_probe_device from driver_probe_device+0x30/0xc0
>    driver_probe_device from __driver_attach+0xf0/0x1f0
>    __driver_attach from bus_for_each_dev+0x70/0xb0
>    bus_for_each_dev from bus_add_driver+0x174/0x218
>    bus_add_driver from driver_register+0x88/0x11c
>    driver_register from do_one_initcall+0x64/0x380
>    do_one_initcall from kernel_init_freeable+0x1c0/0x224
>    kernel_init_freeable from kernel_init+0x18/0x12c
>    kernel_init from ret_from_fork+0x14/0x2c
> Exception stack(0xf082dfb0 to 0xf082dff8)
> ...
> 
> Let me know if You need anything more to test.
> 
> 
>> drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 2 -
>> .../ethernet/chelsio/cxgb4/cxgb4_thermal.c | 41 +----
>> drivers/platform/x86/acerhdf.c | 73 +++-----
>> drivers/thermal/armada_thermal.c | 39 ++---
>> drivers/thermal/broadcom/bcm2835_thermal.c | 8 +-
>> drivers/thermal/da9062-thermal.c | 52 +-----
>> drivers/thermal/gov_bang_bang.c | 39 +++--
>> drivers/thermal/gov_fair_share.c | 18 +-
>> drivers/thermal/gov_power_allocator.c | 51 +++---
>> drivers/thermal/gov_step_wise.c | 22 ++-
>> drivers/thermal/hisi_thermal.c | 11 +-
>> drivers/thermal/imx_thermal.c | 72 +++-----
>> .../int340x_thermal/int340x_thermal_zone.c | 33 ++--
>> .../int340x_thermal/int340x_thermal_zone.h | 4 +-
>> .../processor_thermal_device.c | 10 +-
>> drivers/thermal/intel/x86_pkg_temp_thermal.c | 120 +++++++------
>> drivers/thermal/qcom/qcom-spmi-temp-alarm.c | 39 ++---
>> drivers/thermal/rcar_gen3_thermal.c | 2 +-
>> drivers/thermal/rcar_thermal.c | 53 +-----
>> drivers/thermal/samsung/exynos_tmu.c | 57 +++----
>> drivers/thermal/st/st_thermal.c | 47 +----
>> drivers/thermal/tegra/soctherm.c | 33 ++--
>> drivers/thermal/tegra/tegra30-tsensor.c | 17 +-
>> drivers/thermal/thermal_core.c | 160 +++++++++++++++---
>> drivers/thermal/thermal_core.h | 24 +--
>> drivers/thermal/thermal_helpers.c | 28 +--
>> drivers/thermal/thermal_netlink.c | 21 +--
>> drivers/thermal/thermal_of.c | 116 -------------
>> drivers/thermal/thermal_sysfs.c | 133 +++++----------
>> drivers/thermal/ti-soc-thermal/ti-thermal.h | 15 --
>> drivers/thermal/uniphier_thermal.c | 27 ++-
>> include/linux/thermal.h | 10 ++
>> 32 files changed, 559 insertions(+), 818 deletions(-)
>>
> Best regards
> 


-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs



