Under some conditions i.MX temperature is not read out from on-SOC temperature sensor

Philipp Zabel p.zabel at pengutronix.de
Fri Jul 21 03:30:47 PDT 2017


Hi Maxim,

thank you for the patch and the analysis. I have some comments below.

On Fri, 2017-07-21 at 12:30 +0300, Maxim Yu. Osipov wrote:
> Hello, Shawn, Sascha,
> 
> We use yocto (with mainline kernel 4.4.18) for our i.MX 6SOLO-based 
> custom board. We use in-kernel thermal management framework with
> default thermal governor (step_wise) (drivers/thermal/imx-thermal.c) 
> with defined trip points:
> 
> imx_thermal 2000000.aips-bus:tempmon: Commercial CPU temperature grade - 
> max:95C critical:90C passive:85C
> 
> We heat up the board in climate chamber until temperature reaches 
> critical value (90C) and in-kernel thermal management powers it off. 
> After short period of time (when temperature is in range 
> passive(85C)...critical(90C), we power up the board again so the alarm 
> condition is met and imx_thermal interrupt fires up. When we try to read 
> out the temperature from corresponding sysfs file we permanently get the 
> error -EAGAIN:
> 
> root at mybox:/usr/lib/strace/ptest# ./strace cat 
> /sys/class/thermal/thermal_zone0/temp
> ...
> open("/sys/class/thermal/thermal_zone0/temp", O_RDONLY|O_LARGEFILE) = 3
> sendfile64(1, 3, NULL, 16777216)        = -1 EAGAIN (Resource 
> temporarily unavailable)
> read(3, 0x7ee5bc00, 4096)               = -1 EAGAIN (Resource 
> temporarily unavailable)
> brk(NULL)                               = 0x1ad9000
> brk(0x1afa000)                          = 0x1afa000
> write(2, "cat: read error: Resource tempor"..., 50cat: read error: 
> Resource temporarily unavailable
> 
> root at mybox:~# cat /proc/interrupts
>       CPU0        <snip>
>       271:          2       GPC  49 Level     imx_thermal
>       <snip>
> 
> There are a couple of workarounds to enforce the temperature file to be 
> readable. If we explicitly enable the thermal's mode via sysfs (echo 
> enabled > /sys/sys/class/therma/thermal_zone0/mode) the temperature file 
> becomes readable. The same applies to suspend/resume cycles - after 
> resume the temperature file is readable.
> 
> 
> Having analyzed/debugged the code (for both mainline and freescale's 
> trees) I figured out the reason of the problem:
> 
> In imx_thermal_probe() thermal alarm interrupt is enabled before 
> device's 'mode' field is set to THERMAL_DEVICE_ENABLED while the sensor 
> hardware is already powered up. If alarm condition is met -
> the interrupt immediately fires up. During (threaded) interrupt 
> processing imx_get_temp() is called. The field 'mode' is still set to 
> DISABLED, so imx_get_temp() processes such case by special way: sensor 
> is powered up,
> measurement is enabled, a reading is taken, after that measurement is 
> DISABLED and temperature sensor is POWERED DOWN.
> When processing of alarm interrupt ends, imx_thermal_probe() continues 
> and sets mode field to ENABLED, but in fact the device is powered off!
> 
> This leads to broken logic of further calls of imx_get_temp().
> 
> The consequences of this bug could be quite serious - the temperature is 
> not read out from the sensor, so in-kernel thermal management is useless 
> - the board is not powered off by thermal management when the CPU is 
> overheated.
> 
> 
> Attached is patch against current mainline kernel tree.

It would be preferable to have the patch sent inline. That way it would
be easier to comment on specifics. Also, as a thermal patch, this should
be sent to linux-pm at vger.kernel.org. The scripts/get_maintainers.pl
script in the kernel sources can help to find the relevant maintainers
and mailing lists.

I think the data->irq_enabled assignment should be moved up before the
call to devm_request_threaded_irq as well. If the interrupt triggers
immediately (and sets data->irq_enabled=false), and then
imx_thermal_probe returns (after setting data->irq_enabled=true) before
the threaded irq handler gets to run, imx_get_temp will not reenable the
interrupt at the end, if the temperature has just fallen below the alarm
temperature.

Reviewed-by: Philipp Zabel <p.zabel at pengutronix.de>

best regards
Philipp




More information about the linux-arm-kernel mailing list