[PATCH v2] imx: thermal: use CPU temperature grade info for thresholds

Tim Harvey tharvey at gateworks.com
Tue Jul 28 07:50:51 PDT 2015


On Tue, May 26, 2015 at 11:08 PM, Jon Nettleton <jon.nettleton at gmail.com> wrote:
> On Tue, May 26, 2015 at 11:24 PM, Tim Harvey <tharvey at gateworks.com> wrote:
>> On Sat, May 23, 2015 at 10:19 PM, Jon Nettleton <jon.nettleton at gmail.com> wrote:
>>> On Sun, May 24, 2015 at 4:48 AM, Shawn Guo <shawn.guo at linaro.org> wrote:
>>>> On Thu, May 21, 2015 at 04:45:47PM -0700, Tim Harvey wrote:
>>>>> The IMX6Q/IMX6DL SoCs have a 2-bit temperature grade stored in OTP which
>>>>> is valid for all IMX6 SoCs (despite the fact that the IMXSDLRM and
>>>>> IMXSXRM do not document this - this has been proven via tests as well as
>>>>> verified by Freescale FAE).
>>>>>
>>>>> Instead of assuming a fixed 85C for the passive cooling threshold and 105C
>>>>> for critical, use the thermal grade for these configurations.
>>>>>
>>>>> We will set the critical to maxT - 5C and passive to maxT - 10C.
>>>
>>> I would like to chime in here if you don't mind.  I have been carrying
>>> a patch similar to this in the SolidRun repo to fix cooling issues
>>> that we have had.  I would recommend keeping the passive temp at maxT
>>> - 20C due to the thermal properties of the chip.  I have found that
>>> around 85-90C we can maintain a relatively steady thermal state with
>>> only passive cooling.  Generally with a hard non-NEON based cpu
>>> workload the iMX6 will level off at about 87C with all the cores
>>> clocked to 1GHz, and sometimes dipping down to 800MHz periodically.
>>> With a NEON based workload on all the cores it will push beyond this
>>> and generally end up finding steady state at about 800MHz right around
>>> 90C.
>>>
>>> If you raise the initial passive threshold by 10C it will allow enough
>>> heat to build up in the chip that the only way to avoid reaching
>>> critical temps is by dropping the CPU down to its lowest frequency.
>>> This is not the best experience as then you have a much warmer chip
>>> and if the workload doesn't change you will just be switching between
>>> running at the highest and lowest cpu frequency, which makes for a
>>> choppy experience.  A longer passive cooling zone allows the
>>> temperature of the chip to be regulated using only passive methods but
>>> without drastic performance drops.
>>>
>>> I am doing things a bit differently in my implementation as I setup a
>>> passive cooling zone for each cpu frequency, but that is just so you
>>> can have more control from userspace by changing the different passive
>>> trip points.
>>>
>>> -Jon
>>
>> Jon,
>>
>> I can agree with leaving a Max-20C passive delta. What do you think
>> about the critical threshold of Max-5C and the rule of not allowing it to
>> be changed?
>>
>
> Tim,
>
> I definitely agree that the Critical temp should be a fixed point.  Is
> the purpose of lowering the critical threshold from the hardware
> default, to allow Linux to shutdown more cleanly rather than just have
> the hardware shutting down?  If that is the case then I think that is
> fine.  If it is to protect the SOC then that is unnecessary.  We have
> heated the SOCs to well beyond the critical threshold and they have
> survived just fine.
>
> This is a bit out of context but here is the formula I am using to
> figure out my trip points.  By default I use a linear set of trip
> points for passive cooling.
> https://github.com/linux4kix/linux-linaro-stable-mx6/commit/212c17d543739a5fe0bd75b66c10f05177e8bcb0
>
> The short of it is I set a trip delta of 6C and then figure out the
> lowest passive trip point as Critical - (#passive trip points * trip
> delta), where each cpu frequency stage is a passive trip point.  This
> will allow an 800MHz SOC with 2 trip points to run at full speed
> longer than a 1.2GHz SOC with 4 trip points.  The idea being that the
> higher the clock rate means we will generate more heat and have more
> passive cooling levels so it is better to drop the top speed of the
> CPU earlier in order to let the passive cooling be effective and find
> a steady state.
>
> This may be a bit over the top but has fixed problems where long
> running processes would build up heat and eventually cause a thermal
> shutdown, but doesn't completely cripple the faster SOCs.
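
Jon's trip-point formula can be sketched in C as below. This assumes one passive trip point per cpu frequency stage, spaced evenly by the 6C trip delta from the lowest point (critical minus n times the delta) up to one delta below critical. The function and macro names are hypothetical and are not taken from the linked commit; temperatures are in millidegrees Celsius.

```c
#include <assert.h>

/* Spacing between passive trip points: 6C, in millidegrees Celsius. */
#define TRIP_DELTA_MC 6000

/*
 * Hypothetical helper: fill trips[0..n_trips-1] in ascending order with
 * a linear set of passive trip points, one per cpu frequency stage.
 * trips[0] is the lowest point, critical - n_trips * delta; the highest
 * point, trips[n_trips - 1], ends up one delta below critical.
 */
static void linear_passive_trips(int critical_mc, int n_trips, int delta_mc,
				 int *trips)
{
	assert(n_trips > 0 && delta_mc > 0);

	for (int i = 0; i < n_trips; i++)
		trips[i] = critical_mc - (n_trips - i) * delta_mc;
}
```

With a critical point of 100C (chosen only for illustration), an SoC with 2 frequency stages gets passive trips at 88C and 94C, while one with 4 stages starts throttling at 76C. This is the behavior described above: the slower part runs at full speed longer, and the faster part starts shedding frequency earlier.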

Jon,

Yes - the purpose of lowering the critical threshold from the hardware
default is to allow Linux to shutdown more cleanly.

If you agree that the patch improves things by using the OTP
temperature grade as a basis, can you ack it? If you feel that the
thresholds need to be adjusted, perhaps propose a follow-on patch. I
feel people can debate the temperature deltas endlessly, but what I
was really after here was to fix the fact that not all processors are
temperature graded equally, because they are packaged differently
(metal case on automotive parts, offering better thermal conductivity,
vs. plastic case on consumer parts).
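
As a rough illustration of using the temperature grade as the basis, the C sketch below turns the 2-bit grade into a maximum die temperature (maxT). It then derives critical as maxT - 5C and passive as maxT minus a configurable delta: 10C as in the posted patch, or 20C as Jon suggests. The grade's bit position in the fuse word, the grade-to-temperature table, and all names are assumptions for illustration only, not taken from the patch, so check them against the SoC reference manual before relying on them.

```c
#include <assert.h>

/* critical = maxT - 5C, per the patch; millidegrees Celsius */
#define CRIT_DELTA_MC 5000

struct thermal_trips {
	int max_mc;      /* maximum die temperature for this grade */
	int critical_mc; /* clean Linux shutdown before the hardware limit */
	int passive_mc;  /* start passive cooling (cpufreq throttling) */
};

/*
 * ASSUMPTION: the 2-bit grade sits in bits [7:6] of the fuse word and
 * maps to the maximum temperatures below (illustrative values).
 */
static int grade_to_max_mc(unsigned int fuse_word)
{
	switch ((fuse_word >> 6) & 0x3) {
	case 0:
		return 95000;  /* commercial */
	case 1:
		return 105000; /* extended commercial */
	case 2:
		return 105000; /* industrial */
	default:
		return 125000; /* automotive */
	}
}

/*
 * Hypothetical helper: derive trip points from the grade.
 * passive_delta_mc is 10000 in the patch as posted; Jon suggests 20000.
 */
static struct thermal_trips trips_from_grade(unsigned int fuse_word,
					     int passive_delta_mc)
{
	struct thermal_trips t;

	/* passive must trigger below critical */
	assert(passive_delta_mc > CRIT_DELTA_MC);
	t.max_mc = grade_to_max_mc(fuse_word);
	t.critical_mc = t.max_mc - CRIT_DELTA_MC;
	t.passive_mc = t.max_mc - passive_delta_mc;
	return t;
}
```

Keeping the critical delta fixed while making only the passive delta a parameter matches the split discussed above: the critical point is not meant to be changed, while the passive point is the part people are likely to tune.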

Regards,

Tim



More information about the linux-arm-kernel mailing list