[PATCH v1] thermal: imx: Update critical temp threshold
Francesco Dolcini
francesco.dolcini at toradex.com
Thu May 12 03:24:54 PDT 2022
Hello Lucas,
On Thu, May 12, 2022 at 12:08:08PM +0200, Lucas Stach wrote:
> On Thursday, 12.05.2022 at 09:36 +0200, Francesco Dolcini wrote:
> > Hello Daniel, Sascha, Shawn and all
> >
> > On Mon, May 09, 2022 at 11:55:20AM +0200, Daniel Lezcano wrote:
> > > On 20/04/2022 11:13, Francesco Dolcini wrote:
> > > > Increase the critical temperature threshold to the datasheet-defined
> > > > value according to the temperature grade of the SoC, raising the
> > > > current critical temperature value by 5 degrees.
> > > >
> > > > Without this change the emergency shutdown triggers earlier than
> > > > required, affecting applications that are expected to work in this
> > > > close-to-the-limit, yet still valid, temperature range.
> > > >
> > > > Signed-off-by: Francesco Dolcini <francesco.dolcini at toradex.com>
> > > > ---
> > > >
> > > > Not sure if there is an alternative to this patch: the critical threshold
> > > > seems to be read-only, so it is not possible to just change it from user
> > > > space, which would be my preferred solution.
> > > >
> > > > According to the original discussion [1] the reasoning was the following:
> > > >
> > > > On Tue, Jul 28, 2015 at 4:50 PM, Tim Harvey <tharvey at gateworks.com> wrote:
> > > > > Yes - the purpose of lowering the critical threshold from the hardware
> > > > > default is to allow Linux to shutdown more cleanly.
> > > >
> > > > But I do not understand it.
> > >
> > > Shawn, Sascha ? any comment ?
> >
> > Just one small addition: we (Toradex) have been using this modified
> > critical threshold for quite some time on multiple i.MX[67]* SoCs, and we
> > regularly run stress tests on commercial/IT parts over the whole
> > working temperature range (ambient temperature up to 85 degrees for IT
> > modules) in climate chambers. I'm not aware of any issue reported
> > because of that (indeed, it is the other way around: without this change
> > we had issues).
>
> That is really an overall system design issue. Most chips will probably
> work fine when going over the critical temperature, as this is mostly
> set due to device lifetime constraints, not because the chip fails at
> this temperature. However the chip is only guaranteed to work at up to
> the critical temperature, so one could argue that starting an orderly
> shutdown when the critical temperature is reached is already too late,
> as the temperature may rise further during the time taken to shut down
> the system. Also device leakage increases a lot at those critical
> temperatures, so the system may fail not because the chip is
> malfunctioning, but the board power supply may not be able to supply
> the increased current required.
>
> Really I think there is no right or wrong here. I believe that this
> needs to be up to the system integrator, so the critical temperature
> should be writable by userspace in the constraints set by the fuses.
I agree 95% with you. The 5% where I do not agree is that the final system
integrator should be allowed to go even above the fuse constraints.
Sometimes it is better to take the chance of burning a chip than to shut
the system down.
Anyway, would it be fine to have a patch that makes the critical
threshold writable (in my initial message I also mentioned this as my
preferred solution)? If anybody has a pointer on how
to do it, that would be great; I'm not familiar with that code.
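To make the idea concrete, here is a rough standalone sketch of the kind of
range check a writable critical threshold would need. It is not taken from
the imx thermal driver: the struct, the function name and the fuse-derived
upper bound are all hypothetical, and the upper-bound check is exactly the
part that is up for discussion.

```c
#include <errno.h>

/* All temperatures are in millidegrees Celsius. */
struct trip_limits {
	int passive_mc;	/* current passive trip point (hypothetical field) */
	int max_mc;	/* hypothetical upper bound, e.g. derived from the fuses */
};

/*
 * Validate a critical trip temperature requested by user space.
 * Returns 0 and stores the value in *crit_mc on success,
 * -EINVAL (leaving *crit_mc untouched) otherwise.
 */
static int set_critical_temp(const struct trip_limits *lim, int requested_mc,
			     int *crit_mc)
{
	/* Critical must stay above passive so throttling gets a chance first. */
	if (requested_mc <= lim->passive_mc)
		return -EINVAL;

	/* Reject values above the assumed fuse-derived maximum. */
	if (requested_mc > lim->max_mc)
		return -EINVAL;

	*crit_mc = requested_mc;
	return 0;
}
```

Dropping the second check would give the behaviour I argue for above, where
the integrator may go beyond the fuse value at their own risk.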
Francesco