[PATCH v3 1/5] arm64: dts: rockchip: enable built-in thermal monitoring on RK3588

Dragan Simic dsimic at manjaro.org
Fri Mar 1 01:24:01 PST 2024


On 2024-03-01 09:52, Dragan Simic wrote:
> On 2024-03-01 09:25, Alexey Charkov wrote:
>> On Fri, Mar 1, 2024 at 9:51 AM Dragan Simic <dsimic at manjaro.org> wrote:
>>> On 2024-03-01 06:12, Alexey Charkov wrote:
>>> > With all due respect, I disagree, here is why:
>>> >  - Neither the schematic nor the hardware design guide, on which the
>>> > schematic seems to be based, prescribes a particular way to handle
>>> > thermal runaways. They only provide the possibility of GPIO based
>>> > resets, along with the CRU based one
>>> 
>>> Please note that other documents from Rockchip also exist.  Below is
>>> a link to a screenshot from the Thermal developer guide, version 1.0,
>>> which describes the whole thing further.  I believe it's obvious that
>>> the thermal runaway is to be treated as a board-level feature.
>>> 
>>> - https://i.imgur.com/IJ6dSAc.png
>> 
>> Frankly, that still doesn't make TSADC per se a board-level thing IMO.
>> The only thing that is board-level is the wiring of GPIO based resets,
>> which I fully agree should go to board .dts for boards that support
>> it, but that's not part of the current defaults and can be safely
>> added later.
>> 
>> TSADC is inside the SoC. CRU is inside the SoC. They work just fine
>> for a thermal reset, even if no dedicated reset logic is wired on the
>> board. I really don't see any downsides in having TSADC enabled by
>> default with CRU based resets:
>> - it's a safe default (i.e. I cannot think of any configuration or use
>> case where enabled-by-default TSADC does any harm)
>> - it's safer than accidentally forgetting to enable TSADC (as it adds
>> thermal protection which is otherwise missing)
>> - it will work on all boards (even if it doesn't utilize the full
>> hardware functionality by ignoring GPIO resets that some boards also
>> have in addition to the CRU)
>> - and it requires fewer overrides in board .dts files
>> 
>> Sounds like a no-regret move to me.
> 
> Please see my comments below.
> 
>>> To be fair, that version of the Thermal developer guide dates back to
>>> 2019, meaning that it technically applies to the RK3399, for example,
>>> but the TSADC and reset circuitry design has basically remained the
>>> same for the RK3588.
>>> 
>>> >  - My strong belief is that defaults (regardless of context) should be
>>> > safe and reasonable, and should also minimize the need to override
>>> > them
>>> 
>>> Please note that the TSADC is disabled in the RK3399 SoC dtsi, so having
>>> it disabled in the RK3588(s) SoC dtsi would provide some consistency.
>> 
>> I'm happy to produce a patch to reverse the logic in RK3399 (and any
>> others for that matter) to also have TSADC enabled by default there,
>> thus saving several lines of code, if it's just about consistency.
> 
> But why should we change something that has served us for years, on
> multiple SoCs, with zero troubles and with (AFAIK) zero boards producing
> puffs of bluish smoke?
> 
>>> Though, the RK3399 still does it in a safe way, by moving the OPPs into
>>> a separate dtsi file, named rk3399-opp.dtsi, which the board dts files
>>> then include together with enabling the TSADC.
>>> 
>>> If you agree, let's employ the same approach for the RK3588(s), by having
>>> its OPPs defined in a separate file, named rk3588s-opp.dtsi, etc.
>> 
>> Separate file for OPPs is a good no-regret move to declutter the SoC
>> level .dtsi (as the OPP table is long and boring) - happy to move it
>> regardless of the outcome of the above TSADC discussion. Thanks for
>> the pointer!
> 
> Yeah, but I'm not sure that everyone would like that kind of separation.
> In fact, such separation may be frowned upon unless it's necessary.
> 
> As I already described in another thread, the separation for the RK3399
> is there only because a couple of different variants of the RK3399 SoC
> require different OPPs.
> 
>>> >  - In the context of dts/dtsi, as far as I understand the general logic
>>> > behind the split, the SoC .dtsi should contain all the things that are
>>> > fully contained within the SoC and do not depend on the wiring of a
>>> > particular board or its target use case. Boards then
>>> > add/remove/override settings to match their wiring and use case more
>>> > closely
>>> 
>>> Of course, but the thermal shutdown is obviously a board-level feature,
>>> as I described further above.
>> 
>> Not so obvious to me :-) I don't mean to be stubborn or uncooperative
>> here, but I really can't find any technical merit in having it enabled
>> at board level instead of SoC level.
> 
> Well, please also consider that the PMICs from Rockchip are kind of
> weird little chips, specifically customized to serve particular SoCs.
> For example, they ensure the right sequencing and ramping-up of different
> power rails, which is in many cases essential.
> 
> Thus, who knows what might (or might not) go wrong if we don't reset the
> PMIC at the same time as the CRU resets the SoC?  Unfortunately, things
> aren't that straightforward.
> 
> On top of that, some boards, such as the Rock 5B, use a few additional
> discrete voltage regulators instead of a master-slave PMIC configuration,
> which may introduce some weird power-related issues that may also be
> intermittent.  Actually, I've already heard that the Rock 5B experiences
> some issues of that nature, but I don't know the details.

As an example, did you know that LPDDR4 chips, according to the official
JEDEC documentation, require proper sequencing of the ramping-down of their
power rails when they're to be turned off as part of shutting the system
down?  The documentation also specifies that the expected lifetime becomes
reduced when the powering-off isn't properly performed, and there's even an
official number of such unsafe power-offs that the LPDDR4 chips are actually
expected to survive.

Thus, just yanking a power cord from a device that uses LPDDR4 may actually
make it die prematurely.  Such behavior is kind of expected when it comes to
flash-based storage, but DRAM?  Things are weird these days. :)



More information about the linux-arm-kernel mailing list