[PATCH v3 1/5] arm64: dts: rockchip: enable built-in thermal monitoring on RK3588

Fri Mar 1 05:11:07 PST 2024

Hello Chen-Yu,

On 2024-03-01 13:02, Chen-Yu Tsai wrote:
> On Fri, Mar 1, 2024 at 7:10 PM Alexey Charkov <alchark at gmail.com> 
> wrote:
>> On Fri, Mar 1, 2024 at 12:52 PM Dragan Simic <dsimic at manjaro.org> 
>> wrote:
>> > On 2024-03-01 09:25, Alexey Charkov wrote:
>> > > On Fri, Mar 1, 2024 at 9:51 AM Dragan Simic <dsimic at manjaro.org> wrote:
>> > Thus, who knows what might (or might not) go wrong if we don't reset the
>> > PMIC at the same time when the CRU resets the SoC?  Unfortunately, the
>> > things aren't that straightforward.
>> >
>> > On top of that, some boards, such as the Rock 5B, use a few additional
>> > discrete voltage regulators instead of a master-slave PMIC
>> > configuration,
>> > which may actually introduce some weird power-related issues, which also
>> > may be intermittent.  Actually, I've already overheard that the Rock 5B
>> > experiences some issues of that nature, but I don't know the details.
>> 
>> Those discrete regulators seem to be out of scope of this discussion.
>> 
>> I agree that a deeper power-cycle with proper power-up sequence to
>> follow it is better when it's available in the respective hardware.
>> I'm also happy to provide a follow-up patch to switch from CRU to PMIC
>> resets for the boards I found to support the latter.
>> 
>> The question we have at hand is solely about the default behavior for
>> a hypothetical new board with minimal .dts, or an existing board where
>> we can't determine the wiring of the TSHUT signal:
>> Option 1. Let them stay nice and warm at 120C+ under load, because
>> they should have known better and should have enabled the TSADC in
>> their device tree before putting the system under load
>> Option 2. Get them passively cooled at 85C under load even with no
>> heatsink, then force a CRU reset out of abundance of caution at 120C
>> unless they defined PMIC reset in their device tree
>> 
>> I'm advocating for the latter.
> 
> FWIW, the CRU reset is what the kernel uses for rebooting the system,
> either during a reboot or a kernel panic. So it is already used for 
> both
> normal and abnormal scenarios. And yes, it sometimes leaves regulators
> or other parts of the system in some weird state that the BROM isn't
> expecting.

According to drivers/mfd/rk8xx-core.c, some PMICs (RK809 and RK817, to
be precise) already support taking over the board resets when configured
with "rockchip,system-power-controller".  Perhaps we should do the same
with the RK806, to avoid any possible issues with CRU-based board 
resets;
I'll see to investigate that further.

Not all Rockchip PMICs (RK808, for example) support software-initiated
resets, unfortunately.  According to the RK806 datasheet, it seems 
capable
of that;  see pages 27 and 28 in the version 1.0 of the datasheet.

> Why should a hardware triggered reset be any different?

According to the RK806 datasheet, resetting through PMIC(s) causes the
PMIC(s) to cut the power rails in a controlled way, i.e. with the 
expected
ramp-downs and sequencing, and the SoC then wakes up in a state of the
regulators that's exactly the same as when it gets powered up on cold 
boot.
Doing it that way should be better.

The reset procedure _should_ be virtually the same for all Rockchip 
PMICs,
but please don't take my word on that.  Resets are described quite 
poorly
in some PMIC datasheets.