[PATCH] arm64: dts: rockchip: enable built-in thermal monitoring on rk3588

Dragan Simic dsimic at manjaro.org
Sun Jan 21 22:22:48 PST 2024


On 2024-01-22 07:03, Alexey Charkov wrote:
> On Mon, Jan 22, 2024 at 8:55 AM Dragan Simic <dsimic at manjaro.org> wrote:
>> On 2024-01-21 19:56, Alexey Charkov wrote:
>> > On Thu, Jan 18, 2024 at 10:48 PM Dragan Simic <dsimic at manjaro.org> wrote:
>> >> On 2024-01-08 14:41, Alexey Charkov wrote:
>> >> > On Sun, Jan 7, 2024 at 2:54 AM Dragan Simic <dsimic at manjaro.org> wrote:
>> >> >> On 2024-01-06 23:23, Alexey Charkov wrote:
>> >> >> > Include thermal zones information in device tree for rk3588 variants
>> >> >> > and enable the built-in thermal sensing ADC on RADXA Rock 5B
>> >> >> >
>> >> >> > Signed-off-by: Alexey Charkov <alchark at gmail.com>
>> >> >> > ---
>> >> >> > +                     trips {
>> >> >> > +                             threshold: trip-point-0 {
>> >> >>
>> >> >> It would be better to name it cpu_alert0 instead, because that's what
>> >> >> other newer dtsi files already use.
>> >> >
>> >> > Reflecting on your comments here and below, I'm thinking that maybe it
>> >> > would be better to define only the critical trip point for the SoC
>> >> > overall, and then have alerts along with the respective cooling maps
>> >> > separately for A76-0,1, A76-2,3, A55-0,1,2,3? After all, given that we
>> >> > have more granular temperature measurement here than in previous RK
>> >> > chipsets it might be better to only throttle the "offending" cores,
>> >> > not the full package.
>> >> >
>> >> > What do you think?
>> >> >
>> >> > Downstream DT doesn't follow this approach though, so maybe there's
>> >> > something I'm missing here.
>> >>
>> >> I agree, it's better to fully utilize the higher measurement granularity
>> >> made possible by having multiple temperature sensors available.
>> >>
>> >> I also agree that we should have only the critical trip defined for the
>> >> package-level temperature sensor.  Let's have the separate temperature
>> >> measurements for the CPU (sub)clusters do the thermal throttling, and
>> >> let's keep the package-level measurement for the critical shutdowns
>> >> only.  IIRC, some MediaTek SoC dtsi already does exactly that.
>> >>
>> >> Of course, there are no reasons not to have the critical trips defined
>> >> for the CPU (sub)clusters as well.
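
To make the layout discussed above a bit more concrete, here's a rough,
untested sketch of how it might look in the SoC dtsi.  The zone names,
the tsadc channel numbers, the cpu_b0/cpu_b1 phandles, the polling
delays and the critical temperature are assumptions made for
illustration only;  the passive trip values are the ones from the patch.

	/* assumes #include <dt-bindings/thermal/thermal.h> for THERMAL_NO_LIMIT */
	thermal-zones {
		/* package-level sensor: critical shutdown only, no throttling */
		package-thermal {
			polling-delay-passive = <0>;
			polling-delay = <0>;
			thermal-sensors = <&tsadc 0>;	/* channel is an assumption */

			trips {
				soc_crit: soc-crit {
					temperature = <115000>;	/* placeholder value */
					hysteresis = <0>;
					type = "critical";
				};
			};
		};

		/* one zone per CPU group;  only the first A76 pair is shown */
		bigcore0-thermal {
			polling-delay-passive = <100>;
			polling-delay = <0>;
			thermal-sensors = <&tsadc 1>;	/* channel is an assumption */

			trips {
				bigcore0_alert0: bigcore0-alert0 {
					temperature = <75000>;
					hysteresis = <2000>;
					type = "passive";
				};
				bigcore0_alert1: bigcore0-alert1 {
					temperature = <85000>;
					hysteresis = <2000>;
					type = "passive";
				};
			};

			cooling-maps {
				map0 {
					/* throttle only the cores next to this sensor */
					trip = <&bigcore0_alert1>;
					cooling-device =
						<&cpu_b0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
						<&cpu_b1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
				};
			};
		};
	};

The zones for the second A76 pair and for the A55 cores would follow
the same pattern, each with its own sensor channel and cooling map.
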
>> >
>> > I think I'll also add a board-specific active cooling mechanism on the
>> > package level in the next iteration, given that Rock 5B has a PWM fan
>> > defined as a cooling device. That will go in the separate patch that
>> > updates rk3588-rock-5b.dts (your feedback to v2 of this patch is also
>> > duly noted, thank you!)
>> 
>> Great, thanks.  Sure, making use of the Rock 5B's support for attaching
>> a PWM-controlled cooling fan is the way to go.
>> 
>> Just to reiterate a bit, any "active" trip points belong to the board
>> dts file(s), because having a cooling fan is a board-specific feature.
>> As a note, you may also want to have a look at the RockPro64 dts(i)
>> files, for example;  the RockPro64 also comes with a cooling fan
>> connector and the associated PWM fan control logic.
> 
> Thanks for the pointer! There is also a helpful doc within devicetree
> bindings descriptions, although it sits under hwmon which was a bit
> confusing to me. I've already tested it locally (by adding to the
> board dts), and it spins up and down quite nicely, and even modulates
> the fan speed swiftly when the load changes - yay!

Nice!  Also, isn't it like magic? :)  To me, turning LEDs on/off and
controlling fans acts as some kind of a "bridge" between the virtual
and the real world. :)

As a suggestion, it would be good to test with a couple of different
fans, to make sure that the PWM values work well for more than one fan
model.  The Rock 5B requires a 5 V fan, if I'm not mistaken?
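
For reference, a board-level addition along those lines could look
roughly like the sketch below.  It's untested and only illustrative:
the &package_thermal and &fan labels, the trip temperatures and the
cooling-state ranges are all assumptions, and the real values need to
be picked by testing with the actual fans.

	/* board dts;  assumes #include <dt-bindings/thermal/thermal.h> */
	&package_thermal {
		trips {
			/* "active" trips live in the board dts, next to the fan */
			package_fan0: package-fan0 {
				temperature = <55000>;	/* placeholder value */
				hysteresis = <2000>;
				type = "active";
			};
			package_fan1: package-fan1 {
				temperature = <65000>;	/* placeholder value */
				hysteresis = <2000>;
				type = "active";
			};
		};

		cooling-maps {
			map0 {
				trip = <&package_fan0>;
				/* lower fan states, up to state 1 */
				cooling-device = <&fan THERMAL_NO_LIMIT 1>;
			};
			map1 {
				trip = <&package_fan1>;
				/* higher fan states, from state 2 upwards */
				cooling-device = <&fan 2 THERMAL_NO_LIMIT>;
			};
		};
	};

The state numbers index into the cooling-levels list of the pwm-fan
node, so the PWM duty cycles themselves stay defined in one place.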

>> >> >> > +                                     temperature = <75000>;
>> >> >> > +                                     hysteresis = <2000>;
>> >> >> > +                                     type = "passive";
>> >> >> > +                             };
>> >> >> > +                             target: trip-point-1 {
>> >> >>
>> >> >> It would be better to name it cpu_alert1 instead, because that's what
>> >> >> other newer dtsi files already use.
>> >> >>
>> >> >> > +                                     temperature = <85000>;
>> >> >> > +                                     hysteresis = <2000>;
>> >> >> > +                                     type = "passive";
>> >> >> > +                             };
>> >> >> > +                             soc_crit: soc-crit {
>> >> >>
>> >> >> It would be better to name it cpu_crit instead, because that's what
>> >> >> other newer dtsi files already use.
>> >> >
>> >> > Seems to me that if I define separate trips for the three groups of
>> >> > CPU cores as mentioned above this would better stay as soc_crit, as it
>> >> > applies to the whole die rather than the CPU cluster alone. Then
>> >> > 'threshold' and 'target' will go altogether, and I'll have separate
>> >> > *_alert0 and *_alert1 per CPU group.
>> >>
>> >> It would perhaps be best to have "passive", "hot" and "critical"
>> >> trips defined for all three CPU groups/(sub)clusters, separately of
>> >> course, to have even higher granularity when it comes to the resulting
>> >> thermal throttling.
>> >
>> > I looked through drivers/thermal/rockchip_thermal.c, and it doesn't
>> > seem to provide any callback for the "hot" trip as part of its struct
>> > thermal_zone_device_ops, so I guess it would be redundant in our case
>> > here? I couldn't find any generic mechanism to react to "hot" trips,
>> > and they seem to be purely driver-specific, thus no-op in case of
>> > Rockchips - or am I missing something?
>> 
>> That's a good question.  Please, let me go through the code in detail,
>> and I'll get back with an update soon.  Also, please wait a bit with
>> sending the v3, until all open questions are addressed.
> 
> Of course. Thank you for taking the time to dig through this one with me!

I'm glad to help.  It's important to have working thermal throttling on
the supported RK3588-based boards.


