[PATCH v3 0/5] RK3588 and Rock 5B dts additions: thermal, OPP and fan

Wed Mar 13 09:44:53 PDT 2024

Hello Sebastian,

On 2024-03-13 17:39, Sebastian Reichel wrote:
> On Thu, Mar 07, 2024 at 11:16:20PM +0100, Sebastian Reichel wrote:
>> On Thu, Mar 07, 2024 at 04:38:24PM +0400, Alexey Charkov wrote:
>> > On Tue, Mar 5, 2024 at 12:06 PM Alexey Charkov <alchark at gmail.com> wrote:
>> > > On Mon, Mar 4, 2024 at 9:51 PM Sebastian Reichel
>> > > <sebastian.reichel at collabora.com> wrote:
>> > > > I'm too busy to have a detailed review of this series right now, but
>> > > > I pushed it to our CI and it results in a board reset at boot time:
>> > > >
>> > > > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300950
>> > > >
>> > > > I also pushed just the first three patches (i.e. without OPP /
>> > > > cpufreq) and that boots fine:
>> > > >
>> > > > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300953
>> > >
>> > > Thank you for testing these! I've noticed in the boot log that the CI
>> > > machine uses some u-boot 2023.07 - is that a downstream one? Any
>> > > chance to compare it to 2023.11 or 2024.01 from your (Collabora)
>> > > integration tree?
>> > >
>> > > I use 2023.11 from your integration tree, with a binary bl31, and I'm
>> > > not getting those resets even under prolonged heavy load (I rebuild
>> > > Chromium with 8 concurrent compilation jobs as the stress test -
>> > > that's 14 hours of heavy CPU, memory and IO use). Would be interesting
>> > > to understand if it's just a 'lucky' SoC specimen on my side, or if
>> > > there is some dark magic happening differently on my machine vs. your
>> > > CI machine.
>> > >
>> > > Thinking that maybe if your CI machine uses a downstream u-boot it
>> > > might be leaving some extra hardware running (PVTM?) which might do
>> > > weird stuff when TSADC/clocks/voltages get readjusted by the generic
>> > > cpufreq driver?..
>> > >
>> > > > Note, that OPP / cpufreq works on the same boards in the CI when
>> > > > using the ugly-and-not-for-upstream cpufreq driver:
>> > > >
>> > > > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/commit/9c90c5032743a0419bf3fd2f914a24fd53101acd
>> > > >
>> > > > My best guess right now is, that this is related to the generic
>> > > > driver obviously not updating the GRF read margin registers.
>> > >
>> > > If it was about memory read margins I believe I would have been
>> > > unlikely to get my machine to work reliably under heavy load with the
>> > > default ones, but who knows...
>> >
>> > Sebastian's report led me to investigate further how all those things
>> > are organized in the downstream code and in hardware, and what could
>> > be a pragmatic way forward with upstream enablement. It turned out to
>> > be quite a rabbit hole frankly, with multiple layers of abstraction
>> > and intertwined code in different places.
>> >
>> > Here's a quick summary for future reference:
>> >  - CPU clocks on RK3588 are ultimately managed by the ATF firmware,
>> > which provides an SCMI service to expose them to the kernel
>> >  - ATF itself doesn't directly set any clock frequencies. Instead, it
>> > accepts a target frequency via SCMI and converts it into an oscillator
>> > ring length setting for the PVPLL hardware block (via a fixed table
>> > lookup). At least that's how it's done in the recently released TF-A
>> > bl31 code [1] - perhaps the binary bl31 does something similar
>> >  - U-boot doesn't seem to mess with CPU clocks, PVTM or PVPLL
>> >  - PVPLL produces a reference clock to feed to the CPUs, which depends
>> > on the configured oscillator ring length but also on the supply
>> > voltage, silicon quality and perhaps temperature too. ATF doesn't know
>> > anything about voltages or temperatures, so it doesn't guarantee that
>> > the requested frequency is matched by the hardware
>> >  - PVPLL frequency generation is bypassed for lower-frequency OPPs, in
>> > which case the target frequency is directly fed by the ATF to the CRU.
>> > This happens for both big-core and little-core frequencies below 816
>> > MHz
>> >  - Given that requesting a particular frequency via SCMI doesn't
>> > guarantee that it will be what the CPUs end up running at, the vendor
>> > kernel also does a runtime voltage calibration for the supply
>> > regulators, by adjusting the supply voltage in minimum regulator steps
>> > until the frequency reported by PVPLL gets close to the requested one
>> > [2]. It then overwrites OPP provided voltage values with the
>> > calibrated ones
>> >  - There's also some trickery with preselecting OPP voltage sets using
>> > the "-Lx" suffix based on silicon quality, as measured by a "leakage"
>> > value stored in an NVMEM cell and/or the PVTM frequency generated at a
>> > reference "midpoint" OPP [3]. Better performing silicon gets to run at
>> > lower default supply voltages, thus saving power
>> >  - Once the OPPs are selected and calibrated, the only remaining
>> > trickery is the two supply regulators per each CPU cluster (one for
>> > the CPUs and the other for the memory interface)
>> >  - Another catch, as Sebastian points out, is that memory read margins
>> > must be adjusted whenever the memory interface supply voltage crosses
>> > certain thresholds [4]. This has little to do with CPUs or
>> > frequencies, and is only tangentially related to them due to the
>> > dependency chain between the target CPU frequency -> required CPU
>> > supply voltage -> matching memory interface supply voltage -> required
>> > read margins
>> >  - At reset the ATF switches all clocks to the lowest 408 MHz [6], so
>> > setting it to anything in kernel code (as the downstream driver does)
>> > seems redundant
>> >
>> > All in all, it does indeed sound like Collabora's CI machine boot-time
>> > resets are most likely caused by the missing memory read margin
>> > settings in my patch series. Voltage values in the OPPs I used are the
>> > most conservative defaults of what the downstream DT has, and PVPLL
>> > should be able to generate reasonable clock speeds with those (albeit
>> > likely suboptimal, due to them not being tuned to the particular
>> > silicon specimen). And there is little else to differ frankly.
>> >
>> > As for the way forward, it would be great to know the opinions from
>> > the list. My thinking is as follows:
>> >  - I can introduce memory read margin updates as the first priority,
>> > leaving voltage calibration and/or OPP preselection for later (as
>> > those should not affect system stability at current default values,
>> > perhaps only power efficiency to a certain extent)
>> >  - CPUfreq doesn't sound like the right place for those, given that
>> > they have little to do with either CPU or freq :)
>> >  - I suggest a custom regulator config helper to plug into the OPP
>> > layer, as is done for TI OMAP5 [6]. At first, it might be only used
>> > for looking up and setting the correct memory read margin value
>> > whenever the cluster supply voltage changes, and later the same code
>> > can be extended to do voltage calibration. In fact, OMAP code is there
>> > for a very similar purpose, but in their case optimized voltages are
>> > pre-programmed in efuses and don't require runtime recalibration
>> >  - Given that all OPPs in the downstream kernel list identical
>> > voltages for the memory supply as for the CPU supply, I don't think it
>> > makes much sense to customize the cpufreq driver per se.
>> > Single-regulator approach with the generic cpufreq-dt and regulator
>> > coupling sounds much less invasive and thus lower-maintenance
>> 
>> Sorry for my late response.
>> 
>> When doing some more tests I noticed, that the CI never build the
>> custom driver and thus never did any CPU frequency scaling at all.
>> I only used it for my own tests (on RK3588 EVB1). When enabling the
>> custom driver, the CI has the same issues as your series. So my
>> message was completely wrong, sorry about that.
>> 
>> Regarding U-Boot: The CI uses "U-Boot SPL 2023.07-rc4-g46349e27";
>> the last part is the git hash. This is the exact U-Boot source tree
>> being used:
>> 
>> https://gitlab.collabora.com/hardware-enablement/rockchip-3588/u-boot/-/commits/46349e27/
>> 
>> This was one of the first U-Boot trees with Rock 5B Ethernet support
>> and is currently flashed to the SPI flash memory of the CI boards.
>> The vendor U-Boot tree is a lot older. Also it is still using the
>> Rockchip binary BL31. We have plans to also CI boot test U-Boot,
>> but currently nobody has time to work on this. I don't think there 
>> should
>> be any relevant changes between upstream 2023.07 and 2023.11 that 
>> could
>> explain this. But it's the best lead now, so I will try to find some 
>> time
>> for doing further tests related to this in the next days.
>> 
>> Regarding the voltage calibration - One option would be to do this
>> calibration at boot time (i.e. in U-Boot) and update the voltages
>> in DT accordingly.
> 
> After some more debugging I finally found the root cause. The CI
> boards were powered from a USB hub using a USB-A to USB-C cable, so
> that the team could access maskrom mode. Since I was not involved in
> setting them up, I was not aware of that. It effectively limits the
> power draw to 500 or 900 mA (depending on USB port implementation),
> which is not enough to power the board with the higher frequencies.
> The KernelCI Rock 5B boards are now switched to proper power
> supplies and the issues are gone.
> 
> Sorry for the false alarm,

Great to know, thanks for the clarification.