[RFC PATCH 0/4] Support for passing runtime state idle time to TF-A

Fri Apr 23 23:24:51 BST 2021

On 4/23/21 1:16 PM, Lukasz Luba wrote:
> Hi Sowjanya,
>
> On 4/22/21 9:30 PM, Sowjanya Komatineni wrote:
>> Tegra194 and Tegra186 platforms use separate MCE firmware for CPUs,
>> which is in charge of deciding on state transitions based on target
>> state, state idle time, and some other Tegra CPU core cluster state
>> information.
>>
>> The current PSCI specification doesn't define a function for passing
>> the runtime state idle time predicted by the governor (based on next
>> events and state target residency) to ARM trusted firmware.
>
> Do you have some numbers from experiments showing that these idle
> governor prediction values, which are passed from kernel to MCE
> firmware, are making a good 'guess'?
> How much precision (1us? 1ms?) in the values do you need there?

It could also be a few ms, depending on when the next CPU event/activity
might happen, which is not visible to MCE firmware.

>
> IIRC (probably Rafael's presentations) predicting in the kernel
> something like CPU idle time residency is not a trivial thing.
>
> Another idea (depending on DT structure and PSCI bits):
> Could this be solved differently, but just having a knowledge that if
> the governor requested some C-state, this means governor 'predicted'
> an idle residency to be greater than the min_residency attached to this
> C-state?
> Then, when that request shows up in your FW, you know that it must be at
> least min_residency because of this C-state id.
C6 is the only deep idle state we support for the Tegra194 Carmel CPU, in
addition to the C1 (WFI) idle state.

MCE firmware gets the state crossover thresholds for the C1 to C6
transition from TF-A and uses them, along with the state idle time, to
decide on C6 state entry based on its background work.

Suppose for now that we use min_residency, which is a static value from
DT, as the state idle time. MCE firmware then always enters the deepest
state, C6, because the min_residency value we use is always higher than
the state crossover threshold.

But MCE firmware is not aware of when the next CPU event can happen, so
it cannot predict whether the next event will arrive later than the
state min_residency time.

Using min_residency in that case is very conservative: MCE firmware
exits the C6 state early, and we may not get better power savings.

But when MCE firmware is aware of when the next event can happen, it can
use that to stay in the C6 state without an early exit, for better power
savings.
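
To make the difference concrete, here is a minimal sketch of the kind of
decision described above. It is not the real MCE or TF-A interface: the
names, the comparison rule (enter C6 only when the idle-time hint exceeds
the C1 to C6 crossover threshold), and the numbers are all assumptions for
illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; not the real MCE or TF-A interface. */
enum idle_state { IDLE_C1_WFI, IDLE_C6 };

/*
 * Assumed decision rule: pick C6 only when the idle-time hint is larger
 * than the C1-to-C6 crossover threshold, otherwise stay in C1 (WFI).
 */
static enum idle_state pick_state(uint32_t idle_hint_us, uint32_t crossover_us)
{
    return idle_hint_us > crossover_us ? IDLE_C6 : IDLE_C1_WFI;
}

/* Made-up values, chosen only so that min_residency > crossover. */
void idle_hint_demo(void)
{
    const uint32_t crossover_us = 20000;     /* assumed threshold from TF-A */
    const uint32_t min_residency_us = 30000; /* assumed static DT value */

    /* A static hint above the threshold selects C6 on every request. */
    assert(pick_state(min_residency_us, crossover_us) == IDLE_C6);

    /* A governor prediction lets a short idle period fall back to C1. */
    assert(pick_state(500, crossover_us) == IDLE_C1_WFI);

    /* A long predicted idle period still selects C6. */
    assert(pick_state(45000, crossover_us) == IDLE_C6);
}
```

With the static hint the outcome never changes, while the predicted value
lets the same rule choose differently per idle period.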

> It would depend on number of available states, max_residency, scale
> that you would choose while assigning values from [0, max_residency]
> to each state.
> IIRC there can be many state IDs for idle, so it would depend on
> number of bits encoding this state, and your needs. Example of
> linear scale:
> 4-bits encoding idle state and max predicted residency 10msec,
> that means 10000us / 16 states = 625us/state.
> The max_residency might be split differently, using a non-linear
> function, to make some ranges more precise.
>
> Open question is if these idle states must be all represented
> in DT, or there is a way of describing a 'set of idle states'
> automatically.
We only support the C6 state through DT, as C6 is the only deep idle
state for the Tegra194 Carmel CPU. The WFI idle state is handled
completely by the kernel and does not require MCE sequences for
entry/exit.
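
For reference, the linear-scale arithmetic in the quoted example (4 bits,
10 ms max predicted residency, 625 us per state) can be sketched as below.
The function names are hypothetical, and clamping out-of-range predictions
to the last state ID is an assumption, not something PSCI or DT defines.

```c
#include <assert.h>
#include <stdint.h>

/* Width of one bucket when [0, max_residency_us] is split linearly. */
static uint32_t bucket_width_us(uint32_t max_residency_us, unsigned int nbits)
{
    assert(nbits > 0 && nbits < 32);
    return max_residency_us / (1u << nbits);
}

/*
 * Map a predicted residency to a state ID on the linear scale.
 * Predictions at or beyond max_residency_us are clamped to the last ID
 * (an assumption made for this sketch).
 */
static uint32_t residency_to_state_id(uint32_t predicted_us,
                                      uint32_t max_residency_us,
                                      unsigned int nbits)
{
    uint32_t nstates = 1u << nbits;
    uint32_t width = bucket_width_us(max_residency_us, nbits);
    uint32_t id;

    assert(width > 0); /* max_residency_us must be at least nstates */
    id = predicted_us / width;
    return id < nstates ? id : nstates - 1;
}
```

With nbits = 4 and max_residency_us = 10000, bucket_width_us() returns
625, matching the figure in the quoted example.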
>
> Regards,
> Lukasz