[PATCH RFC 18/27] drivers: cpu-pd: Add PM Domain governor for CPUs

Lina Iyer lina.iyer at linaro.org
Fri Nov 20 08:42:50 PST 2015


On Fri, Nov 20 2015 at 09:20 -0700, Lorenzo Pieralisi wrote:
>On Thu, Nov 19, 2015 at 03:52:13PM -0800, Kevin Hilman wrote:
>> Lorenzo Pieralisi <lorenzo.pieralisi at arm.com> writes:
>>
>> > On Tue, Nov 17, 2015 at 03:37:42PM -0700, Lina Iyer wrote:
>> >> A PM domain comprising CPUs may be powered off when all the CPUs in
>> >> the domain are powered down. Powering down a CPU domain is generally an
>> >> expensive operation and therefore the power/performance trade-offs
>> >> should be considered. The time between the last CPU powering down and
>> >> the first CPU powering up in a domain is the time available for the
>> >> domain to sleep. Ideally, the sleep time of the domain should fulfill
>> >> the residency requirement of the domain's idle state.
>> >>
>> >> To do this effectively, read the time before the wakeup of the cluster's
>> >> CPUs and ensure that the domain's idle state sleep time satisfies the
>> >> QoS requirements of each CPU (the PM QoS CPU_DMA_LATENCY) as well as the
>> >> state's residency.
>> >
>> > To me this information should be part of the CPUidle governor (it is
>> > already there), we should not split the decision into multiple layers.
>> >
>> > The problem you are facing is that the CPUidle governor(s) do not take
>> > cross-CPU relationships into account, I do not think that adding another
>> > decision layer in the power domain subsystem helps, you are doing that
>> > just because adding it to the existing CPUidle governor(s) is invasive.
>> >
>> > Why can't we use the power domain work you put together to eg disable
>> > idle states that share multiple cpus and make them "visible" only
>> > when the power domain that encompass them is actually going down ?
>> >
>> > You could use the power domains information to detect states that
>> > are shared between cpus.
>> >
>> > It is just an idea, what I am saying is that having another governor in
>> > the power domain subsystem does not make much sense, you split the
>> > decision in two layers while there is actually one, the existing
>> > CPUidle governor and that's where the decision should be taken.
>>
>> Hmm, considering "normal" devices in "normal" power domains, and
>> following the same logic, the equivalent would be to say that the
>> decision to gate the power domain belongs to the individual drivers
>> in the domain instead of in the power domain layer.  I disagree.
>>
>> IMO, there are different decision layers because there are different
>> hardware layers.  Devices (including CPUs) are responsible for handling
>> device-local idle states, based on device-local conditions (e.g. local
>> wakeups, timers, etc.)  and domains are responsible for handling
>> decisions based on conditions of the whole domain.
>
>After going through the series for the second time (it is quite complex and
>should probably be split) I understood your point of view and I agree with
>it, I will review it more in-depth to understand the details.
>
I have included patches from Axel and Marc, so as to get a complete
picture. My core changes are in genpd, cpu-pd and psci.c

>One thing that is not clear to me is how we would end up handling
>cluster states in platform coordinated mode with this series (and
>I am actually referring to the data we would add in the idle-states,
>such as min-residency).
>
From what I see, platform-coordinated mode doesn't need any of this.
We are fine as it is today: CPUs vote for the cluster state they can
enter and the firmware decides based on these votes. It makes sense,
and it is probably easier to flatten out the cluster states and attach
them to cpuidle for that.

I couldn't find a symmetry with OS-initiated mode. Maybe it deserves
more discussion and brainstorming.

>I admit that data for cluster states at present
>is not extremely well defined, because we have to add latencies for
>the cluster state even if the state itself may be just a cpu one (by
>definition a cluster state is entered only if all cpus in the cluster
>enter it, otherwise FW or power controller demote them automatically).
>

>I would like to take this series as an opportunity to improve the
>current situation in a clean way (and without changing the bindings,
>only augmenting them).
>
>On a side note, I think we should give up the concept of cluster
>entirely, to me they are just a group of cpus, I do not see any reason
>why we should group cpus this way and I do not like the dependencies
>of this series on the cpu-map either, I do not see the reason but I
>will go through code again to make sure I am not missing anything.
>
SoCs could have different organizations of CPUs (grouped as clusters)
and power domains that power these clusters. This information has to
come from the DT. Since there are no actual devices in Linux for domain
management (with PSCI), I have added them to the cpu-map, which already
builds up the cluster hierarchy. The only addition I had to make was to
allow these cluster nodes to tell the kernel that they are domain
providers.
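Roughly what I have in mind (a binding sketch only; whether the provider property sits directly in the cpu-map cluster node, and its exact name, are still open as the RFC evolves):

```
cpus {
	cpu-map {
		cluster0 {
			/* sketch: mark the cluster as a PM domain provider */
			#power-domain-cells = <0>;
			core0 { cpu = <&CPU0>; };
			core1 { cpu = <&CPU1>; };
		};
	};
};
```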

>To be clear, to me the cpumask should be created with all cpus belonging
>in a given power domain, no cluster dependency (and yes the CPU PM
>notifiers are not appropriate at present - eg on
>cpu_cluster_pm_{enter/exit} we save and restore the GIC distributor state
>even on multi-cluster systems, that's useless and has no connection with
>the real power domain topology at all, so the concept of cluster as it
>stands is shaky to say the least).
>

Let's discuss this more. I am interested in what you are thinking; I
will let you go through the code.

Thanks for your time, Lorenzo.

-- Lina
