[PATCH v3 4/5] perf evlist: Respect all_cpus when setting user_requested_cpus

Thu Apr 28 13:15:42 PDT 2022

On 8/04/22 06:56, Ian Rogers wrote:
> If all_cpus is calculated it represents the merge/union of all
> evsel cpu maps. By default user_requested_cpus is computed to be
> the online CPUs. For uncore events, it is often the case currently
> that all_cpus is a subset of user_requested_cpus. Metrics printed
> without aggregation and with metric-only, in print_no_aggr_metric,
> iterate over user_requested_cpus assuming every CPU has a metric to
> print. For each CPU the prefix is printed, but then if the
> evsel's cpus doesn't contain anything you get an empty line like
> the following on a 2 socket 36 core SkylakeX:
> 
> ```
> $ perf stat -A -M DRAM_BW_Use -a --metric-only -I 1000
>      1.000453137 CPU0                       0.00
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137 CPU18                      0.00
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      2.003717143 CPU0                       0.00
> ...
> ```
> 
> While it is possible to be lazier in printing the prefix and
> trailing newline, having user_requested_cpus not be a subset of
> all_cpus is preferential so that wasted work isn't done elsewhere
> user_requested_cpus is used. The change modifies user_requested_cpus
> to be the intersection of user specified CPUs, or default all online
> CPUs, with the CPUs computed through the merge of all evsel cpu maps.
> 
> New behavior:
> ```
> $ perf stat -A -M DRAM_BW_Use -a --metric-only -I 1000
>      1.001086325 CPU0                       0.00
>      1.001086325 CPU18                      0.00
>      2.003671291 CPU0                       0.00
>      2.003671291 CPU18                      0.00
> ...
> ```
> 
> Signed-off-by: Ian Rogers <irogers at google.com>
> ---
>  tools/perf/util/evlist.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 52ea004ba01e..196d57b905a0 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -1036,6 +1036,13 @@ int evlist__create_maps(struct evlist *evlist, struct target *target)
>  	if (!cpus)
>  		goto out_delete_threads;
>  
> +	if (evlist->core.all_cpus) {
> +		struct perf_cpu_map *tmp;
> +
> +		tmp = perf_cpu_map__intersect(cpus, evlist->core.all_cpus);

Isn't an uncore PMU represented as being on CPU0 actually
collecting data that can be due to any CPU.

Or for an uncore PMU represented as being on CPU0-CPU4 on a
4 core 8 hyperthread processor, actually 1 PMU per core.

So I am not sure intersection makes sense.

Also it is not obvious what happens with hybrid CPUs or
per thread recording.

> +		perf_cpu_map__put(cpus);
> +		cpus = tmp;
> +	}
>  	evlist->core.has_user_cpus = !!target->cpu_list && !target->hybrid;
>  
>  	perf_evlist__set_maps(&evlist->core, cpus, threads);