[PATCH 1/3] perf pmu: Limit PMU cpumask to online CPUs

Ian Rogers irogers at google.com
Thu Jun 6 17:36:01 PDT 2024


On Thu, Jun 6, 2024 at 5:09 PM Namhyung Kim <namhyung at kernel.org> wrote:
>
> Hello,
>
> On Mon, Jun 03, 2024 at 09:52:15AM -0700, Ian Rogers wrote:
> > On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <yangyicong at huawei.com> wrote:
> > >
> > > From: Yicong Yang <yangyicong at hisilicon.com>
> > >
> > > We'll initialize the PMU's cpumask from "cpumask" or "cpus" sysfs
> > > attributes if provided by the driver without checking the CPUs
> > > are online or not. In such case that CPUs provided by the driver
> > > contains the offline CPUs, we'll try to open event on the offline
> > > CPUs and then rejected by the kernel:
> > >
> > > [root at localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online
> > > [root at localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100
> > > Error:
> > > The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
> > > /bin/dmesg | grep -i perf may provide additional information.
> > >
> > > So it's better to do a double check in the userspace and only include
> > > the online CPUs from "cpumask" or "cpus" to avoid opening events on
> > > offline CPUs.
> >
> > I see where you are coming from with this but I think it is wrong. The
> > cpus for an uncore PMU are a hint of the CPU to open on rather than
> > the set of valid CPUs. For example:
> > ```
> > $ cat /sys/devices/uncore_imc_free_running_0/cpumask
> > 0
> > $ perf stat -vv -e uncore_imc_free_running_0/data_read/ -C 1 -a sleep 0.1
> > Using CPUID GenuineIntel-6-8D-1
> > Attempt to add: uncore_imc_free_running_0/data_read/
> > ..after resolving event: uncore_imc_free_running_0/event=0xff,umask=0x20/
> > Control descriptor is not initialized
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             24 (uncore_imc_free_running_0)
> >   size                             136
> >   config                           0x20ff (data_read)
> >   sample_type                      IDENTIFIER
> >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >   disabled                         1
> >   inherit                          1
> >   exclude_guest                    1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8
> > sys_perf_event_open failed, error -22
> > switching off cloexec flag
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             24 (uncore_imc_free_running_0)
> >   size                             136
> >   config                           0x20ff (data_read)
> >   sample_type                      IDENTIFIER
> >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >   disabled                         1
> >   inherit                          1
> >   exclude_guest                    1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0
> > sys_perf_event_open failed, error -22
> > switching off exclude_guest, exclude_host
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             24 (uncore_imc_free_running_0)
> >   size                             136
> >   config                           0x20ff (data_read)
> >   sample_type                      IDENTIFIER
> >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >   disabled                         1
> >   inherit                          1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0 = 3
> > uncore_imc_free_running_0/data_read/: 1: 4005984 102338957 102338957
> > uncore_imc_free_running_0/data_read/: 4005984 102338957 102338957
> >
> >  Performance counter stats for 'system wide':
> >
> >             244.51 MiB  uncore_imc_free_running_0/data_read/
> >
> >        0.102320376 seconds time elapsed
> > ```
> > So the CPU mask of the PMU says to open on CPU 0, but on the command
> > line when I passed "-C 1" it opened it on CPU 1. If the cpumask file
> > contained an offline CPU then this change would make it so the CPU map
> > in the tool were empty, however, a different CPU may be programmable
> > and online.
>
> I think Intel uncore PMU driver ignores the CPU parameter and set it to
> CPU 0 in this case internally.  See uncore_pmu_event_init() at
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/intel/uncore.c#n761

Hmm.. maybe that's just the option if not set. Wrt hot plugging, on a
2 socket skylake:
```
$ cat /sys/devices/uncore_imc_0/cpumask
0,18
$ echo 0 > /sys/devices/system/cpu/cpu18/online
$ cat /sys/devices/uncore_imc_0/cpumask
0,19
```
So the cpumask should be reflecting the online/offline nature of CPUs.

> >
> > Fwiw, the tool will determine whether the mask is for all valid or a
> > hint by using the notion of a PMU being "core" or not. That notion
> > considers whether the mask was loading from a "cpumask" or "cpus"
> > file:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n810
> >
> > Thanks,
> > Ian
> >
> > > Signed-off-by: Yicong Yang <yangyicong at hisilicon.com>
> > > ---
> > >  tools/perf/util/pmu.c | 13 +++++++++++--
> > >  1 file changed, 11 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> > > index 888ce9912275..51e8d10ee28b 100644
> > > --- a/tools/perf/util/pmu.c
> > > +++ b/tools/perf/util/pmu.c
> > > @@ -771,8 +771,17 @@ static struct perf_cpu_map *pmu_cpumask(int dirfd, const char *name, bool is_cor
> > >                         continue;
> > >                 cpus = perf_cpu_map__read(file);
> > >                 fclose(file);
> > > -               if (cpus)
> > > -                       return cpus;
> > > +               if (cpus) {
> > > +                       struct perf_cpu_map *intersect __maybe_unused;
> > > +
> > > +                       if (perf_cpu_map__is_subset(cpu_map__online(), cpus))
> > > +                               return cpus;
> > > +
> > > +                       intersect = perf_cpu_map__intersect(cpus, cpu_map__online());
>
> So IIUC this is for core PMUs with "cpus" file, right?  I guess uncore
> drivers already handles "cpumask" properly..

So I think this is an ARM specific bug:

Core PMUs:
x86 uses /sys/devices/cpu, s390 uses cpum_cf, these lack a cpus or
cpumask and so we default to opening events on all online processors.
The fact these are core PMUs is hardcoded in the tool:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n1747
x86 hybrid /sys/devices/cpu_(core|atom)/cpus - the set of CPUs is
updated to reflect online and offline
ARM the /sys/devices/armv8_pmuv3_0 isn't a hardcoded core PMU and so
we expect the cpus to contain online CPUs, but it currently also
erroneously contains offline ones.

Uncore PMUs:
x86 has /sys/devices/<uncore...>/cpumask where the cpumask reflects
online and offline CPUs
ARM has things like the dmc620 PMU, it appears to be missing cpumask
in certain cases leading to the perf tool treating it like the x86
core PMU and opening events for it on every CPU:
```
# ls /sys/devices/arm_dmc620_10008c000/
events  format  perf_event_mux_interval_ms  power  subsystem  type  uevent
```

I think we need a PMU test so that bugs like this can be reported, but
we may also need to add tool workarounds for PMUs that are broken. I
can imagine it will be tricky to test uncore PMUs that default to CPU0
as often we can't offline that.

Thanks,
Ian

> Thanks,
> Namhyung
>
>
> > > +                       perf_cpu_map__put(cpus);
> > > +                       if (intersect)
> > > +                               return intersect;
> > > +               }
> > >         }
> > >
> > >         /* Nothing found, for core PMUs assume this means all CPUs. */
> > > --
> > > 2.24.0
> > >



More information about the linux-arm-kernel mailing list