[PATCH 0/3] Perf avoid opening events on offline CPUs

Ian Rogers irogers at google.com
Tue Jul 2 21:55:16 PDT 2024


On Mon, Jul 1, 2024 at 7:22 AM Will Deacon <will at kernel.org> wrote:
>
> On Mon, Jun 03, 2024 at 05:28:09PM +0800, Yicong Yang wrote:
> > From: Yicong Yang <yangyicong at hisilicon.com>
> >
> > If the user doesn't specify the CPUs, perf will try to open events on the
> > CPUs of the PMU, which are initialized from the PMU's "cpumask" or "cpus"
> > sysfs attributes if provided. But we don't check whether the CPUs provided
> > by the PMU are all online, so we may open events on offline CPUs if the PMU
> > driver provides offline CPUs, and then we'll be rejected by the kernel:
> >
> > [root at localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online
> > [root at localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100
> > Error:
> > The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
> > /bin/dmesg | grep -i perf may provide additional information.
>
> I still don't see the value in this. CPUs can come and go asynchronously,
> so this is all horribly racy. Furthermore, there are other (racy) ways
> to find out which CPUs are online and whatever we do in the kernel now
> isn't going to help userspace running on older kernels.

Hi Will,

You are assuming here that a counter should be opened on all CPUs.
This is true for "core" PMUs for events like "cycles", but isn't true
for uncore PMUs. An uncore PMU on a dual-socket x86 machine may have a
cpumask advertising opening on CPUs "0,18". If CPU 18 is taken offline
then the cpumask becomes, say, "0,19". In the perf tool we don't need
to determine the topology of the system and see that 19 is next to 18
and online; the PMU driver does it for us. If we intersected the cpumask
of an uncore PMU with the online CPUs and the cpumask were still "0,18",
then we'd end up opening the event on only one socket (for CPU 0).
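To make the failure mode concrete, here is a small sketch (hypothetical
helper, not the perf tool's actual code) that parses sysfs-style CPU list
strings and shows how naively intersecting a stale uncore cpumask with the
online mask silently drops a socket:

```python
def parse_cpu_list(s):
    """Parse a sysfs CPU list string such as "0,18" or "0-3,8" into a set."""
    cpus = set()
    for part in s.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return cpus

# Stale uncore cpumask advertising one CPU per socket, as in the example above.
pmu_cpumask = parse_cpu_list("0,18")
# CPU 18 has been taken offline; CPU 19 is the next online CPU on that socket.
online = parse_cpu_list("0-17,19-35")

# Naive intersection: the second socket's CPU disappears entirely and the
# event would be opened on socket 0 only.
print(sorted(pmu_cpumask & online))  # -> [0]
```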

What the cpumask provides is the list of default CPUs on which we want to
open the event, like a "-C" command line option. Yes, this is racy
if you are running perf while taking CPUs on and offline, but it's what
happens on x86 and we live with it. That's not to say we can't do
a smarter topology system, cover races with things like BPF, and so
on. These things just aren't where the perf tool code is today and
could face challenges wrt permissions, older-kernel compatibility and
so on. In the perf tool code today:

 - No cpumask/cpus file is provided on non-hybrid/non-BIG.little systems,
and a missing cpumask is taken to mean open on all online CPUs. If the
cpumask exists but is empty then we do the same. I see broken ARM
memory controller PMUs with empty cpumask files.
 - A cpus file with a list of online CPUs. This is used by core PMUs
on hybrid/BIG.little systems. x86 doesn't place offline CPUs in this
file but ARM does. My hope is that ARM can be consistent with x86.
 - A cpumask with a list of, say, one CPU per socket. This is generally
used by uncore PMUs; CPUs not in the cpumask are still valid and are
sometimes used to spread interrupt load. The majority of ARM PMU
drivers seem broken here, either wrt offline CPUs, not providing a
cpumask, etc.
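The three cases above can be sketched as follows (a hypothetical helper,
assuming the "cpus"-then-"cpumask" lookup order; this is an illustration of
the interpretation described above, not the perf tool's actual
implementation):

```python
import os

def parse_cpu_list(s):
    """Parse a sysfs CPU list string such as "0,18" or "0-3,8" into a set."""
    cpus = set()
    for part in s.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return cpus

def pmu_default_cpus(pmu_dir, online_cpus):
    """Pick the default CPUs for a PMU: prefer its "cpus" file, then
    "cpumask"; a missing or empty file means all online CPUs."""
    for name in ("cpus", "cpumask"):
        try:
            with open(os.path.join(pmu_dir, name)) as f:
                text = f.read().strip()
        except OSError:
            continue
        if text:
            # Trust the PMU-provided list verbatim: for uncore PMUs the
            # driver, not the tool, keeps it pointing at online CPUs.
            return parse_cpu_list(text)
    # No cpumask/cpus file, or it was empty: open on every online CPU.
    return set(online_cpus)
```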

Thanks,
Ian
