Kernel perf counter support (for Apple M1 and others)

Yichao Yu yyc1992 at gmail.com
Wed Apr 13 05:58:30 PDT 2022


> I am playing with the performance counters on the Apple M1 chip from
> Linux in the hope that it could help make userspace tools like
> perf and rr work on the M1. However, I was told that none of this
> info should go into the kernel (not even raw event names) and that
> userspace should only use the raw event numbers instead of
> PERF_TYPE_HARDWARE, even for events that have a canonical counterpart.
>
> Although I'm not planning to submit any kernel patches anytime soon
> and I'm mostly interested in running the test right now, I do want to
> know what I should expect in the long term on the userspace side. I
> was told to ask about this on "the list" (and I'm hoping this is the
> right one after browsing through MAINTAINERS) instead. There are a few
> issues/questions, not all of which are related to M1/asymmetric
> systems. For context, see
> https://oftc.irclog.whitequark.org/asahi-dev/2022-03-30 (there also
> happens to be no other discussion on the channel that day)
>
> 1. Is it acceptable (to either kernel or perf source) to submit
> patches that are based on a14.plist from macOS? I have personally
> never looked at it, but if it is acceptable then there's little point
> in doing the experiment I was doing (apart from the fun of doing so and
> as practice to understand the system).
>
> 2. Should the kernel provide names for hardware events? Here I'm
> talking about things under
> `/sys/bus/event_source/devices/<pmu>/events` which I assume is
> provided by the kernel (that or my understanding of sysfs has been
> fundamentally wrong/out-of-date...). Based on the fact that the
> current PMU kernel driver for the M1 does provide this, and on this
> comment https://github.com/torvalds/linux/blob/e8b767f5e04097aaedcd6e06e2270f9fe5282696/drivers/perf/apple_m1_cpu_pmu.c#L31,
> I assume it's desired. This would also agree with what I've observed
> on other (including non-x86) systems. If this is the case, I assume
> the kernel driver for the M1 PMU isn't fully "done" yet.
>
> 3. On counting events on a system with asymmetric cores:
>     I understand that if the system contains multiple processors with
> different characteristics, it may not make sense to provide a counter
> that counts events on both (or all) types of cores. However, there are
> events (PERF_COUNT_HW_INSTRUCTIONS and
> PERF_COUNT_HW_BRANCH_INSTRUCTIONS at the least) that shouldn't really
> be affected by this (and in fact, neither should any counter that counts
> events visible directly to the software/userspace). I would even say that
> branch misses and cache references/misses might be in this category as
> well, although certainly not as clear-cut.
>
> 4. There are other events that may not make as much sense to combine
> (cycles for example). However, I feel that a combined cycle count
> isn't going to be much trickier to use, given that the cycle count on a
> single core is still affected by frequency scaling and that it can still
> be used correctly by pinning the thread.
>
> The main reasons I'm asking about 3 and 4 are that
> 1. Right now, even to just count instructions without pinning the
> thread, I need to create two counters.
> 2. Even if the number isn't exactly accurate, it can still be useful
> as a general guideline. Right now, even if I just want to do a quick
> check, I still need to manually specify a dozen events in `perf
> stat -e` rather than simply using `perf stat` (to make it worse, perf
> doesn't even provide any useful warning about it). It is also much
> harder to do things generically (which is at least partially because
> of the lack of documentation...).
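
On question 2 above, here is a minimal sketch of how userspace can check
which named events a PMU driver exports. It assumes the usual sysfs layout
of `<root>/<pmu>/events/<name>`, with each file holding a spec string such
as `event=0x02`. The function name `list_pmu_events` and the `root`
parameter are made up for illustration only.

```python
import os

def list_pmu_events(root="/sys/bus/event_source/devices"):
    """Map each PMU under `root` to its {event name: spec string} table.

    `root` is a parameter only so the sketch can be pointed at a test
    directory. PMUs without an `events` directory map to an empty dict.
    """
    result = {}
    if not os.path.isdir(root):
        return result
    for pmu in sorted(os.listdir(root)):
        events_dir = os.path.join(root, pmu, "events")
        events = {}
        if os.path.isdir(events_dir):
            for name in sorted(os.listdir(events_dir)):
                path = os.path.join(events_dir, name)
                try:
                    with open(path) as f:
                        events[name] = f.read().strip()
                except OSError:
                    # Unreadable attribute: skip it rather than fail.
                    continue
        result[pmu] = events
    return result
```

Calling `list_pmu_events()` with no argument reads the real sysfs tree. A
PMU that maps to an empty table is one where only raw event numbers are
usable.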


Does anyone have any input on this? Over at https://rr-project.org/, it
would be really nice if some counters could be handled transparently
when the process migrates between cores.
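
In the meantime, the userspace workaround looks roughly like the sketch
below: request the same raw event on each core type's PMU and add up the
results. The helper names are hypothetical, the `event=` term assumes each
PMU exposes a format field of that name, and the parsing assumes the
`perf stat -x,` CSV layout in which the first field of each line is the
count.

```python
def event_spec(pmus, raw_event):
    """Build a `perf stat -e` argument requesting `raw_event` on every PMU."""
    return ",".join(f"{pmu}/event={raw_event:#x}/" for pmu in pmus)

def sum_counts(csv_text):
    """Sum the count field over all event lines of `perf stat -x,` output.

    Lines whose first field is not an integer (for example
    `<not counted>` or `<not supported>`) are skipped.
    """
    total = 0
    for line in csv_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        first = line.split(",", 1)[0]
        try:
            total += int(first)
        except ValueError:
            continue
    return total
```

For example, `event_spec(["pmu_little", "pmu_big"], 0x8c)` gives
`pmu_little/event=0x8c/,pmu_big/event=0x8c/`, where the PMU names and the
event number are placeholders rather than verified values. This only
makes sense for events that mean the same thing on every core type,
which is the point of question 3.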

>
>
> Yichao Yu


