[PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu events
liwei (GF)
liwei391 at huawei.com
Thu Sep 24 10:14:17 EDT 2020
Hi Andi,
On 2020/9/23 3:50, Andi Kleen wrote:
> On Tue, Sep 22, 2020 at 12:23:21PM -0700, Andi Kleen wrote:
>>> After debugging, i found the root reason is that the xyarray fd is created
>>> by evsel__open_per_thread() ignoring the cpu passed in
>>> create_perf_stat_counter(), while the evsel' cpumap is assigned as the
>>> corresponding PMU's cpumap in __add_event(). Thus, the xyarray fd is created
>>> with ncpus of dummy cpumap and an out of bounds 'cpu' index will be used in
>>> perf_evsel__close_fd_cpu().
>>>
>>> To address this, add a flag to mark this situation and avoid using the
>>> affinity technique when closing/enabling/disabling events.
>>
>> The flag seems like a hack. How about figuring out the correct number of
>> CPUs and using that?
>
> Also would like to understand what's different on ARM64 than other architectures.
> Or could this happen on x86 too?
>
The problem is that when the user requests per-task events, the cpumask is expected
to be NULL (dummy), while the armv8_pmu does have a cpumask, which is inherited by
the evsel. The armv8_pmu's cpumask was added for heterogeneous systems, so this
issue cannot happen on x86.
In fact, the cpumask itself is correct, but it shouldn't be used when we are requesting
per-task events. As these events should be installed on all cores, I doubt how much we
can benefit from the affinity technique, so I chose to add a flag.
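
The mismatch described above can be sketched in isolation. This is a minimal
model under stated assumptions, not the perf code: `fd_table` is a hypothetical
stand-in for the xyarray fd, sized from the dummy cpumap (one entry) as in the
per-task case, and `count_out_of_bounds()` is a hypothetical helper that walks as
many cpu indices as the PMU's cpumap would supply.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for perf's xyarray: a cpus x threads table of fds. */
struct fd_table {
	int ncpus;
	int nthreads;
	int *fds;
};

/* Allocate a table with every slot marked as "no fd" (-1). */
static struct fd_table *fd_table_new(int ncpus, int nthreads)
{
	struct fd_table *t;
	size_t i, n;

	if (ncpus <= 0 || nthreads <= 0)
		return NULL;

	t = malloc(sizeof(*t));
	if (!t)
		return NULL;

	n = (size_t)ncpus * (size_t)nthreads;
	t->fds = malloc(n * sizeof(*t->fds));
	if (!t->fds) {
		free(t);
		return NULL;
	}

	for (i = 0; i < n; i++)
		t->fds[i] = -1;

	t->ncpus = ncpus;
	t->nthreads = nthreads;
	return t;
}

static void fd_table_free(struct fd_table *t)
{
	if (!t)
		return;
	free(t->fds);
	free(t);
}

/* True when (cpu, thread) names a slot that actually exists in the table. */
static bool fd_table_in_bounds(const struct fd_table *t, int cpu, int thread)
{
	return cpu >= 0 && cpu < t->ncpus &&
	       thread >= 0 && thread < t->nthreads;
}

/*
 * Walk cpu indices 0..map_ncpus-1, as a loop over the PMU's cpumap would,
 * and count how many of them do not exist in a table sized from another map.
 */
static int count_out_of_bounds(const struct fd_table *t, int map_ncpus)
{
	int cpu, bad = 0;

	for (cpu = 0; cpu < map_ncpus; cpu++) {
		if (!fd_table_in_bounds(t, cpu, 0))
			bad++;
	}
	return bad;
}
```

With a table built as `fd_table_new(1, 1)` and a 128-entry cpumap, 127 of the
128 cpu indices fall outside the table. That is the kind of out-of-bounds index
the patch description attributes to perf_evsel__close_fd_cpu().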
I also ran a test on a HiSilicon arm64 D06 board with 2 sockets and 128 cores.
I ran the following command 3 times, with and without the affinity technique:
time tools/perf/perf stat -ddd -C 0-127 --per-core --timeout=5000 2> /dev/null

* (HEAD detached at 7074674e7338) perf cpumap: Maintain cpumaps ordered and without dups

real 0m8.039s
user 0m0.402s
sys 0m2.582s

real 0m7.939s
user 0m0.360s
sys 0m2.560s

real 0m7.997s
user 0m0.358s
sys 0m2.586s

* (HEAD detached at 704e2f5b700d) perf stat: Use affinity for enabling/disabling events

real 0m7.954s
user 0m0.308s
sys 0m2.590s

real 0m12.959s
user 0m0.332s
sys 0m2.582s

real 0m18.009s
user 0m0.346s
sys 0m2.562s

The off-CPU time is much longer when using affinity. I think that is the cost of
migration. Could you please share your test case with me?
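
The off-CPU figures behind that observation can be derived from the timings
above. The sketch below assumes that "off-CPU time" means real minus (user +
sys), which is an interpretation rather than something stated in the
measurements. The struct, function, and array names are hypothetical; the
numbers are copied from the six runs.

```c
#include <stddef.h>

/* One `time` result, in seconds. */
struct run_time {
	double real;
	double user;
	double sys;
};

/* Assumption: off-CPU time is wall-clock time not accounted to user or sys. */
static double off_cpu_seconds(struct run_time r)
{
	return r.real - (r.user + r.sys);
}

/* Mean off-CPU time over n runs; returns 0.0 for an empty set. */
static double mean_off_cpu(const struct run_time *runs, size_t n)
{
	double sum = 0.0;
	size_t i;

	if (n == 0)
		return 0.0;
	for (i = 0; i < n; i++)
		sum += off_cpu_seconds(runs[i]);
	return sum / (double)n;
}

/* Runs at HEAD 7074674e7338. */
static const struct run_time runs_7074674e7338[] = {
	{ 8.039, 0.402, 2.582 },
	{ 7.939, 0.360, 2.560 },
	{ 7.997, 0.358, 2.586 },
};

/* Runs at HEAD 704e2f5b700d. */
static const struct run_time runs_704e2f5b700d[] = {
	{ 7.954, 0.308, 2.590 },
	{ 12.959, 0.332, 2.582 },
	{ 18.009, 0.346, 2.562 },
};
```

Under that assumption, the three runs at 7074674e7338 each spend about 5.0 s
off-CPU, while the runs at 704e2f5b700d spend roughly 5.1 s, 10.0 s, and 15.1 s.
User and sys time are nearly identical across all six runs.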
Thanks,
Wei