[PATCH v2 0/7] PMU performance improvements

Ian Rogers irogers at google.com
Thu Oct 12 10:56:38 PDT 2023


Performance improvements to PMU scanning, achieved by holding onto the
event/metric tables for a CPUID (avoiding repeated regular expression
comparisons) and by lazily computing the default perf_event_attr for a
PMU.

Before:
% Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 100 times
  Average core PMU scanning took: 251.990 usec (+- 4.009 usec)
  Average PMU scanning took: 3222.460 usec (+- 211.234 usec)
% Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 100 times
  Average core PMU scanning took: 260.120 usec (+- 7.905 usec)
  Average PMU scanning took: 3228.995 usec (+- 211.196 usec)
% Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 100 times
  Average core PMU scanning took: 252.310 usec (+- 3.980 usec)
  Average PMU scanning took: 3220.675 usec (+- 210.844 usec)

After:
% Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 100 times
  Average core PMU scanning took: 28.530 usec (+- 0.602 usec)
  Average PMU scanning took: 275.725 usec (+- 18.253 usec)
% Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 100 times
  Average core PMU scanning took: 28.720 usec (+- 0.446 usec)
  Average PMU scanning took: 271.015 usec (+- 18.762 usec)
% Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 100 times
  Average core PMU scanning took: 31.040 usec (+- 0.612 usec)
  Average PMU scanning took: 267.340 usec (+- 17.209 usec)

Measuring the pmu-scan benchmark on a Tigerlake laptop: core PMU
scanning is reduced to 11.5% of its previous execution time, and all-PMU
scanning is reduced to 8.4% of its previous execution time. There is
also a 4.3% reduction in openat system calls.

v2. Address feedback from Adrian Hunter and Yang Jihong: allow the
    caching to handle varying CPUIDs per PMU (currently an ARM64-only
    feature) and cache the result even when there is no table to return.

Ian Rogers (7):
  perf pmu: Rename perf_pmu__get_default_config to perf_pmu__arch_init
  perf intel-pt: Move PMU initialization from default config code
  perf arm-spe: Move PMU initialization from default config code
  perf pmu: Const-ify file APIs
  perf pmu: Const-ify perf_pmu__config_terms
  perf pmu-events: Remember the perf_events_map for a PMU
  perf pmu: Lazily compute default config

 tools/perf/arch/arm/util/cs-etm.c    |  13 +---
 tools/perf/arch/arm/util/pmu.c       |  10 +--
 tools/perf/arch/arm64/util/arm-spe.c |  48 ++++++------
 tools/perf/arch/s390/util/pmu.c      |   3 +-
 tools/perf/arch/x86/util/intel-pt.c  |  27 +++----
 tools/perf/arch/x86/util/pmu.c       |   6 +-
 tools/perf/pmu-events/jevents.py     | 109 +++++++++++++++++----------
 tools/perf/util/arm-spe.h            |   4 +-
 tools/perf/util/cs-etm.h             |   2 +-
 tools/perf/util/intel-pt.h           |   3 +-
 tools/perf/util/parse-events.c       |  12 +--
 tools/perf/util/pmu.c                |  38 +++++-----
 tools/perf/util/pmu.h                |  22 +++---
 tools/perf/util/python.c             |   2 +-
 14 files changed, 160 insertions(+), 139 deletions(-)

-- 
2.42.0.655.g421f12c284-goog



