[PATCH v2 8/8] docs: perf: Add new description on HiSilicon uncore PMU v2
zhangshaokun at hisilicon.com
Wed Feb 3 02:51:08 EST 2021
Some new functions are added on HiSilicon uncore PMUs. Document them
to provide guidance on how to use them.
Cc: Mark Rutland <mark.rutland at arm.com>
Cc: Will Deacon <will at kernel.org>
Cc: John Garry <john.garry at huawei.com>
Cc: Jonathan Cameron <Jonathan.Cameron at huawei.com>
Reviewed-by: John Garry <john.garry at huawei.com>
Co-developed-by: Qi Liu <liuqi115 at huawei.com>
Signed-off-by: Qi Liu <liuqi115 at huawei.com>
Signed-off-by: Shaokun Zhang <zhangshaokun at hisilicon.com>
Documentation/admin-guide/perf/hisi-pmu.rst | 54 +++++++++++++++++++++++++++++
1 file changed, 54 insertions(+)
diff --git a/Documentation/admin-guide/perf/hisi-pmu.rst b/Documentation/admin-guide/perf/hisi-pmu.rst
index 404a5c3d9d00..47aadbcda301 100644
@@ -53,6 +53,60 @@ Example usage of perf::
$# perf stat -a -e hisi_sccl3_l3c0/rd_hit_cpipe/ sleep 5
$# perf stat -a -e hisi_sccl3_l3c0/config=0x02/ sleep 5
+For HiSilicon uncore PMU v2 whose identifier is 0x30, the topology is the same
+as PMU v1, but some new functions are added to the hardware.
+(a) L3C PMU supports filtering by core/thread within the cluster which can be
+specified as a bitmap.
+ $# perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_core=0x3/ sleep 5
+This will only count the operations from core/thread 0 and 1 in this cluster.
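As a minimal sketch of how the tt_core bitmap in the example above is formed, the following Python helper sets one bit per core/thread index and prints the resulting event string. The helper names `core_bitmap` and `l3c_event` are hypothetical, and no upper bound on the core index is enforced because the text does not state the width of the field.

```python
def core_bitmap(cores):
    """Return a bitmap with bit N set for every core/thread index N."""
    bitmap = 0
    for core in cores:
        if core < 0:
            raise ValueError("core index must be non-negative")
        bitmap |= 1 << core
    return bitmap


def l3c_event(pmu, config, cores):
    """Build a perf event string with a tt_core filter (illustrative only)."""
    return f"{pmu}/config={config:#04x},tt_core={core_bitmap(cores):#x}/"


if __name__ == "__main__":
    # Cores/threads 0 and 1 give the bitmap 0x3 used in the example above.
    print(l3c_event("hisi_sccl3_l3c0", 0x02, [0, 1]))
```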
+(b) Tracetag allows the user to choose to count only read, write or atomic
+operations via the tt_req parameter in perf. The default value counts all
+operations. tt_req is 3 bits: 3'b100 represents read operations, 3'b101
+represents write operations, 3'b110 represents atomic store operations and
+3'b111 represents atomic non-store operations. Other values are reserved.
+ $# perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_req=0x4/ sleep 5
+This will only count the read operations in this cluster.
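The tt_req encodings listed in (b) can be kept in a small lookup table when generating perf command lines from a script. This is only a sketch: the dictionary keys and the function name `tt_req_term` are made up for illustration, while the numeric values are the ones given in the text.

```python
# Encodings of the 3-bit tt_req field, as listed in the text.
TT_REQ = {
    "read": 0b100,
    "write": 0b101,
    "atomic_store": 0b110,
    "atomic_non_store": 0b111,
}


def tt_req_term(kind):
    """Return the 'tt_req=...' term for one operation kind."""
    if kind not in TT_REQ:
        raise ValueError(f"unknown or reserved request kind: {kind}")
    return f"tt_req={TT_REQ[kind]:#x}"


if __name__ == "__main__":
    for kind in TT_REQ:
        print(kind, tt_req_term(kind))
```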
+(c) Datasrc allows the user to check where the data comes from. It is 5 bits.
+Some important codes are as follows:
+5'b00001: comes from L3C in this die;
+5'b01000: comes from L3C in the cross-die;
+5'b01001: comes from L3C which is in another socket;
+5'b01110: comes from the local DDR;
+5'b01111: comes from the cross-die DDR;
+5'b10000: comes from cross-socket DDR;
+etc. It is mainly helpful for finding which data source is nearest to the CPU
+cores. If datasrc is used on a multi-chip system, the ds_skt shall be configured in
+ $# perf stat -a -e hisi_sccl3_l3c0/config=0xb9,ds_cfg=0xE/,
+ hisi_sccl3_l3c0/config=0xb9,ds_cfg=0xF/ sleep 5
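To read the ds_cfg values in the command above against the datasrc codes listed in (c), a small lookup is enough. The sketch below covers only the codes named in the text; the function name `describe_datasrc` is hypothetical, and any other code is reported as not listed rather than guessed.

```python
# 5-bit datasrc codes named in the text; other codes are not described there.
DATASRC = {
    0b00001: "L3C in this die",
    0b01000: "L3C in the cross-die",
    0b01001: "L3C in another socket",
    0b01110: "local DDR",
    0b01111: "cross-die DDR",
    0b10000: "cross-socket DDR",
}


def describe_datasrc(code):
    """Map a datasrc code to its description, if the text lists it."""
    return DATASRC.get(code, "not listed")


if __name__ == "__main__":
    # The two events in the example use ds_cfg=0xE and ds_cfg=0xF.
    for code in (0xE, 0xF):
        print(f"{code:#x}: {describe_datasrc(code)}")
```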
+(d) Some HiSilicon SoCs encapsulate multiple CPU and I/O dies. Each CPU die
+contains many Compute Clusters (CCLs). The I/O dies are called Super I/O
+clusters (SICL) and contain multiple I/O clusters (ICLs). Each CCL/ICL in the
+SoC has a unique master-ID. The uncore PMU can filter by a specified master-ID
+or a combination of master-IDs. The master-ID is 14 bits, of which the lower
+3 bits specify the individual core within a CCL. The upper 11 bits include a
+6-bit SCCL-ID and a 5-bit CCL/ICL-ID.
+The user may filter by a specific CCL/ICL through the mstid_cmd and mstid_msk
+parameters. A set bit in mstid_msk means the PMU will not check that bit when
+matching against mstid_cmd.
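The master-ID matching rule in (d) can be sketched in a few lines of Python. Two things here are assumptions rather than statements from the patch: the SCCL-ID is taken to occupy the top 6 bits with the CCL/ICL-ID directly below it, and the function names `pack_mstid` and `mstid_matches` are hypothetical. The mask semantics follow the text: a set mask bit is ignored during comparison.

```python
SCCL_BITS, CCL_BITS, CORE_BITS = 6, 5, 3
MSTID_WIDTH = SCCL_BITS + CCL_BITS + CORE_BITS  # 14 bits in total
MSTID_FULL = (1 << MSTID_WIDTH) - 1


def pack_mstid(sccl, ccl, core):
    """Pack the fields into a 14-bit master-ID.

    Assumed layout (not confirmed by the text): [SCCL-ID | CCL/ICL-ID | core].
    """
    if not 0 <= sccl < (1 << SCCL_BITS):
        raise ValueError("SCCL-ID out of range")
    if not 0 <= ccl < (1 << CCL_BITS):
        raise ValueError("CCL/ICL-ID out of range")
    if not 0 <= core < (1 << CORE_BITS):
        raise ValueError("core index out of range")
    return (sccl << (CCL_BITS + CORE_BITS)) | (ccl << CORE_BITS) | core


def mstid_matches(mstid, cmd, msk):
    """Compare only the bits that are clear in the mask."""
    care = ~msk & MSTID_FULL
    return (mstid & care) == (cmd & care)


if __name__ == "__main__":
    # Match every core of CCL 2 in SCCL 1 by masking out the 3 core bits.
    cmd = pack_mstid(1, 2, 0)
    msk = (1 << CORE_BITS) - 1
    print(f"mstid_cmd={cmd:#x},mstid_msk={msk:#x}")
    print(mstid_matches(pack_mstid(1, 2, 3), cmd, msk))
    print(mstid_matches(pack_mstid(1, 3, 0), cmd, msk))
```

Masking the low 3 bits is what turns a single-core match into a whole-CCL match; with a mask of 0 every bit of the master-ID must equal mstid_cmd.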
+(e) For the new uncore PMUs, SLLC and PA, normal PMU events are supported and
+new functions are added as well. For example, tgt_id and src_id can be set as
+required; each is 11 bits, comprising the SCCL-ID and CCL/ICL-ID. For I/O die,
+the ICL-ID is followed by:
+If all of these options are disabled, the PMU works with the default values,
+which apply no filter condition or ID information, and it returns the total
+counter values in the PMU counters.
The current driver does not support sampling. So "perf record" is unsupported.
Also attach to a task is unsupported as the events are all uncore.