[PATCH v2 1/2] drivers/perf: hisi: Add driver for HiSilicon PCIe PMU
liuqi (BA)
liuqi115 at huawei.com
Tue Apr 13 10:12:16 BST 2021
Hi John,
On 2021/4/13 1:21, John Garry wrote:
> On 12/04/2021 14:34, liuqi (BA) wrote:
>>
>> Hi John,
>>
>> Thanks for reviewing this.
>> On 2021/4/9 18:22, John Garry wrote:
>>> On 09/04/2021 10:05, Qi Liu wrote:
>>>> PCIe PMU Root Complex Integrated End Point(RCiEP) device is supported
>>>> to sample bandwidth, latency, buffer occupation etc.
>>>>
>>>> Each PMU RCiEP device monitors multiple Root Ports, and each RCiEP is
>>>> registered as a PMU in /sys/bus/event_source/devices, so users can
>>>> select target PMU, and use filter to do further sets.
>
> side note: it would be good to mention what baseline the series is based
> on in the cover letter
>
Got it, will add it, thanks.
>>>>
>>>> Filtering options contains:
>>>> event - select the event.
>>>> subevent - select the subevent.
>>>> port - select target Root Ports. Information of Root Ports
>>>> are shown under sysfs.
>>>> bdf - select requester_id of target EP device.
>>>> trig_len - set trigger condition for starting event statistics.
>>>> trigger_mode - set trigger mode. 0 means starting to statistic when
>>>> bigger than trigger condition, and 1 means smaller.
>>>> thr_len - set threshold for statistics.
>>>> thr_mode - set threshold mode. 0 means count when bigger than
>>>> threshold, and 1 means smaller.
>>>>
>>>> Signed-off-by: Qi Liu <liuqi115 at huawei.com>
>>>> ---
>>>> MAINTAINERS | 6 +
>>>> drivers/perf/Kconfig | 2 +
>>>> drivers/perf/Makefile | 1 +
>>>> drivers/perf/pci/Kconfig | 16 +
>>>> drivers/perf/pci/Makefile | 2 +
>>>> drivers/perf/pci/hisilicon/Makefile | 5 +
>>>> drivers/perf/pci/hisilicon/hisi_pcie_pmu.c | 1011
>>>> ++++++++++++++++++++++++++++
>>>> include/linux/cpuhotplug.h | 1 +
>>>> 8 files changed, 1044 insertions(+)
>>>> create mode 100644 drivers/perf/pci/Kconfig
>>>> create mode 100644 drivers/perf/pci/Makefile
>>>> create mode 100644 drivers/perf/pci/hisilicon/Makefile
>>>> create mode 100644 drivers/perf/pci/hisilicon/hisi_pcie_pmu.c
>>>>
>>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>>> index 3353de0..46c7861 100644
>>>> --- a/MAINTAINERS
>>>> +++ b/MAINTAINERS
>>>> @@ -8023,6 +8023,12 @@ W: http://www.hisilicon.com
>>>> F: Documentation/admin-guide/perf/hisi-pmu.rst
>>>> F: drivers/perf/hisilicon
>>>> +HISILICON PCIE PMU DRIVER
>>>> +M: Qi Liu <liuqi115 at huawei.com>
>>>> +S: Maintained
>>>> +F: Documentation/admin-guide/perf/hisi-pcie-pmu.rst
>>>
>>> nit: this does not exist yet...
>>>
>> thanks, I'll move this add-maintainer-part to the second patch.
>
> that's why I advocate the documentation first :)
ok, I'll move document as the first patch.
>
>>>> +F: drivers/perf/pci/hisilicon/hisi_pcie_pmu.c
>>>> +
>>>> HISILICON QM AND ZIP Controller DRIVER
>>>> M: Zhou Wang <wangzhou1 at hisilicon.com>
>>>> L: linux-crypto at vger.kernel.org
>>>> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
>>>> index 3075cf1..99d4760 100644
>>>> --- a/drivers/perf/Kconfig
>>>> +++ b/drivers/perf/Kconfig
>>>> @@ -139,4 +139,6 @@ config ARM_DMC620_PMU
>>>> source "drivers/perf/hisilicon/Kconfig"
>>>> +source "drivers/perf/pci/Kconfig"
>>>> +
>>>> endmenu
>>>> diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
>>>> index 5260b11..1208c08 100644
>>>> --- a/drivers/perf/Makefile
>>>> +++ b/drivers/perf/Makefile
>>>> @@ -14,3 +14,4 @@ obj-$(CONFIG_THUNDERX2_PMU) += thunderx2_pmu.o
>>>> obj-$(CONFIG_XGENE_PMU) += xgene_pmu.o
>>>> obj-$(CONFIG_ARM_SPE_PMU) += arm_spe_pmu.o
>>>> obj-$(CONFIG_ARM_DMC620_PMU) += arm_dmc620_pmu.o
>>>> +obj-y += pci/
>>>> diff --git a/drivers/perf/pci/Kconfig b/drivers/perf/pci/Kconfig
>>>> new file mode 100644
>>>> index 0000000..a119486
>>>> --- /dev/null
>>>> +++ b/drivers/perf/pci/Kconfig
>>>> @@ -0,0 +1,16 @@
>>>> +# SPDX-License-Identifier: GPL-2.0-only
>>>> +#
>>>> +# PCIe Performance Monitor Drivers
>>>> +#
>>>> +menu "PCIe Performance Monitor"
>>>> +
>>>> +config HISI_PCIE_PMU
>>>> + tristate "HiSilicon PCIE PERF PMU"
>>>> + depends on ARM64 && PCI && HISI_PMU
>>>
>>> What from HISI_PMU is needed? I couldn't find anything here
>> The event_sysfs_show() and format_sysfs_show() function of
>> hisi_uncore_pmu.h can be reused in hisi_pcie_pmu.c, So I add path in
>> Makefile and include "hisi_uncore_pmu.h".
>
> Right, but it would be nice to be able to build this under COMPILE_TEST.
> CONFIG_HISI_PMU cannot be built under COMPILE_TEST, so nice to not
> depend on it.
>
> So you could put hisi_event_sysfs_show() as a static inline in
> hisi_uncore_pmu.h, so the dependency can be removed
>
> Having said that, there is nothing really hisi specific in those
> functions like hisi_event_sysfs_show().
>
> Can't we just create generic functions here?
>
> hisi_event_sysfs_show() == cci400_pmu_cycle_event_show() ==
> xgene_pmu_event_show()
>
Got it, will address this.
>>
>>>
>>>> + help
>>>> + Provide support for HiSilicon PCIe performance monitoring unit
>>>> (PMU)
>>>> + RCiEP devices.
>>>> + Adds the PCIe PMU into perf events system for monitoring
>>>> latency,
>>>> + bandwidth etc.
>>>> +
>
>
>
>
>
[...]
>>>> +static bool hisi_pcie_pmu_valid_filter(struct perf_event *event,
>>>> + struct hisi_pcie_pmu *pcie_pmu)
>>>> +{
>>>> + u32 subev_idx = hisi_pcie_get_subevent(event);
>>>> + u32 event_idx = hisi_pcie_get_event(event);
>>>> + u32 requester_id = hisi_pcie_get_bdf(event);
>>>> +
>>>> + if (subev_idx > HISI_PCIE_SUBEVENT_MAX ||
>>>> + event_idx > HISI_PCIE_EVENT_MAX) {
>>>> + pci_err(pcie_pmu->pdev,
>>>> + "Max event index and max subevent index is: %d, %d.\n",
>>>> + HISI_PCIE_EVENT_MAX, HISI_PCIE_SUBEVENT_MAX);
>>>
>>> if this is just going to be fed back to userspace, I don't see why we
>>> need a kernel log
>>>
>>> and the only caller also triggers an error message, which I doubt is
>>> needed
>>>
>> Print out the HISI_PCIE_EVENT_MAX and HISI_PCIE_SUBEVENT_MAX here may be
>> more convenient for users to get the right value.
>> If you this is redundant I'll remove it. :)
>
> Don't we already tell this to userspace?
>
"event" is a 8-bit filter, its max value is 0xff, but PCIE PMU only have
0xa2 events, So if users input "event=0xa3", userspace only printout
"<not supported>".
Perhaps driver could tell users the max value of event index here.
If you think this is redundant I'll remove it. :)
>>>> + return false;
>>>> + }
>>>> +
>>>> + if (hisi_pcie_get_thr_len(event) > HISI_PCIE_THR_MAX_VAL)
>>>> + return false;
>>>> +
>>>> + if (hisi_pcie_get_trig_len(event) > HISI_PCIE_TRIG_MAX_VAL)
>>>> + return false;
>>>> +
>>>> + if (requester_id) {
>>>> + if (!hisi_pcie_pmu_valid_requester_id(pcie_pmu,
>>>> requester_id)) {
>>>> + pci_err(pcie_pmu->pdev, "Invalid requester id.\n");
>>>
>>> see previous comments
>>>
>>>> + return false;
>>>> + }
>>>> + }
>>>> +
>>>> + return true;
>>>> +}
>>>> +
>>>> +static bool hisi_pcie_pmu_validate_event_group(struct perf_event
>>>> *event)
>>>> +{
>>>> + struct perf_event *sibling, *leader = event->group_leader;
>>>> + int counters = 1;
>>>> +
>>>> + if (!is_software_event(leader)) {
>>>> + if (leader->pmu != event->pmu)
>>>> + return false;
>>>> +
>>>> + if (leader != event)
>>>> + counters++;
>>>> + }
>>>> +
>>>> + for_each_sibling_event(sibling, event->group_leader) {
>>>> + if (is_software_event(sibling))
>>>> + continue;
>>>> +
>>>> + if (sibling->pmu != event->pmu)
>>>> + return false;
>>>> +
>>>> + counters++;
>>>> + }
>>>> +
>>>> + return counters <= HISI_PCIE_MAX_COUNTERS;
>>>> +}
>>>> +
>>>> +static int hisi_pcie_pmu_event_init(struct perf_event *event)
>>>> +{
>>>> + struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(event->pmu);
>>>> +
>>>> + event->cpu = cpumask_first(&pcie_pmu->cpumask);
>>>> +
>>>> + if (event->attr.type != event->pmu->type)
>>>> + return -ENOENT;
>>>> +
>>>> + /* Sampling is not supported. */
>>>> + if (is_sampling_event(event) || event->attach_state &
>>>> PERF_ATTACH_TASK)
>>>> + return -EOPNOTSUPP;
>>>> +
>>>> + /* Per-task mode is not supported. */
>>>> + if (event->cpu < 0)
>>>
>>> cpumask_first() gives an unsigned int - this can never happen
>
> please fix this!
>
Sorry, missed it yesterday. I'll fix this next version.
>>>
>>>> + return -EINVAL;
>>>> +
>>>> + if (!hisi_pcie_pmu_valid_filter(event, pcie_pmu)) {
>>>> + pci_err(pcie_pmu->pdev, "Invalid filter!\n");
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> + if (!hisi_pcie_pmu_validate_event_group(event))
>>>> + return -EINVAL;
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static u64 hisi_pcie_pmu_process_data(struct perf_event *event, u64
>>>> val,
>
>
...
>
>>>> +
>>>> +static void hisi_pcie_pmu_irq_unregister(struct pci_dev *pdev,
>>>> + struct hisi_pcie_pmu *pcie_pmu)
>>>> +{
>>>> + free_irq(pcie_pmu->irq, pcie_pmu);
>>>> + pci_free_irq_vectors(pdev);
>>>> +}
>>>> +
>>>> +static int hisi_pcie_pmu_online_cpu(unsigned int cpu, struct
>>>> hlist_node *node)
>>>> +{
>>>> + struct hisi_pcie_pmu *pcie_pmu = hlist_entry_safe(node,
>>>> + struct hisi_pcie_pmu, node);
>>>> +
>>>> + if (cpumask_empty(&pcie_pmu->cpumask)) {
>>>> + cpumask_set_cpu(cpu, &pcie_pmu->cpumask);
>>>> + WARN_ON(irq_set_affinity_hint(pcie_pmu->irq,
>>>> cpumask_of(cpu)));
>>>> + }
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static int hisi_pcie_pmu_offline_cpu(unsigned int cpu, struct
>>>> hlist_node *node)
>>>> +{
>>>> + struct hisi_pcie_pmu *pcie_pmu = hlist_entry_safe(node,
>>>> + struct hisi_pcie_pmu, node);
>>>> + unsigned int target;
>>>> +
>>>> + if (!cpumask_test_and_clear_cpu(cpu, &pcie_pmu->cpumask))
>>>
>>> I do wonder why we even need maintain pcie_pmu->cpumask
>>>
>>> Can't we just use cpu_online_mask as appropiate instead?
>
> ?
Sorry, missed it yesterday.
It seems that cpumask is always same as cpu_online_mask, So do we need
to reserve the cpumask sysfs interface?
Thanks,
Qi
>
>>>
>>>> + return 0;
>>>> +
>>>> + /* Choose a new CPU from all online cpus. */
>>>> + target = cpumask_any_but(cpu_online_mask, cpu);
>>>> + if (target >= nr_cpu_ids)
>>>> + return 0;
>>>> +
>>>> + perf_pmu_migrate_context(&pcie_pmu->pmu, cpu, target);
>>>> + WARN_ON(irq_set_affinity_hint(pcie_pmu->irq, cpumask_of(target)));
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>
> .
More information about the linux-arm-kernel
mailing list