[PATCH v2] perf/hx_arm_ni: Support uncore ARM NI-700 PMU

Yang Jialong 杨佳龙 jialong.yang at shingroup.cn
Wed Jan 31 02:07:03 PST 2024



在 2024/1/31 17:38, Krzysztof Kozlowski 写道:
> On 31/01/2024 10:07, Yang Jialong 杨佳龙 wrote:
>>
>>
>> 在 2024/1/31 15:59, Krzysztof Kozlowski 写道:
>>> On 31/01/2024 08:08, JiaLong.Yang wrote:
>>>> This code is based on uncore PMUs arm_smmuv3_pmu and arm-cmn.
>>>> One ni-700 can have many clock domains. Each of them has only one PMU.
>>>> Here one PMU corresponds to one 'struct ni_pmu' instance.
>>>> PMU name will be ni_pmu_N_M, which N means different NI-700s and M means
>>>> different PMU in one NI-700. If only one NI-700 found in NI-700, name will
>>>> be ni_pmu_N.
>>>> Node interface event name will be xxni_N_eventname, such as
>>>> asni_0_rdreq_any. There are many kinds of type of nodes in one clock
>>>> domain. Also means that there are many kinds of that in one PMU. So we
>>>> distinguish them by xxni string. Besides, maybe there are many nodes
>>>> have same type. So we have number N in event name.
>>>> By ni_pmu_0_0/asni_0_rdreq_any/, we can pinpoint accurate bus traffic.
>>>> Example1: perf stat -a -e ni_pmu_0_0/asni_0_rdreq_any/,ni_pmu_0_0/cycles/
>>>> EXample2: perf stat -a -e ni_pmu_0_0/asni,id=0,event=0x0/
>>>>
>>>> Signed-off-by: JiaLong.Yang <jialong.yang at shingroup.cn>
>>>> ---
>>>> v1 --> v2:
>>>> 1. Submit MAINTANER Documentation/ files seperately.
>>>
>>> SEPARATE PATCHES, not patchsets. You have now checkpatch warnings
>>> because of this...
>>
>> ...OK. But the MAINTANER file changing should be given in which one
>> patches.
>> I will submit patch v3 after talking and your permission.
>>
>>>
>>>> 2. Delete some useless info printing.
>>>> 3. Change print from pr_xxx to dev_xxx.
>>>> 4. Fix more than 75 length log info.
>>>> 5. Fix dts attribute pccs-id.
>>>> 6. Fix generic name according to DT specification.
>>>> 7. Some indentation.
>>>> 8. Del of_match_ptr macro.
>>>>
>>>>    drivers/perf/Kconfig     |   11 +
>>>>    drivers/perf/Makefile    |    1 +
>>>>    drivers/perf/hx_arm_ni.c | 1284 ++++++++++++++++++++++++++++++++++++++
>>>>    3 files changed, 1296 insertions(+)
>>>>    create mode 100644 drivers/perf/hx_arm_ni.c
>>>>
>>>> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
>>>> index ec6e0d9194a1..95ef8b13730f 100644
>>>> --- a/drivers/perf/Kconfig
>>>> +++ b/drivers/perf/Kconfig
>>>> @@ -241,4 +241,15 @@ config CXL_PMU
>>>>    
>>>>    	  If unsure say 'm'.
>>>>    
>>>> +config HX_ARM_NI_PMU
>>>> +       tristate "HX ARM NI-700 PMU"
>>>> +       depends on PPC_HX_C2000 && 64BIT
>>>
>>> 1. There is no PPC_HX_C2000.
>>
>> I have been used to using this macro. However this macro is not existed
>> in mainline.
>> I will replace it with ARM64. And del involved C code if OK.
>>
>> 64bit:
>> __ffs(unsigned long) and __fls(unsigned long) will be wrong in 32bit. I
>> pass a u64 argument.
> 
> One thing is where the code is supposed to run, second thing is compile
> testing.
> 

Now run on my company product, a 64bit PowerPC...
But I think it's general for 64bit systems.

> Why do you use __ffs, not __ffs64 which takes u64 if you really want
> only 64bit argument? unsigned long != u64, so your code is not
> architecture independent. You claim you wrote it on purpose as
> non-architecture-independent, but then I claim it's a bug. We are
> supposed to write code which is portable, as much as possible, assuming
> it does not affect readability.
> 

I write code in v5.18, there are __ffs64() and fls64(). Asymmetric.
There are some difference in return val between __ffs() and ffs64().
__ffs(0) and ffs64(0) will give different value.

And I'm sure code run in 64bit. So I choose to use __ffs and __fls.

Maybe it could be compatbile with 32bit. But I don't have a environment 
to test this.
> 
>> struct ni_hw_perf_event will be big than limit.
>> BUILD_BUG_ON(sizeof(struct ni_hw_perf_event) > offsetof(struct
>> hw_perf_event, target));
> 
> And why do you need to use any of such code? Please open one of hundreds
> of other drivers which work correctly on 32 and 64-bit platforms.
> 

Code for 64bit.
This code is to avoid struct ni_hw_perf_event is too big than struct 
hw_perf_event::target.
I learn it from arm-cmn.c.
ni_hw_perf_event will replace hw_perf_event.
I will put some useful information in it with less space and good field 
names.
But I can't exceed a limit.

>>
>>> 2. Nothing justified dependency on 64bit. Drop or explain. Your previous
>>> message did not provide real rationale.
>>
>> If ARM64, then drop.
> 
> ...
> 
> ...
> 
>>>> +	/* Step2: Traverse all clock domains. */
>>>> +	for (cd_idx = 0; cd_idx < ni->cd_num; cd_idx++) {
>>>> +		cd = cd_arrays[cd_idx];
>>>> +
>>>> +		num = ni_child_number(cd);
>>>> +		dev_dbg(dev, "The %dth clock domain has %d child nodes:", cd_idx, num);
>>>> +
>>>> +		/* Omit pmu node */
>>>> +		ni_pmu = devm_kzalloc(dev, struct_size(ni_pmu, ev_src_nodes, num - 1),
>>>> +				      GFP_KERNEL);
>>>> +		ni_pmu->ev_src_num = num - 1;
>>>> +
>>>> +		if (!ni_pmu)
>>>> +			return -ENOMEM;
>>>> +
>>>> +		num_idx = 0;
>>>> +		for (nd_idx = 0; nd_idx < num; nd_idx++) {
>>>> +			nd = ni_child_pointer(pbase, cd, nd_idx);
>>>> +
>>>> +			node.base = nd;
>>>> +			node.node_type = ni_node_node_type(nd);
>>>> +
>>>> +			if (unlikely(ni_node_type(nd) == NI_PMU))
>>>> +				ni_pmu->pmu_node = node;
>>>> +			else
>>>> +				ni_pmu->ev_src_nodes[num_idx++] = node;
>>>> +			dev_dbg(dev, "  name: %s   id: %d", ni_node_name[node.type], node.id);
>>>> +		}
>>>> +
>>>> +		ni_pmu->dev = dev;
>>>> +		ni_pmu->ni = ni;
>>>> +		ni->ni_pmus[cd_idx] = ni_pmu;
>>>> +	}
>>>> +
>>>> +	devm_kfree(dev, cd_arrays);
>>>
>>> Why? If it is not device-lifetime then allocate with usual way.
>>>
>>
>> No device-lifetime.
>> Will allocate in stack.
> 
> I was thinking about kzalloc. But if array is small, stack could be as well.
> 

If I have to return before devm_kfree because of wrong, I will have to use:

goto out;

out:
kfree();

But if I use devm_kzalloc, I will not be worried about that. Even if no 
device-lifetime.
Isn't this a good way?

> ...
> 
>>>
>>>> +
>>>> +static const struct of_device_id ni_pmu_of_match[] = {
>>>> +	{ .compatible = "hx,c2000-arm-ni" },
>>>
>>> Don't send undocumented compatibles.
>>
>> OK. Means I should send doc and code in one patch thread with more than
>> one patch?
> 
> Yes. Please open lore.kernel.org and look at any other submissions
> involving bindings or other type of ABI documentation (like sysfs).

Get.

> 
> Best regards,
> Krzysztof
> 
> 




More information about the linux-arm-kernel mailing list