[PATCH 1/3] perf/arm-cmn: Decouple wp_config registers from filter group number
Ilkka Koskinen
ilkka at os.amperecomputing.com
Mon Jan 29 21:28:42 PST 2024
Hi Robin,
Thanks for the review.
On Mon, 29 Jan 2024, Robin Murphy wrote:
> Hi Ilkka,
>
> On 2024-01-26 10:12 pm, Ilkka Koskinen wrote:
>> Previously, wp_config0/2 registers were used for primary match group and
>> wp_config1/3 registers for secondary match group. In order to support
>> tertiary match group, this patch decouples the registers and the groups.
>
> Happy to see you having a stab at this, however I fear I you're in for a fair
> dose of "if it were this simple I might have already done it" :)
Uh, for a little while I thought it seemed too easy but decided to
continue nevertheless
>
>> Allocation is changed to dynamic but it's still per mesh instance rather
>> than per node.
>>
>> Signed-off-by: Ilkka Koskinen <ilkka at os.amperecomputing.com>
>> ---
>> drivers/perf/arm-cmn.c | 52 ++++++++++++++++++++++++++++++++++--------
>> 1 file changed, 43 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
>> index c584165b13ba..93eb47ea7e25 100644
>> --- a/drivers/perf/arm-cmn.c
>> +++ b/drivers/perf/arm-cmn.c
>> @@ -591,6 +591,7 @@ struct arm_cmn_hw_event {
>> u8 dtm_offset;
>> bool wide_sel;
>> enum cmn_filter_select filter_sel;
>> + int wp_idx;
>> };
>> #define for_each_hw_dn(hw, dn, i) \
>> @@ -1337,7 +1338,35 @@ static const struct attribute_group
>> *arm_cmn_attr_groups[] = {
>> static int arm_cmn_wp_idx(struct perf_event *event)
>> {
>> - return CMN_EVENT_EVENTID(event) + CMN_EVENT_WP_GRP(event);
>> + struct arm_cmn_hw_event *hw = to_cmn_hw(event);
>> +
>> + return hw->wp_idx;
>
> Sorry, this breaks group validation.
Clearly, watchpoint group validation was missing from my tests :(
>
>> +}
>> +
>> +static int arm_cmn_wp_idx_unused(struct perf_event *event, struct
>> arm_cmn_dtm *dtm,
>> + struct arm_cmn_dtc *dtc)
>> +{
>> + struct arm_cmn_hw_event *hw = to_cmn_hw(event);
>> + int idx, tmp, direction = CMN_EVENT_EVENTID(event);
>> +
>> + /*
>> + * Examine wp 0 & 1 for the up direction,
>> + * examine wp 2 & 3 for the down direction
>> + */
>> + for (idx = direction; idx < direction + 2; idx++)
>> + if (dtm->wp_event[idx] < 0)
>> + break;
>> +
>> + if (idx == direction + 2)
>> + return -ENOSPC;
>> +
>> + tmp = dtm->wp_event[idx ^ 1];
>> + if (tmp >= 0 && CMN_EVENT_WP_COMBINE(event) !=
>> + CMN_EVENT_WP_COMBINE(dtc->counters[tmp]))
>> + return -ENOSPC;
>> +
>> + hw->wp_idx = idx;
>
> I don't really get this logic either - we can allocate a potentially
> different index for every DTM, but only store the most recent one?
>
>> + return hw->wp_idx;
>> }
>> static u32 arm_cmn_wp_config(struct perf_event *event)
>> @@ -1785,6 +1814,8 @@ static void arm_cmn_event_clear(struct arm_cmn *cmn,
>> struct perf_event *event,
>> for_each_hw_dtc_idx(hw, j, idx)
>> cmn->dtc[j].counters[idx] = NULL;
>> +
>> + hw->wp_idx = -1;
>> }
>> static int arm_cmn_event_add(struct perf_event *event, int flags)
>> @@ -1794,6 +1825,7 @@ static int arm_cmn_event_add(struct perf_event
>> *event, int flags)
>> struct arm_cmn_node *dn;
>> enum cmn_node_type type = CMN_EVENT_TYPE(event);
>> unsigned int input_sel, i = 0;
>> + int wp_idx;
>> if (type == CMN_TYPE_DTC) {
>> while (cmn->dtc[i].cycles)
>> @@ -1822,6 +1854,7 @@ static int arm_cmn_event_add(struct perf_event
>> *event, int flags)
>> }
>> /* ...then the local counters to feed them */
>> + wp_idx = -1;
>
> Oh, I guess this trying to avoid some of that issue, but I still don't think
> it works - say we add an event targeted to XP B, which sees WP0 is free on
> DTM B so allocates index 0; then we add another event aggregating across XPs
> A and B, which sees WP0 is free on DTM A, allocates index 0, then goes on to
> stomp WP0 on DTM B as well - oops.
>
> I don't think it's going to be feasible to do this without tracking the full
> allocation state with a wp_idx bitmap in the hw_event - at least it only
> needs to be half the size of dtm_idx, so I think there's still room.
Yeah, it was supposed to be simple and stupid version until I'd have time
to make the allocation per node. But the more I think about this, the more
I start to believe that the bitmap solution would be the better option
right away.
I'll take a look at better solution although it might take a while as I'll
probably get other more urgent tasks soon. If you find yourself with too
much free time, feel free to take this task ;)
Cheers, Ilkka
>
> Thanks,
> Robin.
>
>> for_each_hw_dn(hw, dn, i) {
>> struct arm_cmn_dtm *dtm = &cmn->dtms[dn->dtm] +
>> hw->dtm_offset;
>> unsigned int dtm_idx, shift, d = max_t(int, dn->dtc, 0);
>> @@ -1835,16 +1868,17 @@ static int arm_cmn_event_add(struct perf_event
>> *event, int flags)
>> if (type == CMN_TYPE_XP) {
>> input_sel = CMN__PMEVCNT0_INPUT_SEL_XP + dtm_idx;
>> } else if (type == CMN_TYPE_WP) {
>> - int tmp, wp_idx = arm_cmn_wp_idx(event);
>> u32 cfg = arm_cmn_wp_config(event);
>> - if (dtm->wp_event[wp_idx] >= 0)
>> - goto free_dtms;
>> -
>> - tmp = dtm->wp_event[wp_idx ^ 1];
>> - if (tmp >= 0 && CMN_EVENT_WP_COMBINE(event) !=
>> -
>> CMN_EVENT_WP_COMBINE(cmn->dtc[d].counters[tmp]))
>> - goto free_dtms;
>> + /*
>> + * wp_config register index is currently allocated
>> per
>> + * mesh instance rather than per node.
>> + */
>> + if (wp_idx < 0) {
>> + wp_idx = arm_cmn_wp_idx_unused(event, dtm,
>> &cmn->dtc[d]);
>> + if (wp_idx < 0)
>> + goto free_dtms;
>> + }
>> input_sel = CMN__PMEVCNT0_INPUT_SEL_WP + wp_idx;
>> dtm->wp_event[wp_idx] = hw->dtc_idx[d];
>
More information about the linux-arm-kernel
mailing list