RISC-V PMU driver issue

雷博涵 garthlei at pku.edu.cn
Tue Mar 12 21:29:02 PDT 2024


Sorry for the duplicated message. The last one was an HTML e-mail so I resend this.

> 2024年3月13日 08:02,Atish Patra <atishp at atishpatra.org> 写道:
> 
> On Tue, Feb 27, 2024 at 7:49 PM 雷博涵 <garthlei at pku.edu.cn> wrote:
>> 
>> Hi all,
>> 
>> I am having problems with the RISC-V PMU driver. The overflow handler of my custom perf event kernel counter seems to read an incorrect value from the event.
> 
> If I understand you correctly, you have a custom kernel counter for
> the perf event ? Can you explain the use case for that ?

Actually it was the perf-event backend of an open-source tool called PMCTrack (https://github.com/jcsaezal/pmctrack), which I was trying to port to RISC-V. It creates a kernel counter using `perf_event_create_kernel_counter`, registering an overflow handler. The registered overflow handler calls `perf_event_read_value` to read the counter value.

> 
>> It seems that the issue lies in the function `riscv_pmu_event_set_period`, which sets `prev_count` to a new value but does not modify the underlying hardware counter. When `perf_event_read_value` gets called later in the user-defined overflow handler, `riscv_pmu_event_update` will update the `count` field again based on the unmodified hardware counter value and the modified `prev_count` field, which causes an incorrect reading.
> 
> What do you mean by user defined overflow handler ? The overflow
> handler registered via the custom perf event kernel counter which gets
> invoked from perf_event_overflow ?

Yes.

> 
>> I noticed that other PMU drivers, such as the ARM one, write to the underlying counter in their set_period functions, which prevents the problem. However, the RISC-V SBI specification does not have such an API to write to a counter without starting it. Using `local64_read(&hw_evt->period_left) <= 0` directly as the guard condition to place `riscv_pmu_event_set_period(event)` after it seems to work, but I am not sure whether it can cause other issues.
> 
> Not sure the exact code path you are referring to. You don't want to
> invoke  riscv_pmu_event_set_period if period_left <= 0 ?

First, `pmu_sbi_ovf_handler` calls `riscv_pmu_event_update`, so the perf event counter value (`event->count`) is updated. It calls `riscv_pmu_event_set_period` later, which calls `local64_set(&hwc->prev_count, (u64)-left)`. When `perf_event_read_value` gets invoked in the registered overflow handler, `riscv_pmu_event_update` will be invoked eventually. Because `prev_count` has been set to `(u64)-left` and the underlying hardware counter value (the value read via `rvpmu->ctr_read(event)`) has not been changed, `event->count` will be updated again, and the value this time will be incorrect.

I have found that the ARM PMU does not have the problem. The function `armpmu_event_set_period` in `arm_pmu.c` calls `armpmu->write_counter(event, (u64)(-left) & max_period)` after `local64_set(&hwc->prev_count, (u64)-left)`, so that the underlying hardware counter is also set to a new value. Thus, when `armpmu_event_update` gets called later in the registered handler, it will find that the underlying counter has the same value as `prev_count`, so `event->count` can keep the correct value.

However, as I mentioned before, the RISC-V SBI specification does not have such an API as “write_counter,” so the ARM solution could not be adapted easily. I was wondering if we could postpone `riscv_pmu_event_set_period` after the if-block, like this:

```
if (local64_read(&hw_evt->period_left) <= 0) {
  perf_event_overflow(event, &data, regs);
}
riscv_pmu_event_set_period(event);
```

This way `perf_event_read_value` can return the correct counter value. `riscv_pmu_event_set_period` gets invoked after perf_event_overflow. However, I am not sure if it can cause other problems.

> 
>> 
>> Thank you,
>> Bohan Lei
> 
> 
> -- 
> Regards,
> Atish

--
Regards,
Bohan Lei



More information about the linux-riscv mailing list