RISC-V PMU driver issue

Mon Mar 18 15:18:32 PDT 2024

On 3/12/24 21:29, 雷博涵 wrote:
> Sorry for the duplicated message. The last one was an HTML e-mail so I resend this.
> 
>> 2024年3月13日 08:02，Atish Patra <atishp at atishpatra.org> 写道：
>>
>> On Tue, Feb 27, 2024 at 7:49 PM 雷博涵 <garthlei at pku.edu.cn> wrote:
>>>
>>> Hi all,
>>>
>>> I am having problems with the RISC-V PMU driver. The overflow handler of my custom perf event kernel counter seems to read an incorrect value from the event.
>>
>> If I understand you correctly, you have a custom kernel counter for
>> the perf event ? Can you explain the use case for that ?
> 
> Actually it was the perf-event backend of an open-source tool called PMCTrack (https://github.com/jcsaezal/pmctrack), which I was trying to port to RISC-V. It creates a kernel counter using `perf_event_create_kernel_counter`, registering an overflow handler. The registered overflow handler calls `perf_event_read_value` to read the counter value.
> 

Thanks. It makes sense now.

>>
>>> It seems that the issue lies in the function `riscv_pmu_event_set_period`, which sets `prev_count` to a new value but does not modify the underlying hardware counter. When `perf_event_read_value` gets called later in the user-defined overflow handler, `riscv_pmu_event_update` will update the `count` field again based on the unmodified hardware counter value and the modified `prev_count` field, which causes an incorrect reading.
>>
>> What do you mean by user defined overflow handler ? The overflow
>> handler registered via the custom perf event kernel counter which gets
>> invoked from perf_event_overflow ?
> 
> Yes.
> 
>>
>>> I noticed that other PMU drivers, such as the ARM one, write to the underlying counter in their set_period functions, which prevents the problem. However, the RISC-V SBI specification does not have such an API to write to a counter without starting it. Using `local64_read(&hw_evt->period_left) <= 0` directly as the guard condition to place `riscv_pmu_event_set_period(event)` after it seems to work, but I am not sure whether it can cause other issues.
>>
>> Not sure the exact code path you are referring to. You don't want to
>> invoke  riscv_pmu_event_set_period if period_left <= 0 ?
> 
> First, `pmu_sbi_ovf_handler` calls `riscv_pmu_event_update`, so the perf event counter value (`event->count`) is updated. It calls `riscv_pmu_event_set_period` later, which calls `local64_set(&hwc->prev_count, (u64)-left)`. When `perf_event_read_value` gets invoked in the registered overflow handler, `riscv_pmu_event_update` will be invoked eventually. Because `prev_count` has been set to `(u64)-left` and the underlying hardware counter value (the value read via `rvpmu->ctr_read(event)`) has not been changed, `event->count` will be updated again, and the value this time will be incorrect.

Got it. This is a genuine issue. I can reproduce it as well with kvm pmu 
support as it creates a kernel perf event as well.

> 
> I have found that the ARM PMU does not have the problem. The function `armpmu_event_set_period` in `arm_pmu.c` calls `armpmu->write_counter(event, (u64)(-left) & max_period)` after `local64_set(&hwc->prev_count, (u64)-left)`, so that the underlying hardware counter is also set to a new value. Thus, when `armpmu_event_update` gets called later in the registered handler, it will find that the underlying counter has the same value as `prev_count`, so `event->count` can keep the correct value.
> 
> However, as I mentioned before, the RISC-V SBI specification does not have such an API as “write_counter,” so the ARM solution could not be adapted easily. I was wondering if we could postpone `riscv_pmu_event_set_period` after the if-block, like this:
> 
> ```
> if (local64_read(&hw_evt->period_left) <= 0) {
>    perf_event_overflow(event, &data, regs);
> }
> riscv_pmu_event_set_period(event);
> ```
> 
> This way `perf_event_read_value` can return the correct counter value. `riscv_pmu_event_set_period` gets invoked after perf_event_overflow. However, I am not sure if it can cause other problems.
> 

That seems like a hack. I think we can avoid updating again based on 
perf state which can indicate that event->count is already updated. 
Thus, no need to update it again. I will toy around with this idea and 
get back to you.

>>
>>>
>>> Thank you,
>>> Bohan Lei
>>
>>
>> -- 
>> Regards,
>> Atish
> 
> --
> Regards,
> Bohan Lei
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv