RISC-V PMU driver issue
雷博涵
garthlei at pku.edu.cn
Tue May 7 20:01:06 PDT 2024
I found the issue. `if (!rvpmu->ctr_read || (hwc->state == PERF_HES_UPTODATE))`
should be `if (!rvpmu->ctr_read || (hwc->state & PERF_HES_UPTODATE))`. Now it
works.
-----Original Messages-----
From: "Atish Kumar Patra" <atishp at rivosinc.com>
Send time: Tuesday, 05/07/2024 05:03:33
To: 雷博涵 <garthlei at pku.edu.cn>
Subject: Re: Re: RISC-V PMU driver issue
Can you provide some more details about your platform and how are you testing it?
I have verified the change in qemu with kvm implementation and the riscv_pmu_event_update the value as PERF_HES_UPTODATE is already set.
Can you print some more debug messages in riscv_pmu_event_update to see why it gets updated twice after this change?
On Thu, May 2, 2024 at 8:42 PM 雷博涵 <garthlei at pku.edu.cn> wrote:
I tried the diff on my platform (upstream kernel commit version 2c81593),
but it did not seem to work. The problem persists.
> -----Original Messages-----
> From: "Atish Patra" <atishp at rivosinc.com>
> Send time:Saturday, 04/27/2024 08:07:07
> To: 雷博涵 <garthlei at pku.edu.cn>, "Atish Patra" <atishp at atishpatra.org>
> Cc: linux-riscv at lists.infradead.org
> Subject: Re: RISC-V PMU driver issue
>
> On 3/18/24 15:18, Atish Patra wrote:
> > On 3/12/24 21:29, 雷博涵 wrote:
> >> Sorry for the duplicated message. The last one was an HTML e-mail so I
> >> resend this.
> >>
> >>> 2024年3月13日 08:02,Atish Patra <atishp at atishpatra.org> 写道:
> >>>
> >>> On Tue, Feb 27, 2024 at 7:49 PM 雷博涵 <garthlei at pku.edu.cn> wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> I am having problems with the RISC-V PMU driver. The overflow
> >>>> handler of my custom perf event kernel counter seems to read an
> >>>> incorrect value from the event.
> >>>
> >>> If I understand you correctly, you have a custom kernel counter for
> >>> the perf event ? Can you explain the use case for that ?
> >>
> >> Actually it was the perf-event backend of an open-source tool called
> >> PMCTrack (https://github.com/jcsaezal/pmctrack), which I was trying to
> >> port to RISC-V. It creates a kernel counter using
> >> `perf_event_create_kernel_counter`, registering an overflow handler.
> >> The registered overflow handler calls `perf_event_read_value` to read
> >> the counter value.
> >>
> >
> > Thanks. It makes sense now.
> >
> >>>
> >>>> It seems that the issue lies in the function
> >>>> `riscv_pmu_event_set_period`, which sets `prev_count` to a new value
> >>>> but does not modify the underlying hardware counter. When
> >>>> `perf_event_read_value` gets called later in the user-defined
> >>>> overflow handler, `riscv_pmu_event_update` will update the `count`
> >>>> field again based on the unmodified hardware counter value and the
> >>>> modified `prev_count` field, which causes an incorrect reading.
> >>>
> >>> What do you mean by user defined overflow handler ? The overflow
> >>> handler registered via the custom perf event kernel counter which gets
> >>> invoked from perf_event_overflow ?
> >>
> >> Yes.
> >>
> >>>
> >>>> I noticed that other PMU drivers, such as the ARM one, write to the
> >>>> underlying counter in their set_period functions, which prevents the
> >>>> problem. However, the RISC-V SBI specification does not have such an
> >>>> API to write to a counter without starting it. Using
> >>>> `local64_read(&hw_evt->period_left) <= 0` directly as the guard
> >>>> condition to place `riscv_pmu_event_set_period(event)` after it
> >>>> seems to work, but I am not sure whether it can cause other issues.
> >>>
> >>> Not sure the exact code path you are referring to. You don't want to
> >>> invoke riscv_pmu_event_set_period if period_left <= 0 ?
> >>
> >> First, `pmu_sbi_ovf_handler` calls `riscv_pmu_event_update`, so the
> >> perf event counter value (`event->count`) is updated. It calls
> >> `riscv_pmu_event_set_period` later, which calls
> >> `local64_set(&hwc->prev_count, (u64)-left)`. When
> >> `perf_event_read_value` gets invoked in the registered overflow
> >> handler, `riscv_pmu_event_update` will be invoked eventually. Because
> >> `prev_count` has been set to `(u64)-left` and the underlying hardware
> >> counter value (the value read via `rvpmu->ctr_read(event)`) has not
> >> been changed, `event->count` will be updated again, and the value this
> >> time will be incorrect.
> >
> > Got it. This is a genuine issue. I can reproduce it as well with kvm pmu
> > support as it creates a kernel perf event as well.
> >
> >>
> >> I have found that the ARM PMU does not have the problem. The function
> >> `armpmu_event_set_period` in `arm_pmu.c` calls
> >> `armpmu->write_counter(event, (u64)(-left) & max_period)` after
> >> `local64_set(&hwc->prev_count, (u64)-left)`, so that the underlying
> >> hardware counter is also set to a new value. Thus, when
> >> `armpmu_event_update` gets called later in the registered handler, it
> >> will find that the underlying counter has the same value as
> >> `prev_count`, so `event->count` can keep the correct value.
> >>
> >> However, as I mentioned before, the RISC-V SBI specification does not
> >> have such an API as “write_counter,” so the ARM solution could not be
> >> adapted easily. I was wondering if we could postpone
> >> `riscv_pmu_event_set_period` after the if-block, like this:
> >>
> >> ```
> >> if (local64_read(&hw_evt->period_left) <= 0) {
> >> perf_event_overflow(event, &data, regs);
> >> }
> >> riscv_pmu_event_set_period(event);
> >> ```
> >>
> >> This way `perf_event_read_value` can return the correct counter value.
> >> `riscv_pmu_event_set_period` gets invoked after perf_event_overflow.
> >> However, I am not sure if it can cause other problems.
> >>
> >
> > That seems like a hack. I think we can avoid updating again based on
> > perf state which can indicate that event->count is already updated.
> > Thus, no need to update it again. I will toy around with this idea and
> > get back to you.
> >
>
> Can you try this diff on latest upstream ?
>
>
> diff --git a/drivers/perf/riscv_pmu.c b/drivers/perf/riscv_pmu.c
> index b4efdddb2ad9..e08529c45c50 100644
> --- a/drivers/perf/riscv_pmu.c
> +++ b/drivers/perf/riscv_pmu.c
> @@ -167,7 +167,7 @@ u64 riscv_pmu_event_update(struct perf_event *event)
> unsigned long cmask;
> u64 oldval, delta;
>
> - if (!rvpmu->ctr_read)
> + if (!rvpmu->ctr_read || (hwc->state == PERF_HES_UPTODATE))
> return 0;
>
> cmask = riscv_pmu_ctr_get_width_mask(event);
> diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
> index 8cbe6e5f9c39..6ba825a38619 100644
> --- a/drivers/perf/riscv_pmu_sbi.c
> +++ b/drivers/perf/riscv_pmu_sbi.c
> @@ -763,7 +763,10 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq,
> void *dev)
> */
> overflowed_ctrs |= BIT(lidx);
> hw_evt = &event->hw;
> + /* Update the event states here so that we know the
> state while reading */
> + hw_evt->state |= PERF_HES_STOPPED;
> riscv_pmu_event_update(event);
> + hw_evt->state |= PERF_HES_UPTODATE;
> perf_sample_data_init(&data, 0, hw_evt->last_period);
> if (riscv_pmu_event_set_period(event)) {
> /*
> @@ -776,6 +779,8 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void
> *dev)
> */
> perf_event_overflow(event, &data, regs);
> }
> + /* Reset the state as we are going to start the counter
> after the loop */
> + hw_evt->state = 0;
> }
>
>
>
> All the changes in drivers/perf/riscv_pmu_sbi.c are already part of
> ongoing series with snapshot support[1].
>
> I will send the changes in drivers/perf/riscv_pmu.c as a separate fix if
> this works for you.
>
> [1] https://lore.kernel.org/kvm/20240420151741.962500-1-atishp@rivosinc.com/
>
> >>>
> >>>>
> >>>> Thank you,
> >>>> Bohan Lei
> >>>
> >>>
> >>> --
> >>> Regards,
> >>> Atish
> >>
> >> --
> >> Regards,
> >> Bohan Lei
> >>
> >> _______________________________________________
> >> linux-riscv mailing list
> >> linux-riscv at lists.infradead.org
> >> http://lists.infradead.org/mailman/listinfo/linux-riscv
> >
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv at lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
More information about the linux-riscv
mailing list