RISC-V PMU driver issue

Tue May 7 23:57:12 PDT 2024

On Tue, May 7, 2024 at 8:01 PM 雷博涵 <garthlei at pku.edu.cn> wrote:
>
> I found the issue. `if (!rvpmu->ctr_read || (hwc->state == PERF_HES_UPTODATE))`
> should be `if (!rvpmu->ctr_read || (hwc->state & PERF_HES_UPTODATE))`. Now it
> works.
>

Ahh I had a typo there. Thanks for verifying the fix. I will send the
patch soon.

>
> -----Original Messages-----
> From: "Atish Kumar Patra" <atishp at rivosinc.com>
> Send time: Tuesday, 05/07/2024 05:03:33
> To: 雷博涵 <garthlei at pku.edu.cn>
> Subject: Re: Re: RISC-V PMU driver issue
>
> Can you provide some more details about your platform and how are you testing it?
> I have verified the change in qemu with kvm implementation and the riscv_pmu_event_update the value as PERF_HES_UPTODATE is already set.
> Can you print some more debug messages in riscv_pmu_event_update to see why it gets updated twice after this change?
>
> On Thu, May 2, 2024 at 8:42 PM 雷博涵 <garthlei at pku.edu.cn> wrote:
> I tried the diff on my platform (upstream kernel commit version 2c81593),
> but it did not seem to work. The problem persists.
>
>
> > -----Original Messages-----
> > From: "Atish Patra" <atishp at rivosinc.com>
> > Send time:Saturday, 04/27/2024 08:07:07
> > To: 雷博涵 <garthlei at pku.edu.cn>, "Atish Patra" <atishp at atishpatra.org>
> > Cc: linux-riscv at lists.infradead.org
> > Subject: Re: RISC-V PMU driver issue
> >
> > On 3/18/24 15:18, Atish Patra wrote:
> > > On 3/12/24 21:29, 雷博涵 wrote:
> > >> Sorry for the duplicated message. The last one was an HTML e-mail so I
> > >> resend this.
> > >>
> > >>> 2024年3月13日 08:02，Atish Patra <atishp at atishpatra.org> 写道：
> > >>>
> > >>> On Tue, Feb 27, 2024 at 7:49 PM 雷博涵 <garthlei at pku.edu.cn> wrote:
> > >>>>
> > >>>> Hi all,
> > >>>>
> > >>>> I am having problems with the RISC-V PMU driver. The overflow
> > >>>> handler of my custom perf event kernel counter seems to read an
> > >>>> incorrect value from the event.
> > >>>
> > >>> If I understand you correctly, you have a custom kernel counter for
> > >>> the perf event ? Can you explain the use case for that ?
> > >>
> > >> Actually it was the perf-event backend of an open-source tool called
> > >> PMCTrack (https://github.com/jcsaezal/pmctrack), which I was trying to
> > >> port to RISC-V. It creates a kernel counter using
> > >> `perf_event_create_kernel_counter`, registering an overflow handler.
> > >> The registered overflow handler calls `perf_event_read_value` to read
> > >> the counter value.
> > >>
> > >
> > > Thanks. It makes sense now.
> > >
> > >>>
> > >>>> It seems that the issue lies in the function
> > >>>> `riscv_pmu_event_set_period`, which sets `prev_count` to a new value
> > >>>> but does not modify the underlying hardware counter. When
> > >>>> `perf_event_read_value` gets called later in the user-defined
> > >>>> overflow handler, `riscv_pmu_event_update` will update the `count`
> > >>>> field again based on the unmodified hardware counter value and the
> > >>>> modified `prev_count` field, which causes an incorrect reading.
> > >>>
> > >>> What do you mean by user defined overflow handler ? The overflow
> > >>> handler registered via the custom perf event kernel counter which gets
> > >>> invoked from perf_event_overflow ?
> > >>
> > >> Yes.
> > >>
> > >>>
> > >>>> I noticed that other PMU drivers, such as the ARM one, write to the
> > >>>> underlying counter in their set_period functions, which prevents the
> > >>>> problem. However, the RISC-V SBI specification does not have such an
> > >>>> API to write to a counter without starting it. Using
> > >>>> `local64_read(&hw_evt->period_left) <= 0` directly as the guard
> > >>>> condition to place `riscv_pmu_event_set_period(event)` after it
> > >>>> seems to work, but I am not sure whether it can cause other issues.
> > >>>
> > >>> Not sure the exact code path you are referring to. You don't want to
> > >>> invoke  riscv_pmu_event_set_period if period_left <= 0 ?
> > >>
> > >> First, `pmu_sbi_ovf_handler` calls `riscv_pmu_event_update`, so the
> > >> perf event counter value (`event->count`) is updated. It calls
> > >> `riscv_pmu_event_set_period` later, which calls
> > >> `local64_set(&hwc->prev_count, (u64)-left)`. When
> > >> `perf_event_read_value` gets invoked in the registered overflow
> > >> handler, `riscv_pmu_event_update` will be invoked eventually. Because
> > >> `prev_count` has been set to `(u64)-left` and the underlying hardware
> > >> counter value (the value read via `rvpmu->ctr_read(event)`) has not
> > >> been changed, `event->count` will be updated again, and the value this
> > >> time will be incorrect.
> > >
> > > Got it. This is a genuine issue. I can reproduce it as well with kvm pmu
> > > support as it creates a kernel perf event as well.
> > >
> > >>
> > >> I have found that the ARM PMU does not have the problem. The function
> > >> `armpmu_event_set_period` in `arm_pmu.c` calls
> > >> `armpmu->write_counter(event, (u64)(-left) & max_period)` after
> > >> `local64_set(&hwc->prev_count, (u64)-left)`, so that the underlying
> > >> hardware counter is also set to a new value. Thus, when
> > >> `armpmu_event_update` gets called later in the registered handler, it
> > >> will find that the underlying counter has the same value as
> > >> `prev_count`, so `event->count` can keep the correct value.
> > >>
> > >> However, as I mentioned before, the RISC-V SBI specification does not
> > >> have such an API as “write_counter,” so the ARM solution could not be
> > >> adapted easily. I was wondering if we could postpone
> > >> `riscv_pmu_event_set_period` after the if-block, like this:
> > >>
> > >> ```
> > >> if (local64_read(&hw_evt->period_left) <= 0) {
> > >>    perf_event_overflow(event, &data, regs);
> > >> }
> > >> riscv_pmu_event_set_period(event);
> > >> ```
> > >>
> > >> This way `perf_event_read_value` can return the correct counter value.
> > >> `riscv_pmu_event_set_period` gets invoked after perf_event_overflow.
> > >> However, I am not sure if it can cause other problems.
> > >>
> > >
> > > That seems like a hack. I think we can avoid updating again based on
> > > perf state which can indicate that event->count is already updated.
> > > Thus, no need to update it again. I will toy around with this idea and
> > > get back to you.
> > >
> >
> > Can you try this diff on latest upstream ?
> >
> >
> > diff --git a/drivers/perf/riscv_pmu.c b/drivers/perf/riscv_pmu.c
> > index b4efdddb2ad9..e08529c45c50 100644
> > --- a/drivers/perf/riscv_pmu.c
> > +++ b/drivers/perf/riscv_pmu.c
> > @@ -167,7 +167,7 @@ u64 riscv_pmu_event_update(struct perf_event *event)
> >          unsigned long cmask;
> >          u64 oldval, delta;
> >
> > -       if (!rvpmu->ctr_read)
> > +       if (!rvpmu->ctr_read || (hwc->state == PERF_HES_UPTODATE))
> >                  return 0;
> >
> >          cmask = riscv_pmu_ctr_get_width_mask(event);
> > diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
> > index 8cbe6e5f9c39..6ba825a38619 100644
> > --- a/drivers/perf/riscv_pmu_sbi.c
> > +++ b/drivers/perf/riscv_pmu_sbi.c
> > @@ -763,7 +763,10 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq,
> > void *dev)
> >                   */
> >                  overflowed_ctrs |= BIT(lidx);
> >                  hw_evt = &event->hw;
> > +               /* Update the event states here so that we know the
> > state while reading */
> > +               hw_evt->state |= PERF_HES_STOPPED;
> >                  riscv_pmu_event_update(event);
> > +               hw_evt->state |= PERF_HES_UPTODATE;
> >                  perf_sample_data_init(&data, 0, hw_evt->last_period);
> >                  if (riscv_pmu_event_set_period(event)) {
> >                          /*
> > @@ -776,6 +779,8 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void
> > *dev)
> >                           */
> >                          perf_event_overflow(event, &data, regs);
> >                  }
> > +               /* Reset the state as we are going to start the counter
> > after the loop */
> > +               hw_evt->state = 0;
> >          }
> >
> >
> >
> > All the changes in drivers/perf/riscv_pmu_sbi.c are already part of
> > ongoing series with snapshot support[1].
> >
> > I will send the changes in drivers/perf/riscv_pmu.c as a separate fix if
> > this works for you.
> >
> > [1] https://lore.kernel.org/kvm/20240420151741.962500-1-atishp@rivosinc.com/
> >
> > >>>
> > >>>>
> > >>>> Thank you,
> > >>>> Bohan Lei
> > >>>
> > >>>
> > >>> --
> > >>> Regards,
> > >>> Atish
> > >>
> > >> --
> > >> Regards,
> > >> Bohan Lei
> > >>
> > >> _______________________________________________
> > >> linux-riscv mailing list
> > >> linux-riscv at lists.infradead.org
> > >> http://lists.infradead.org/mailman/listinfo/linux-riscv
> > >
> > >
> > > _______________________________________________
> > > linux-riscv mailing list
> > > linux-riscv at lists.infradead.org
> > > http://lists.infradead.org/mailman/listinfo/linux-riscv