[PATCH v2 1/2] perf: arm_spe: Correct setting the PERF_HES_STOPPED flag

Leo Yan leo.yan at arm.com
Wed Jan 14 09:52:40 PST 2026


On Thu, Jan 08, 2026 at 04:23:58PM +0000, Will Deacon wrote:

[...]

> > > How is it not for this flow? You're talking about:
> > > 
> > > arm_spe_pmu_start
> > > 	=> arm_spe_perf_aux_output_begin
> > > 		=> arm_spe_pmu_next_off // Returns error
> > > 
> > > The only way arm_spe_pmu_next_off() returns an error is if
> > > __arm_spe_pmu_next_off() fails, and that's the flow I'm talking about.

[...]

> > The issue is a mismatch between the state machine and the hardware
> > state.  When arm_spe_perf_aux_output_begin() detects an error and does
> > not set PMBLIMITR_EL1_E, the trace unit is effectively stopped, but
> > the state machine is not updated to PERF_HES_STOPPED. This causes
> > callers to handle errors incorrectly [1][2].
> > 
> > It is arguable that the disable IRQ work will eventually disable the
> > trace unit and update hw.state, but the state should be updated in the
> > first place by the PMU driver to notify even core layer.
> 
> From what I can tell, perf_aux_output_end() will call
> perf_event_disable_inatomic() which should end up invoking
> perf_pending_disable() via an IPI-to-self to disable the event and put
> it in the PERF_HES_STOPPED state before we return to userspace.
> 
> So I still struggle to see the problem here.

The issue is that the SPE driver does not properly propagate errors when
arm_spe_pmu_next_off() fails.  Instead, it behaves as if tracing was
enabled successfully, which leads to redundant operations and an
inconsistent state in the perf core.

Let us dig into the code a bit.

  arm_spe_pmu_start()
  {
      hwc->state = 0;

      /* Fails inside arm_spe_pmu_next_off() */
      arm_spe_perf_aux_output_begin(handle, event);

      /* hwc->state remains 0, so execution continues */
      if (hwc->state)
          return;

      reg = arm_spe_event_to_pmsfcr(event);
      write_sysreg_s(reg, SYS_PMSFCR_EL1);
      ...
  }

In arm_spe_pmu_start(), a failure in arm_spe_perf_aux_output_begin()
does not set PERF_HES_STOPPED, so hwc->state remains zero and the
function continues to program the filters even though the call has
failed.

Moreover, the driver still returns success to the perf core.  As a
result, event_sched_in() assumes the event was started correctly, never
takes the error path below, and proceeds to enable other events.

  event_sched_in()
  {
      ...

      if (event->pmu->add(event, PERF_EF_START)) {
          perf_event_set_state(event, PERF_EVENT_STATE_INACTIVE);
          event->oncpu = -1;
          ret = -EAGAIN;
          goto out;
      }

      ...
  }
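
To make the two outcomes concrete, here is a minimal userspace sketch of
the flow above.  It is not kernel code: struct model_event, the
begin_fails knob, the mark_stopped switch and the model_*() helpers are
all hypothetical names, the flag values are illustrative, and it assumes
the add() callback derives its return value from PERF_HES_STOPPED after
calling start.

```c
/* Userspace model only; nothing here is taken from the driver source. */
#define PERF_HES_STOPPED 0x01   /* illustrative value */
#define PERF_EF_START    0x01   /* illustrative value */

struct model_event {
    int state;               /* stands in for hwc->state */
    int begin_fails;         /* hypothetical knob: force buffer setup to fail */
    int filters_programmed;  /* set once the "filter registers" are written */
};

/*
 * Models arm_spe_perf_aux_output_begin().  When mark_stopped is 0 a
 * failure is silent; when it is 1 the failure is recorded in state.
 */
static void model_output_begin(struct model_event *ev, int mark_stopped)
{
    if (ev->begin_fails && mark_stopped)
        ev->state |= PERF_HES_STOPPED;
}

/* Models arm_spe_pmu_start(): bail out early only if state is non-zero. */
static void model_start(struct model_event *ev, int mark_stopped)
{
    ev->state = 0;
    model_output_begin(ev, mark_stopped);
    if (ev->state)
        return;
    ev->filters_programmed = 1;
}

/*
 * Models the pmu->add() callback (assumption: it reports an error only
 * when the event is left in the stopped state after start).
 */
static int model_add(struct model_event *ev, int flags, int mark_stopped)
{
    if (flags & PERF_EF_START) {
        model_start(ev, mark_stopped);
        if (ev->state & PERF_HES_STOPPED)
            return -1;
    }
    return 0;
}
```

With begin_fails = 1 and mark_stopped = 0, model_add() returns 0 and
filters_programmed ends up as 1, which is the behaviour described above.
With mark_stopped = 1 it returns -1 and the filters are left untouched,
so a caller shaped like event_sched_in() would take its error path.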

This breaks the event group case, for example:

  perf record -e '{cs_etm//,cycles}' -- test

The perf core expects all events in a group to start and stop together,
but the SPE driver's incorrect reporting causes misalignment.

Sorry for the late reply.

Thanks,
Leo


