[PATCH v2 1/2] perf: arm_spe: Correct setting the PERF_HES_STOPPED flag

Will Deacon will at kernel.org
Tue Jan 20 08:42:39 PST 2026


On Wed, Jan 14, 2026 at 05:52:40PM +0000, Leo Yan wrote:
> > > The issue is a mismatch between the state machine and the hardware
> > > state.  When arm_spe_perf_aux_output_begin() detects an error and does
> > > not set PMBLIMITR_EL1_E, the trace unit is effectively stopped, but
> > > the state machine is not updated to PERF_HES_STOPPED. This causes
> > > callers to handle errors incorrectly [1][2].
> > > 
> > > It is arguable that the disable IRQ work will eventually disable the
> > > trace unit and update hw.state, but the state should be updated in the
> > > first place by the PMU driver so that the perf core layer is notified.
> > 
> > From what I can tell, perf_aux_output_end() will call
> > perf_event_disable_inatomic() which should end up invoking
> > perf_pending_disable() via an IPI-to-self to disable the event and put
> > it in the PERF_HES_STOPPED state before we return to userspace.
> > 
> > So I still struggle to see the problem here.
> 
> The issue is that the SPE driver does not properly propagate errors when
> arm_spe_pmu_next_off() fails.  Instead, it behaves as if tracing was
> enabled successfully, which leads to redundant operations and an
> inconsistent state in the perf core.
> 
> Let us dig a bit.
> 
>   arm_spe_pmu_start()
>   {
>       hwc->state = 0;
> 
>       /* Fails inside arm_spe_pmu_next_off() */
>       arm_spe_perf_aux_output_begin(handle, event);
> 
>       /* hwc->state remains 0, so execution continues */
>       if (hwc->state)
>           return;
> 
>       reg = arm_spe_event_to_pmsfcr(event);
>       write_sysreg_s(reg, SYS_PMSFCR_EL1);
>       ...
>   }
> 
> In arm_spe_pmu_start(), a failure in arm_spe_perf_aux_output_begin()
> does not set PERF_HES_STOPPED, so hwc->state remains zero and the
> function continues to program filters even though the begin call has failed.
> 
> Moreover, the driver still returns success to the perf core.  As a
> result, event_sched_in() assumes the event was started correctly and
> proceeds to enable other events.
> 
>   event_sched_in()
>   {
>       ...
> 
>       if (event->pmu->add(event, PERF_EF_START)) {
>         perf_event_set_state(event, PERF_EVENT_STATE_INACTIVE);
>         event->oncpu = -1;
>         ret = -EAGAIN;
>         goto out;
>       }
> 
>       ...
>   }
> 
> This breaks the event group case, for example:
> 
>   perf record -e '{cs_etm//,cycles}' -- test
> 
> The perf core expects all events in a group to start and stop together,
> but the SPE driver's incorrect reporting causes misalignment.

Ok, so looking at this and the next patch I wonder if we could simplify
things a little by having arm_spe_perf_aux_output_begin() return an 'int'
to indicate success/failure instead of touching 'hwc->state'.

Then arm_spe_pmu_start() and the interrupt handler could call into
arm_spe_pmu_stop() if they get an error code back. Would that work?
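
To make the question concrete, here is a rough userspace model of the
control flow being suggested. It is only a sketch, not driver code: every
name ending in _model is hypothetical, the error value is picked
arbitrarily, and the real arm_spe_pmu_stop() does considerably more than
set a flag. The only thing it tries to show is that the begin helper
reports failure through its return value, and the caller reacts by going
through the stop path instead of programming the filters.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Same value as PERF_HES_STOPPED in include/linux/perf_event.h. */
#define PERF_HES_STOPPED 0x01

/* Hypothetical stand-in for the bits of driver state that matter here. */
struct hw_model {
	int state;		/* models hwc->state */
	bool filters_set;	/* models the write to SYS_PMSFCR_EL1 */
	bool buffer_ok;		/* false models arm_spe_pmu_next_off() failing */
};

/* Suggested shape: return an int and leave 'state' alone. */
int aux_output_begin_model(struct hw_model *hw)
{
	if (!hw->buffer_ok)
		return -EINVAL;	/* arbitrary error code for the sketch */
	return 0;
}

/* Models only the state update done by the stop path. */
void pmu_stop_model(struct hw_model *hw)
{
	hw->state |= PERF_HES_STOPPED;
}

/* The caller owns the error handling and bails out before the filters. */
void pmu_start_model(struct hw_model *hw)
{
	hw->state = 0;

	if (aux_output_begin_model(hw)) {
		pmu_stop_model(hw);
		return;
	}

	hw->filters_set = true;
}

/* Usage: a failed begin leaves the event stopped, filters untouched. */
void demo_model(void)
{
	struct hw_model hw = { .buffer_ok = false };

	pmu_start_model(&hw);
	assert(hw.state & PERF_HES_STOPPED);
	assert(!hw.filters_set);
}
```

With this shape the state update lives in one place (the stop path), and
the interrupt handler could follow the same pattern as the start path.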

Will


