[RFC PATCH v2 0/4] A mechanism for efficient support for per-function metrics

Tue Apr 23 08:42:42 PDT 2024

> Cursory testing on a Xeon(R) W-2145 with a 300 *instruction* sample
> window (with and without the patch) suggests this approach would work
> for some counters. Calculating branch miss rates for example appears to
> be correct when used with the instruction counter as the sampling event,
> or at least this approach correctly identifies which functions in the
> test benchmark are prone to poor predictability. On the other hand the
> combination cycle and instructions counter does not appear to sample
> correctly as a pair. With something like
> 
>     perf record -e '{cycles/period=999700,alt-period=300/,instructions}:uS' ... benchmark
> 
> I often see very large CPI, the same affect is observed without the
> patch enabled. No idea whats going on there, so any insight welcome...

My guess would be that the PMI handler cleared L1 and there are stalls
reloading the working set. You can check L1 miss events to confirm.
Unfortunately with the period change it cannot use multi-record
PEBS which would avoid the need for a PMI.

-Andi