[PATCH 1/8] perf cs-etm: Don't flush when packet_queue fills up

Leo Yan leo.yan at arm.com
Tue Sep 10 13:28:48 PDT 2024


On 9/5/2024 11:50 AM, James Clark wrote:

[...]

> cs_etm__flush(), like cs_etm__sample() is an operation that generates a
> sample and then swaps the current with the previous packet. Calling
> flush after processing the queues results in two swaps which corrupts
> the next sample. Therefore it wasn't appropriate to call flush here so
> remove it.

In the cs_etm__sample(), if the period is not overflow, it is not necessarily
to generate instruction samples and copy back stack entries. This is why we
want to call cs_etm__flush() to make sure the last packet can be recorded
properly for instruction sample with back stacks.

We also need to take account into the case for the end of the session - in
this case we need to generate samples for the last packet for complete info.

I am wandering should we remove the cs_etm__packet_swap() from cs_etm__sample()?

Thanks,
Leo

> Flushing is still done on a discontinuity to explicitly clear the last
> branch buffer, but when the packet_queue fills up before reaching a
> timestamp, that's not a discontinuity and the call to
> cs_etm__process_traceid_queue() already generated samples and drained
> the buffers correctly.
> 
> This is visible by looking for a branch that has the same target as the
> previous branch and the following source is before the address of the
> last target, which is impossible as execution would have had to have
> gone backwards:
> 
>   ffff800080849d40 _find_next_and_bit+0x78 => ffff80008011cadc update_sg_lb_stats+0x94
>    (packet_queue fills here before a timestamp, resulting in a flush and
>     branch target ffff80008011cadc is duplicated.)
>   ffff80008011cb1c update_sg_lb_stats+0xd4 => ffff80008011cadc update_sg_lb_stats+0x94
>   ffff8000801117c4 cpu_util+0x24 => ffff8000801117d4 cpu_util+0x34
> 
> After removing the flush the correct branch target is used for the
> second sample, and ffff8000801117c4 is no longer before the previous
> address:
> 
>   ffff800080849d40 _find_next_and_bit+0x78 => ffff80008011cadc update_sg_lb_stats+0x94
>   ffff80008011cb1c update_sg_lb_stats+0xd4 => ffff8000801117a0 cpu_util+0x0
>   ffff8000801117c4 cpu_util+0x24 => ffff8000801117d4 cpu_util+0x34
> 
> Fixes: 21fe8dc1191a ("perf cs-etm: Add support for CPU-wide trace scenarios")
> Reported-by: Ganapatrao Kulkarni <gankulkarni at os.amperecomputing.com>
> Closes: https://lore.kernel.org/all/20240719092619.274730-1-gankulkarni@os.amperecomputing.com/
> Signed-off-by: James Clark <james.clark at linaro.org>
> ---
>  tools/perf/util/cs-etm.c | 6 ------
>  1 file changed, 6 deletions(-)
> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 90f32f327b9b..602e791ff5ba 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -2490,12 +2490,6 @@ static void cs_etm__clear_all_traceid_queues(struct cs_etm_queue *etmq)
> 
>                 /* Ignore return value */
>                 cs_etm__process_traceid_queue(etmq, tidq);
> -
> -               /*
> -                * Generate an instruction sample with the remaining
> -                * branchstack entries.
> -                */
> -               cs_etm__flush(etmq, tidq);
>         }
>  }
> 
> --
> 2.34.1
> 



More information about the linux-arm-kernel mailing list