[PATCH 2/2] perf cs-etm: Set time on synthesised samples to preserve ordering

Suzuki K Poulose suzuki.poulose at arm.com
Wed Apr 14 16:54:55 BST 2021


On 14/04/2021 15:39, James Clark wrote:
> The following attribute is set when synthesising samples in
> timed decoding mode:
> 
>      attr.sample_type |= PERF_SAMPLE_TIME;
> 
> This results in new samples that appear to have timestamps but
> because we don't assign any timestamps to the samples, when the
> resulting inject file is opened again, the synthesised samples
> will be on the wrong side of the MMAP or COMM events.
> 
> For example this results in the samples being associated with
> the perf binary, rather than the target of the record:
> 
>      perf record -e cs_etm/@tmc_etr0/u top
>      perf inject -i perf.data -o perf.inject --itrace=i100il
>      perf report -i perf.inject
> 
> Where 'Command' == perf should show as 'top':
> 
>      # Overhead  Command  Source Shared Object  Source Symbol           Target Symbol           Basic Block Cycles
>      # ........  .......  ....................  ......................  ......................  ..................
>      #
>          31.08%  perf     [unknown]             [.] 0x000000000040c3f8  [.] 0x000000000040c3e8  -
> 
> If the perf.data file is opened directly with perf, without the
> inject step, then this already works correctly because the
> events are synthesised after the COMM and MMAP events and
> no second sorting happens. Re-sorting only happens when opening
> the perf.inject file for the second time so timestamps are
> needed.
> 
> Using the timestamp from the AUX record mirrors the current
> behaviour when opening directly with perf, because the events
> are generated on the call to cs_etm__process_queues().

I would add the following here to clarify the situation :

"The ETM trace could optionally contain time stamps, but there is
no way to correlate this with the kernel time. So, the best available
time value is that of the AUX_RECORD header. This patch uses
the timestamp from the header for all the samples. The ordering of the
samples are implicit in the trace and thus is fine with respect to
relative ordering."


> 
> Signed-off-by: James Clark <james.clark at arm.com>
> Co-developed-by: Al Grant <al.grant at arm.com>
> Signed-off-by: Al Grant <al.grant at arm.com>

nit: The preferred order is your S-o-B at the end (i.e of the sender)

> ---
>   tools/perf/util/cs-etm.c | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index c25da2ffa8f3..d0fa9dce47f1 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -54,6 +54,7 @@ struct cs_etm_auxtrace {
>   	u8 sample_instructions;
>   
>   	int num_cpu;
> +	u64 latest_kernel_timestamp;
>   	u32 auxtrace_type;
>   	u64 branches_sample_type;
>   	u64 branches_id;
> @@ -1192,6 +1193,8 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
>   	event->sample.header.misc = cs_etm__cpu_mode(etmq, addr);
>   	event->sample.header.size = sizeof(struct perf_event_header);
>   
> +	if (!etm->timeless_decoding)
> +		sample.time = etm->latest_kernel_timestamp;
>   	sample.ip = addr;
>   	sample.pid = tidq->pid;
>   	sample.tid = tidq->tid;
> @@ -1248,6 +1251,8 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
>   	event->sample.header.misc = cs_etm__cpu_mode(etmq, ip);
>   	event->sample.header.size = sizeof(struct perf_event_header);
>   
> +	if (!etm->timeless_decoding)
> +		sample.time = etm->latest_kernel_timestamp;
>   	sample.ip = ip;
>   	sample.pid = tidq->pid;
>   	sample.tid = tidq->tid;
> @@ -2412,9 +2417,10 @@ static int cs_etm__process_event(struct perf_session *session,
>   	else if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE)
>   		return cs_etm__process_switch_cpu_wide(etm, event);
>   
> -	if (!etm->timeless_decoding &&
> -	    event->header.type == PERF_RECORD_AUX)

Might want to add a comment here ,

	/*
	 * Record the latest kernel timestamp available in the header
	 * for samples.
	 */

> +	if (!etm->timeless_decoding && event->header.type == PERF_RECORD_AUX) {
> +		etm->latest_kernel_timestamp = sample_kernel_timestamp;
>   		return cs_etm__process_queues(etm);
> +	}
>   
>   	return 0;
>   }
> 

This is the best effort thing we could do to get things working.

With the comments addressed:

Acked-by: Suzuki K Poulos <suzuki.poulose at arm.com>



More information about the linux-arm-kernel mailing list