[PATCH] perf cs-etm: Fix kernel timestamp handling

James Clark james.clark at arm.com
Tue Sep 12 03:57:17 PDT 2023



On 12/09/2023 02:52, Leo Yan wrote:
> Hi James,
> 
> On Mon, Sep 11, 2023 at 02:24:09PM +0100, James Clark wrote:
> 
> [...]
> 
>>> The current code can support both timestamp sources when synthesizing
>>> samples. However, the decoding flow only relies on the hardware
>>> timestamp.  If the hardware timestamp is zero, it becomes impossible to
>>> decode the trace data.  Consequently, in this case, the commands below
>>> won't output any samples:
>>>
>>>     perf record -e cs_etm// --per-thread --timestamp -- ls
>>>     perf script
>>
>> Hi Leo,
>>
>> I couldn't reproduce this issue, even when hard coding the hardware
>> timestamp to zero in cs_etm_decoder__do_hard_timestamp() like this:
>>
>>  converted_timestamp = 0;
> 
> I reproduced this issue on the Juno-r2 board, its "etm->has_virtual_ts"
> is always false and Arm CoreSight timestamp packets are zeros.
> 
> Besides set "converted_timestamp = 0", it might need to hard code
> "etm->has_virtual_ts" to false?
> >> I'm not sure why this would result in no samples being generated either,
>> because we don't actually use the timestamps for anything yet [1]. We
>> always wait until the very end of the file before decoding to ensure
>> that all of the mmaps are loaded. And the timestamp is just assigned to
>> the samples, but they shouldn't affect whether they are generated or not.
>>
>> Unless there is something else I'm missing?
> 
> Let's review below code.
> 
> cs_etm__queue_first_cs_timestamp() retrieves trace data and decodes it,
> and breaks the while loop until it find the timestamp is not zero or no
> trace data is avaliable.  When the timestamp is always zero, the while
> loop continues to drop the CoreSight trace data and don't synthesize
> samples.
> 
>   cs_etm__queue_first_cs_timestamp()
>   {
>       ...
> 
>       while(1) {
>           ret = cs_etm__get_data_block(etmq);
>           if (ret <= 0)
>               goto out;
> 
>           ret = cs_etm__decode_data_block(etmq);
>           if (ret)
>               goto out;
> 
>           cs_timestamp = cs_etm__etmq_get_timestamp(etmq, &trace_chan_id);
>           /* We found a timestamp, no need to continue. */
>           if (cs_timestamp)
>               break;
> 
>           cs_etm__clear_all_packet_queues(etmq);
>       }
>   }
> 

Ah, I couldn't reproduce it because I was missing your other patch to
add Coresight timestamps when Perf timestamps are requested. If I add
that patch or force Coresight timestamps then I can reproduce the issue:

   $ perf record -e cs_etm/timestamp=1/ --per-thread --timestamp -- ls

But when running perf script I get this warning:

 Zero Coresight timestamp found at Idx:39. Decoding may be improved by
 prepending 'Z' to your current --itrace arguments.

And then if I force timeless mode like it says to do I get samples:

 $ perf script --itrace=Zi1000i

I'm not sure if silently making it use the kernel timestamp for sorting
makes sense, especially when zero hardware timestamps are a bug, and we
already have a warning and a workaround for it.

Also it would mean that you are requesting timestamps on a platform that
isn't generating them. Maybe the fix could also just be to not request
timestamps, and then it uses the timeless decoding flow that also works:

   $ perf record -e cs_etm/timestamp=0/ --per-thread --timestamp -- ls

My worry is that mixing kernel and coresight timestamps for sorting
could make the code confusing to reason about, and sorting based on a
single static AUX timestamp doesn't make sense logically (it's almost
like a hack where the side effect is to make it work). And there isn't
really anything that there isn't already a workaround for. Unless you
want to explicitly disable the "use the Z option" warning and make it
work transparently? Which I'm not sure is the right thing to do.


>> Also, in cs_etm__queue_first_cs_timestamp(), cs_timestamp is used for
>> sorting the decoding order between CPUs, but if the hardware timestamp
>> is 0, then it's 0 on all trace.
> 
> Correct.
> 
>> So the sorting would be the same if you change that to be the kernel
>> timestamp. They're all still the same
>> static number, but just a different number (because we wait until the
>> end of the file, 'latest_kernel_timestamp' is always the timestamp of
>> the last AUX event in the file).
> 
> If we use the 'latest_kernel_timestamp' as timestamp, it's non-zero
> timestamp rather than all timestamp '0'.  Yes, 'latest_kernel_timestamp'
> is a coarse kernel timestamp which is shared by all trace data recorded
> in the AUX event, though it's a static number, it can allow us to break
> the while loop mentioned above.
> 
> I understand 'latest_kernel_timestamp' is inaccurate for sorting, but
> as least now it exists in current code, quotes from util/cs-etm.c:
> 
>   /*
>    * Record the latest kernel timestamp available in the header   
>    * for samples so that synthesised samples occur from this point
>    * onwards.
>    */
>   if (sample->time && (sample->time != (u64)-1))
>         etm->latest_kernel_timestamp = sample->time;
> 
>> [1]: I still plan to change the decoding to decode up to the current
>> time in the file, rather than waiting for the end of the file before
>> starting. That way decoding trace when there were overlapping mmap
>> regions in time will be more accurate.
> 
> Thanks for heading up.  I am not sure if I understand this correctly,
> but seems to me it is a good thing to try for using overlapping mmap
> events.
> 
> Just a side topic, if an Arm platform connect the Arm timer counter to
> Arm CoreSight, and detect virtual timestamp is false (thus
> etm->has_virtual_ts is 0).  I think this might happen on some legacy
> platforms, can we use some ways to allow users to still use the Arm
> CoreSight timestamp in this case?  E.g. we can force set
> etm->has_virtual_ts to 1 when user specify the config
> '-e cs_etm/timestamp/'.
> 
> Thanks,
> Leo



More information about the linux-arm-kernel mailing list