[PATCH v1 0/4] coresight: ctcu: Enable byte-cntr function for TMC ETR

Wed Mar 12 23:15:42 PDT 2025

On 3/12/2025 9:22 PM, Mike Leach wrote:
> Hi,
> 
> On Mon, 10 Mar 2025 at 09:05, Jie Gan <quic_jiegan at quicinc.com> wrote:
>>
>> From: Jie Gan <jie.gan at oss.qualcomm.com>
>>
>> The byte-cntr function provided by the CTCU device is used to transfer data
>> from the ETR buffer to userspace. An interrupt is triggered if the data
>> size exceeds the threshold set in the BYTECNTRVAL register. The interrupt
>> handler counts the number of triggered interrupts, and the read function
>> reads data from the ETR buffer if the IRQ count is greater than 0.
>> Each successful read decrements the IRQ count by 1.
>>
>> The byte cntr function will start when the device node is opened for reading,
>> and the IRQ count will reset when the byte cntr function has stopped. When
>> the file node is opened, the w_offset of the ETR buffer will be read and
>> stored in byte_cntr_data, serving as the original r_offset (indicating
>> where reading starts) for the byte counter function.
>>
>> The work queue for the read operation will wake up once when ETR is stopped,
>> ensuring that the remaining data in the ETR buffer has been flushed based on
>> the w_offset read at the time of stopping.
>>
>> The following shell commands write the threshold to the BYTECNTRVAL registers.
>>
>> Only enable byte-cntr for ETR0:
>> echo 0x10000 > /sys/devices/platform/soc@0/4001000.ctcu/ctcu0/byte_cntr_val
>>
>> Enable byte-cntr for both ETR0 and ETR1 (both hex and decimal values are supported):
>> echo 0x10000 4096 > /sys/devices/platform/soc@0/4001000.ctcu/ctcu0/byte_cntr_val
>>
>> Setting the BYTECNTRVAL registers to 0 disables the byte-cntr function.
>> Disable byte-cntr for ETR0:
>> echo 0 > /sys/devices/platform/soc@0/4001000.ctcu/ctcu0/byte_cntr_val
>>
>> Disable byte-cntr for both ETR0 and ETR1:
>> echo 0 0 > /sys/devices/platform/soc@0/4001000.ctcu/ctcu0/byte_cntr_val
>>
>> There is a minimum threshold to prevent generating too many interrupts.
>> The minimum threshold is 4096 bytes. The write will fail if the user tries
>> to set the BYTECNTRVAL registers to a value less than 4096 bytes (except
>> for 0).
>>
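The parsing and threshold rules described above can be sketched in userspace C. This is only an illustration under stated assumptions, not the driver's code: the names `BYTE_CNTR_MIN`, `threshold_valid` and `parse_thresholds` are hypothetical, and a real sysfs store handler would use kernel helpers rather than `strtoul`.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimum non-zero threshold stated in the cover letter (hypothetical name). */
#define BYTE_CNTR_MIN 4096UL

/* 0 disables the function; any other value must be at least the minimum. */
static int threshold_valid(unsigned long val)
{
	return val == 0 || val >= BYTE_CNTR_MIN;
}

/*
 * Parse up to 'max' whitespace-separated thresholds from 'buf'.
 * Base 0 lets strtoul() accept both "0x10000" and "4096".
 * Returns the number of values parsed, or -EINVAL on any bad input.
 */
static int parse_thresholds(const char *buf, unsigned long *vals, int max)
{
	int n = 0;

	assert(buf != NULL && vals != NULL && max > 0);

	while (*buf) {
		char *end;
		unsigned long v;

		while (isspace((unsigned char)*buf))
			buf++;
		if (!*buf)
			break;
		/* Reject signs and stray text before strtoul() sees them. */
		if (n == max || !isdigit((unsigned char)*buf))
			return -EINVAL;

		errno = 0;
		v = strtoul(buf, &end, 0);
		if (end == buf || errno == ERANGE || !threshold_valid(v))
			return -EINVAL;
		if (*end && !isspace((unsigned char)*end))
			return -EINVAL;

		vals[n++] = v;
		buf = end;
	}

	return n > 0 ? n : -EINVAL;
}
```

With this sketch, "0x10000 4096" yields two accepted values, "0 0" is accepted as a disable request, and "100" is rejected because it is non-zero and below the minimum.
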
>> Finally, the user can read data from the ETR buffer through the byte-cntr file
>> nodes located under /dev. For example, to read data from the ETR0 buffer:
>> cat /dev/byte-cntr0
>>
>> To enable and start byte-cntr for ETR0:
>> echo 0x10000 > /sys/devices/platform/soc@0/4001000.ctcu/ctcu0/byte_cntr_val
>> echo 1 > /sys/bus/coresight/devices/tmc_etr0/enable_sink
>> echo 1 > /sys/bus/coresight/devices/etm0/enable_source
>> cat /dev/byte-cntr0
>>
> 
> There is a significant issue with attempting to drain an ETR buffer
> while it is live in the way you appear to be doing.
> 
> You have no way of knowing if the TMC hardware write pointer wraps and
> overtakes the point where you are currently reading. This could cause
> data corruption as TMC writes as you are reading, or contention for
> the buffer that affects the TMC write.
> 
> Even if those two events do not occur, then the trace capture sequence
> is corrupted.
> 
> Take a simple example - suppose we split the buffer into 4 blocks of
> trace, which are filled by the ETR:
> 
> buffer = 1, 2, 3, 4
> 
> Now suppose you have read 1 & 2 into your userspace buffer / file.
> 
> file = 1, 2
> 
> If there is now some system event that prevents your userspace code
> from running for a while, then it is possible that the ETR continues,
> wraps and the buffer is now
> 
> buffer = 5, 6, 7, 4
> 
> Your next two reads will be 7, 4
> 
> file = 1, 2, 7, 4
> 
> This trace is now corrupt and will cause decode errors. There is no
> way for the decoder to determine that the interface between blocks 2 &
> 7 is not correct. If you are fortunate then this issue will cause an
> actual explicit decode error, if you are less fortunate then decode
> will continue but in fact be inaccurate, with no obvious way to detect
> the inaccuracy.
> 
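The four-block example above can be replayed with a small model. This is a sketch under simplifying assumptions: each slot stores only the sequence number of the block last written to it, and `struct ring`, `ring_write`, `ring_read`, `is_contiguous` and `replay_scenario` are hypothetical names, not part of the driver or of any CoreSight API.

```c
#include <assert.h>
#include <stddef.h>

#define NBLOCKS 4

/* Toy model of the ETR buffer: one sequence number per block-sized slot. */
struct ring {
	int slot[NBLOCKS];
	size_t w;	/* next slot the writer (the ETR) will fill */
	int next_seq;	/* sequence number of the next trace block */
};

static void ring_init(struct ring *r)
{
	for (size_t i = 0; i < NBLOCKS; i++)
		r->slot[i] = 0;
	r->w = 0;
	r->next_seq = 1;
}

/* The writer never waits for the reader; it simply wraps. */
static void ring_write(struct ring *r, int count)
{
	assert(count >= 0);
	while (count-- > 0) {
		r->slot[r->w] = r->next_seq++;
		r->w = (r->w + 1) % NBLOCKS;
	}
}

/* The reader follows its own offset and cannot see that a wrap happened. */
static void ring_read(const struct ring *r, size_t *rd, int *out, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		out[i] = r->slot[*rd];
		*rd = (*rd + 1) % NBLOCKS;
	}
}

/* Returns 1 if every block directly follows its predecessor. */
static int is_contiguous(const int *seq, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (seq[i] != seq[i - 1] + 1)
			return 0;
	return 1;
}

static void replay_scenario(int file[4])
{
	struct ring r;
	size_t rd = 0;

	ring_init(&r);
	ring_write(&r, 4);		/* buffer = 1, 2, 3, 4 */
	ring_read(&r, &rd, file, 2);	/* file   = 1, 2       */
	ring_write(&r, 3);		/* buffer = 5, 6, 7, 4 */
	ring_read(&r, &rd, file + 2, 2);/* file   = 1, 2, 7, 4 */
}
```

In the model `is_contiguous` can flag the bad join only because the slots carry sequence numbers. Raw trace data carries no such marker, which is why the reader has no way to notice the discontinuity between blocks 2 and 7.
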
> We encountered this problem early in the development of the perf data
> collection. Even though perf was stopping the trace to copy the
> hardware buffer, it would concatenate unrelated trace blocks into the
> perf userspace buffer, which initially caused decoding errors. This is
> now mitigated in perf by marking boundaries and recording indexes of
> the boundaries, so the tool can reset the decoder at the start of
> non-contiguous blocks.
> 
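The idea of recording boundary indexes can be illustrated as follows. This is not perf's actual implementation; it is a minimal sketch in which `struct capture`, `capture_append` and `capture_is_boundary` are hypothetical names, and each drained chunk is assumed to be internally contiguous because the trace was stopped while it was copied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CAP_BYTES 4096
#define CAP_MARKS 64

/* Concatenated capture plus the offsets at which each drained chunk starts. */
struct capture {
	unsigned char data[CAP_BYTES];
	size_t len;
	size_t mark[CAP_MARKS];
	size_t nmarks;
};

/*
 * Append one drained chunk and remember where it starts, so a decoder
 * can be reset there instead of decoding across the join.
 * Returns 0 on success, -1 if the capture or the mark table is full.
 */
static int capture_append(struct capture *cap, const void *chunk, size_t n)
{
	assert(cap != NULL && chunk != NULL);

	if (n > CAP_BYTES - cap->len || cap->nmarks == CAP_MARKS)
		return -1;

	cap->mark[cap->nmarks++] = cap->len;
	memcpy(cap->data + cap->len, chunk, n);
	cap->len += n;
	return 0;
}

/* A decoder would reset its state whenever it reaches a recorded offset. */
static int capture_is_boundary(const struct capture *cap, size_t off)
{
	for (size_t i = 0; i < cap->nmarks; i++)
		if (cap->mark[i] == off)
			return 1;
	return 0;
}
```

The design point is that the boundary information has to be recorded at copy time, alongside the data. It cannot be reconstructed afterwards from the concatenated bytes alone.
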
> If you do not stop the TMC when draining the ETR buffer, you have no
> way of determining if this has occurred.
> 
> Clearly, using large buffers split into smaller blocks can reduce
> the likelihood of a wrap like this, but never eliminate it,
> especially given the extreme rate at which trace data can be generated.
> 

Hi Mike,

Thanks for the detailed explanation. It's clear and makes sense to me.

I will look for another reasonable solution.

Thanks,
Jie

> Regards
> 
> Mike
> 
> 
>> Jie Gan (4):
>>    coresight: tmc: Introduce new APIs to get the RWP offset of ETR buffer
>>    dt-bindings: arm: Add an interrupt property for Coresight CTCU
>>    coresight: ctcu: Enable byte-cntr for TMC ETR devices
>>    arm64: dts: qcom: sa8775p: Add interrupts to CTCU device
>>
>>   .../bindings/arm/qcom,coresight-ctcu.yaml     |  17 +
>>   arch/arm64/boot/dts/qcom/sa8775p.dtsi         |   5 +
>>   drivers/hwtracing/coresight/Makefile          |   2 +-
>>   .../coresight/coresight-ctcu-byte-cntr.c      | 339 ++++++++++++++++++
>>   .../hwtracing/coresight/coresight-ctcu-core.c |  96 ++++-
>>   drivers/hwtracing/coresight/coresight-ctcu.h  |  59 ++-
>>   .../hwtracing/coresight/coresight-tmc-etr.c   |  45 ++-
>>   drivers/hwtracing/coresight/coresight-tmc.h   |   3 +
>>   8 files changed, 556 insertions(+), 10 deletions(-)
>>   create mode 100644 drivers/hwtracing/coresight/coresight-ctcu-byte-cntr.c
>>
>> --
>> 2.34.1
>>
> 
>