[PATCH 1/4] coresight: tmc-etr: Advance buffer pointer in sync buffer.

Tue Apr 27 11:00:51 BST 2021

On 27/04/2021 04:45, Leo Yan wrote:
> On Mon, Apr 26, 2021 at 11:40:44AM +0100, Suzuki Kuruppassery Poulose wrote:
> 
> [...]
> 
>>> @@ -1442,7 +1442,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
>>>    {
>>>    	long bytes;
>>>    	long pg_idx, pg_offset;
>>> -	unsigned long head = etr_perf->head;
>>> +	unsigned long head;
>>>    	char **dst_pages, *src_buf;
>>>    	struct etr_buf *etr_buf = etr_perf->etr_buf;
>>> @@ -1465,7 +1465,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
>>>    		bytes = tmc_etr_buf_get_data(etr_buf, src_offset, to_copy,
>>>    					     &src_buf);
>>>    		if (WARN_ON_ONCE(bytes <= 0))
>>> -			break;
>>> +			return;
>>>    		bytes = min(bytes, (long)(PAGE_SIZE - pg_offset));
>>>    		memcpy(dst_pages[pg_idx] + pg_offset, src_buf, bytes);
>>> @@ -1483,6 +1483,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
>>>    		/* Move source pointers */
>>>    		src_offset += bytes;
>>>    	}
>>> +	etr_perf->head = (pg_idx << PAGE_SHIFT) + pg_offset;
>>
>>
>> Looking at this patch, I feel the driver is doing a couple wrong things
>> already.
>>
>> 1) We initialise etr_perf->head every time the ETR enable is called,
>> irrespective of whether we actually try to enable the Hardware. e.g,
>>
>> etm_0 on -> ... -> enable_etr:
>>    etr_perf->head = <head of the handle_0>
>>    enable_hw()
>>
>> etm_1 on -> ... -> enable_etr:
>>    etr_perf->head = <head of the handle_1>
>>    already_enabled, skip enable_hw()
>>
>> etm_2 on -> ... -> enable_etr:
>>    etr_perf->head = <head of the handle_2>
>>    already_enabled, skip enable_hw()...
>>
>>
>> This doesn't look correct as we don't know which handle is going to get the
>> data. This looks pointless.
> 
> I'd like to convert mapping into below diagram (for system wide trace):
> 
>    CPU0: AUX RB (perf_output_handle_0) -> etr_perf ->  +---------+
>    CPU1: AUX RB (perf_output_handle_1) -> etr_perf ->  | etr_buf |
>    CPU2: AUX RB (perf_output_handle_2) -> etr_perf ->  |         |
>    CPU3: AUX RB (perf_output_handle_3) -> etr_perf ->  +---------+
>

To make it clearer (each event has its own etr_perf):

     CPU0: AUX RB (perf_output_handle_0) -> etr_perf0 ->  +---------+
     CPU1: AUX RB (perf_output_handle_1) -> etr_perf1 ->  |etr_buf0 |
     CPU2: AUX RB (perf_output_handle_2) -> etr_perf2 ->  |         |
     CPU3: AUX RB (perf_output_handle_3) -> etr_perf3 ->  +---------+

> Simply put, there are two layers for controlling the ring buffer. One
> layer is the perf AUX ring buffer, which mainly uses the structure
> perf_output_handle to manage the ring buffer.  In the ETR driver,
> the structure etr_perf manages the head pointer for copying
> data into ETR buffer (tagged as "etr_buf").
> 
> There is a single ETR buffer, but the structures "perf_output_handle"
> and "etr_perf" are per CPU.  We have multiple copies of the heads and

Minor correction: they are "per-event", to be precise. And there are
events per CPU in system-wide mode or task mode (but not in per-thread
mode). So, you are correct.

> tails to manage a single buffer, but the problem is these multiple
> copies have not been synced with each other.
> 
>> 2) Even more problematic is where we copy the AUX buffer content to.
>> As mentioned above, we don't know which handle is going to be the last
>> one to consume and we have a "etr_perf->head" that came from one of the
>> handles and the "pages" that came from the first handle which created a
>> etr_perf buffer. In sync_perf_buffer() we copy the hardware buffers to
>> the "pages" (say of handle_0) with "etr_perf->head" (which could be from
>> any other handle, say handle_2) and then we could return the number of bytes
>> copied, which then is used to update the last handle (could be say
>> handle_3), where there is no actual data copied.

This concern of mine is not valid, and I am relieved that the driver is
correct. The assumption that there is only one etr_perf per ETR is
incorrect, as pictured above.
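
To spell out the picture, here is a minimal user-space sketch of the
relationship. The structs are cut-down stand-ins (the kernel ones carry
more fields), and etr_perf_attach() is a hypothetical helper that exists
only for this illustration. The point is that the head and the pages are
private to each etr_perf, while only the etr_buf is shared.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the shared hardware buffer. */
struct etr_buf {
	int refcount;			/* how many etr_perf users share it */
	size_t size;
};

/* Simplified stand-in for the per-event bookkeeping. */
struct etr_perf_buffer {
	struct etr_buf *etr_buf;	/* shared between events */
	unsigned long head;		/* private to this event */
	char **pages;			/* this event's own AUX pages */
	int nr_pages;
};

/* Hypothetical helper: bind one event's etr_perf to the shared etr_buf. */
static void etr_perf_attach(struct etr_perf_buffer *p, struct etr_buf *buf,
			    char **pages, int nr_pages)
{
	assert(p && buf && nr_pages > 0);
	p->etr_buf = buf;
	p->head = 0;
	p->pages = pages;
	p->nr_pages = nr_pages;
	buf->refcount++;
}
```

With two events attached to the same etr_buf, moving the head of one
leaves the other untouched, which is why the scenario I described in
(2) cannot happen.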

>>
>> To fix all of these issues, we must
>> 1) Stop using etr_perf->head, and instead use the handle->head where we are
>> called update_buffer on.
>>
>> 2) Keep track of the "pages" that belong to a given "handle" and then use
>> those pages to copy the data to the current handle we are called to update
>> the buffer on.
> 
> The "pages" are only allocated once, even though they are attached to
> multiple handles.  I think the right way is to use the single structure

I assume you mean the pages in the etr_buf and not in the etr_perf, right?

> "etr_perf" and single "perf_output_handle" to manage the "pages", IOW,
> if there is a single buffer, then we just use one copy of head and
> tail to manage it.

I think this is not needed; the way we do things is fine, and the
patch as such looks correct to me.

The perf_output_handle is per-event and not something that we can
combine. etr_perf captures what the "output_handle" stands for and is
necessary for syncing the buffer.

Now, coming back to this patch: I understand that, with the polling
patches, sync_perf could be called multiple times. But don't we do a
perf_aux_output_end() each time we wake up? (I haven't looked
at the later patches yet.)
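
For reference, this is roughly what advancing the head in the sync
path amounts to. It is a user-space sketch only: struct perf_pages and
sync_to_pages() are hypothetical names, PAGE_SHIFT is assumed to be 12,
and the source is a flat array rather than data fetched with
tmc_etr_buf_get_data(). The last line mirrors the one added by the
patch.

```c
#include <string.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)

/* Hypothetical stand-in for the AUX pages and head of one etr_perf. */
struct perf_pages {
	char **pages;
	long nr_pages;
	unsigned long head;
};

/* Copy len bytes page by page from rb->head, wrapping at the end. */
static void sync_to_pages(struct perf_pages *rb, const char *src, long len)
{
	unsigned long head = rb->head %
			     ((unsigned long)rb->nr_pages << PAGE_SHIFT);
	long pg_idx = head >> PAGE_SHIFT;
	long pg_offset = head & (PAGE_SIZE - 1);

	while (len > 0) {
		long bytes = len;

		/* Never cross a page boundary in a single memcpy(). */
		if (bytes > (long)(PAGE_SIZE - pg_offset))
			bytes = (long)(PAGE_SIZE - pg_offset);
		memcpy(rb->pages[pg_idx] + pg_offset, src, bytes);

		src += bytes;
		len -= bytes;
		pg_offset += bytes;
		if (pg_offset == (long)PAGE_SIZE) {
			pg_offset = 0;
			if (++pg_idx == rb->nr_pages)
				pg_idx = 0;
		}
	}
	/* Remember where we stopped so the next sync continues from here. */
	rb->head = ((unsigned long)pg_idx << PAGE_SHIFT) + pg_offset;
}
```

Without the final store, a second call would start from the stale
head and overwrite what the first call copied.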

I would expect:

   perf_aux_output_begin() -> update etr_perf->head

   and when we sync the buffer:

   Poll -> sync_buffer -> perf_aux_output_end() and
   perf_aux_output_begin() -> update etr_perf->head
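
As a toy model of that sequence (all names below are hypothetical
stand-ins prefixed with model_, not the perf or coresight API, and the
sizes are arbitrary), I would expect the two heads to stay in step like
this:

```c
/* Stand-in for perf_output_handle: only the head matters here. */
struct model_handle {
	unsigned long head;
};

/* Stand-in for etr_perf: only the head matters here. */
struct model_etr_perf {
	unsigned long head;
};

/* Models the "begin" step: etr_perf->head is refreshed from the handle. */
static void model_output_begin(struct model_etr_perf *p,
			       const struct model_handle *h)
{
	p->head = h->head;
}

/* Models the "end" step: perf consumes size bytes and moves its head. */
static void model_output_end(struct model_handle *h, unsigned long size)
{
	h->head += size;
}

/* Models sync_buffer: copies avail bytes and reports how many. */
static unsigned long model_sync(struct model_etr_perf *p, unsigned long avail)
{
	p->head += avail;
	return avail;
}

/* One poll wakeup: sync, close the handle, reopen it, refresh the head. */
static void model_poll_once(struct model_etr_perf *p, struct model_handle *h,
			    unsigned long avail)
{
	unsigned long size = model_sync(p, avail);

	model_output_end(h, size);
	model_output_begin(p, h);
}
```

If every wakeup goes through an end/begin pair like this, then
etr_perf->head gets refreshed from the handle each time anyway.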

Kind regards
Suzuki