[PATCH 1/4] coresight: tmc-etr: Advance buffer pointer in sync buffer.
Suzuki K Poulose
suzuki.poulose at arm.com
Tue Apr 27 11:00:51 BST 2021
On 27/04/2021 04:45, Leo Yan wrote:
> On Mon, Apr 26, 2021 at 11:40:44AM +0100, Suzuki Kuruppassery Poulose wrote:
>
> [...]
>
>>> @@ -1442,7 +1442,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
>>> {
>>> long bytes;
>>> long pg_idx, pg_offset;
>>> - unsigned long head = etr_perf->head;
>>> + unsigned long head;
>>> char **dst_pages, *src_buf;
>>> struct etr_buf *etr_buf = etr_perf->etr_buf;
>>> @@ -1465,7 +1465,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
>>> bytes = tmc_etr_buf_get_data(etr_buf, src_offset, to_copy,
>>> &src_buf);
>>> if (WARN_ON_ONCE(bytes <= 0))
>>> - break;
>>> + return;
>>> bytes = min(bytes, (long)(PAGE_SIZE - pg_offset));
>>> memcpy(dst_pages[pg_idx] + pg_offset, src_buf, bytes);
>>> @@ -1483,6 +1483,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
>>> /* Move source pointers */
>>> src_offset += bytes;
>>> }
>>> + etr_perf->head = (pg_idx << PAGE_SHIFT) + pg_offset;
>>
>>
>> Looking at this patch, I feel the driver is already doing a couple of
>> wrong things.
>>
>> 1) We initialise etr_perf->head every time ETR enable is called,
>> irrespective of whether we actually try to enable the hardware, e.g.:
>>
>> etm_0 on -> .. -> enable_etr :
>> etr_perf->head = <head of the handle_0>
>> enable_hw()
>>
>> etm_1 on -> ... -> enable_etr:
>> etr_perf->head = <head of the handle_1>
>> already_enabled, skip enable_hw()
>>
>> etm_2 on -> ... -> enable_etr:
>> etr_perf->head = <head of the handle_2>
>> already_enabled, skip enable_hw()...
>>
>>
>> This doesn't look correct as we don't know which handle is going to get the
>> data. This looks pointless.
>
> I'd like to map this onto the diagram below (for system-wide trace):
>
> CPU0: AUX RB (perf_output_handle_0) -> etr_perf -> +---------+
> CPU1: AUX RB (perf_output_handle_1) -> etr_perf -> | etr_buf |
> CPU2: AUX RB (perf_output_handle_2) -> etr_perf -> |         |
> CPU3: AUX RB (perf_output_handle_3) -> etr_perf -> +---------+
>
To make it more clear:
CPU0: AUX RB (perf_output_handle_0) -> etr_perf0 -> +---------+
CPU1: AUX RB (perf_output_handle_1) -> etr_perf1 -> |etr_buf0 |
CPU2: AUX RB (perf_output_handle_2) -> etr_perf2 -> |         |
CPU3: AUX RB (perf_output_handle_3) -> etr_perf3 -> +---------+
> Simply put, there are two layers controlling the ring buffer. One
> layer is the perf AUX ring buffer, which mainly uses the structure
> perf_output_handle to manage the ring buffer. The other is in the ETR
> driver, which uses the structure etr_perf to manage the head pointer
> used when copying data from the ETR buffer (tagged as "etr_buf") into
> the AUX pages.
>
> The ETR buffer is a single buffer, but the structures "perf_output_handle"
> and "etr_perf" are per CPU. We have multiple copies of the heads and
> tails managing a single buffer, but the problem is that these multiple
> copies are not synced with each other.
Minor correction: they are "per-event", to be precise. There are per-CPU
events in system-wide mode or task mode (but not in per-thread mode), so
you are correct.
>
>> 2) Even more problematic is where we copy the AUX buffer content to.
>> As mentioned above, we don't know which handle is going to be the last
>> one to consume, and we have an "etr_perf->head" that came from one of the
>> handles and the "pages" that came from the first handle, which created the
>> etr_perf buffer. In sync_perf_buffer() we copy the hardware buffer into
>> the "pages" (say, of handle_0) at "etr_perf->head" (which could be from
>> any other handle, say, handle_2), and then we could return the number of
>> bytes copied, which is then used to update the last handle (say,
>> handle_3), into which no data was actually copied.
This is not valid, and I am relieved that the driver is correct. The
assumption that there is only one etr_perf per ETR is incorrect, as
pictured above.
>>
>> To fix all of these issues, we must:
>> 1) Stop using etr_perf->head, and instead use handle->head of the handle
>> that update_buffer is called on.
>>
>> 2) Keep track of the "pages" that belong to a given "handle", and use
>> those pages to copy the data for the handle whose buffer we are called
>> to update.
>
> The "pages" are only allocated once, even though they are attached to
> multiple handles. I think the right way is to use a single "etr_perf"
> structure and a single "perf_output_handle" to manage the "pages"; in
> other words, if there is a single buffer, we should use just one copy of
> the head and tail to manage it.
I assume you mean the pages in the etr_buf and not in etr_perf, right?
I think this is not needed: the way we do things is fine, and the
patch as such looks correct to me.
The perf_output_handle is per-event, and not something we can combine.
etr_perf captures what the "output_handle" stands for and is
necessary for syncing the buffer.
Now, coming back to this patch, I understand that sync_perf could be
called multiple times with the polling patches. But don't we do a
perf_aux_output_end() each time we wake up? (I haven't looked
at the later patches yet.)
I would expect:
  perf_aux_output_begin() -> update etr_perf->head
and, when we sync the buffer:
  poll -> sync_buffer -> perf_aux_output_end() and perf_aux_output_begin()
       -> update etr_perf->head
Kind regards
Suzuki