[V14 3/8] drivers: perf: arm_pmuv3: Enable branch stack sampling framework

Anshuman Khandual anshuman.khandual at arm.com
Mon Nov 27 00:06:06 PST 2023



On 11/23/23 18:05, James Clark wrote:
> 
> 
> On 21/11/2023 09:57, Anshuman Khandual wrote:
>>
>>
>> On 11/15/23 15:37, James Clark wrote:
>>>
>>>
>>> On 15/11/2023 07:22, Anshuman Khandual wrote:
>>>> On 11/14/23 17:44, James Clark wrote:
>>>>>
>>>>>
>>>>> On 14/11/2023 05:13, Anshuman Khandual wrote:
>>>>> [...]
>>>>>
>>>>>> +/*
>>>>>> + * This is a read only constant and safe during multi threaded access
>>>>>> + */
>>>>>> +static struct perf_branch_stack zero_branch_stack = { .nr = 0, .hw_idx = -1ULL};
>>>>>> +
>>>>>> +static void read_branch_records(struct pmu_hw_events *cpuc,
>>>>>> +				struct perf_event *event,
>>>>>> +				struct perf_sample_data *data,
>>>>>> +				bool *branch_captured)
>>>>>> +{
>>>>>> +	/*
>>>>>> +	 * CPU specific branch records buffer must have been allocated already
>>>>>> +	 * for the hardware records to be captured and processed further.
>>>>>> +	 */
>>>>>> +	if (WARN_ON(!cpuc->branches))
>>>>>> +		return;
>>>>>> +
>>>>>> +	/*
>>>>>> +	 * Overflowed event's branch_sample_type does not match the configured
>>>>>> +	 * branch filters in the BRBE HW. So the captured branch records here
>>>>>> +	 * cannot be co-related to the overflowed event. Report to the user as
>>>>>> +	 * if no branch records have been captured, and flush branch records.
>>>>>> +	 * The same scenario is applicable when the current task context does
>>>>>> +	 * not match with overflown event.
>>>>>> +	 */
>>>>>> +	if ((cpuc->brbe_sample_type != event->attr.branch_sample_type) ||
>>>>>> +	    (event->ctx->task && cpuc->brbe_context != event->ctx)) {
>>>>>> +		perf_sample_save_brstack(data, event, &zero_branch_stack);
>>>>>
>>>>> Is there any benefit to outputting a zero size stack vs not outputting
>>>>> anything at all?
>>>>
>>>> The event has got PERF_SAMPLE_BRANCH_STACK marked and hence perf_sample_data
>>>> must have PERF_SAMPLE_BRANCH_STACK with its br_stack pointing to the branch
>>>> records. Hence without assigning a zeroed struct perf_branch_stack, there is
>>>> a chance, that perf_sample_data will pass on some garbage branch records to
>>>> the ring buffer.
>>>>
>>>
>>> I don't think that's an issue, the perf core code handles the case where
>>> no branch stack exists on a sample. It even outputs the zero length for
>>> you, but there is other stuff that can be skipped if you just never call
>>> perf_sample_save_brstack():
>>
>> Sending out perf_sample_data without valid data->br_stack seems problematic,
>> which would be the case when perf_sample_save_brstack() never gets called on
>> the perf_sample_data being prepared, and depend on the below 'else' case for
>> pushing out zero records.
>>
> 
> I'm not following why it would be problematic. data->br_stack is
> initialised to NULL in perf_prepare_sample() and the core code
> specifically has a path that was added for the case where
> perf_sample_save_brstack() was never called.

If perf_sample_save_brstack() is never called on the perf sample data,
'data->br_stack' stays unchanged as NULL from perf_prepare_sample(). The
'data->br_stack' element will then eventually be skipped for that perf
sample record in perf_output_sample().

void perf_prepare_sample(struct perf_sample_data *data,
                         struct perf_event *event,
                         struct pt_regs *regs)
{
	....
        if (filtered_sample_type & PERF_SAMPLE_BRANCH_STACK) {
                data->br_stack = NULL;
                data->dyn_size += sizeof(u64);
                data->sample_flags |= PERF_SAMPLE_BRANCH_STACK;
        }
	....
}

void perf_output_sample(struct perf_output_handle *handle,
                        struct perf_event_header *header,
                        struct perf_sample_data *data,
                        struct perf_event *event)
{
	....
        if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
                if (data->br_stack) {
                        size_t size;

                        size = data->br_stack->nr
                             * sizeof(struct perf_branch_entry);

                        perf_output_put(handle, data->br_stack->nr);
                        if (branch_sample_hw_index(event))
                                perf_output_put(handle, data->br_stack->hw_idx);
                        perf_output_copy(handle, data->br_stack->entries, size);
                } else {
                        /*
                         * we always store at least the value of nr
                         */
                        u64 nr = 0;
                        perf_output_put(handle, nr);
                }
        }
	....
}
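
To make the two paths above easier to compare, here is a rough userspace
sketch of the branch stack portion of perf_output_sample(). It is only a
model under stated assumptions: the model_* names, the fixed-size entries
array and the 'want_hw_idx' flag (standing in for branch_sample_hw_index())
are hypothetical and are not kernel definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct perf_branch_entry (layout assumed). */
struct model_branch_entry {
	uint64_t from;
	uint64_t to;
	uint64_t flags;
};

/* Hypothetical stand-in for struct perf_branch_stack. */
struct model_branch_stack {
	uint64_t nr;
	uint64_t hw_idx;
	struct model_branch_entry entries[4];
};

/* Append 'len' bytes at '*off', loosely mimicking perf_output_put/copy. */
static void model_put(uint8_t *buf, size_t *off, const void *src, size_t len)
{
	memcpy(buf + *off, src, len);
	*off += len;
}

/*
 * Model of the PERF_SAMPLE_BRANCH_STACK block quoted above. Returns the
 * number of bytes written into 'buf' for the branch stack element.
 */
static size_t model_output_brstack(uint8_t *buf,
				   const struct model_branch_stack *br,
				   int want_hw_idx)
{
	size_t off = 0;

	if (br) {
		size_t size = br->nr * sizeof(struct model_branch_entry);

		model_put(buf, &off, &br->nr, sizeof(br->nr));
		if (want_hw_idx)
			model_put(buf, &off, &br->hw_idx, sizeof(br->hw_idx));
		model_put(buf, &off, br->entries, size);
	} else {
		/* NULL br_stack: only a zero 'nr' is written. */
		uint64_t nr = 0;

		model_put(buf, &off, &nr, sizeof(nr));
	}
	return off;
}
```

In this model, a NULL pointer and a zeroed stack (.nr = 0) both write a
single u64 'nr' of zero when the hardware index is not requested. With
'want_hw_idx' set, the zeroed stack additionally writes 'hw_idx', while
the NULL path still writes only 'nr'.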


