[PATCH v4 0/2] Make sysFS functional on topologies with per core sink
Mathieu Poirier
mathieu.poirier at linaro.org
Tue Oct 6 12:56:28 EDT 2020
On Tue, 6 Oct 2020 at 10:38, Suzuki K Poulose <suzuki.poulose at arm.com> wrote:
>
> On 10/06/2020 05:12 PM, Mathieu Poirier wrote:
> > On Mon, Oct 05, 2020 at 12:27:07PM +0100, Suzuki K Poulose wrote:
> >> Hi Linu,
> >>
> >> On 09/04/2020 03:41 AM, Linu Cherian wrote:
> >>> This patch series tries to fix the sysfs breakage on topologies
> >>> with per core sink.
> >>>
> >>> Changes since v3:
> >>> - References to coresight_get_enabled_sink in perf interface
> >>> has been removed and marked deprecated as a new patch.
> >>> - To avoid changes to coresight_find_sink for ease of maintenance,
> >>> search function specific to sysfs usage has been added.
> >>> - Sysfs being the only user for coresight_get_enabled sink,
> >>> reset option is removed as well.
> >>
> >> Have you tried running perf with --per-thread option ? I believe
> >> this will be impacted as well, as we choose a single sink at the
> >> moment and this may not be reachable from the other CPUs, where
> >> the event may be scheduled. Eventually loosing trace for the
> >> duration where the task is scheduled on a different CPU.
> >
> > Right, I considered this set in the context of sysfs only. I expect supporting
> > 1:1 configuration on perf to require changes in the kernel drivers and the perf
> > tools.
>
> Correct. We can only support the perf case where the user opts for automatic
> sink selection. If the user explicitly specifies a sink this could be either :
>
> 1) Due to the ignorance of the topology.
> OR
> 2) Purposefully forcing to use a global ETR on the system (for various
> different reasons. e.g, debugging the interaction of multiple CPUs.)
>
I agree.
> So, the kernel cannot correct the user by choosing a different sink, when
> something is specified. That is something the proposed patch below does.
> As for the perf tools, as long as the tool can decode multiple AUX records
> from different ETMs (generated on different sinks), I think we should be
> done.
>
IntelPT uses a 1:1 topology and as such Alex designed the perf tools
AUX mechanic to handle that kind of design. In that regard the heavy
lifting has already been done but it won't work right away for CS,
some changes will be needed to tell the platform dependent code what
to expect. The decoding part _should_ be fine but a careful
examination the code is the only way to be certain of that.
Thanks,
Mathieu
> Suzuki
>
>
> >
> >>
> >> Please could you try this patch and see if helps ? I have lightly
> >> tested this on a fast model.
> >>
> >> ---8>---
> >>
> >> coresight: etm-perf: Allow an event to use multiple sinks
> >>
> >> When there are multiple sinks on the system, in the absence
> >> of a specified sink, it is quite possible that a default sink
> >> for an ETM could be different from that of another ETM (e.g, on
> >> systems with per-CPU sinks). However we do not support having
> >> multiple sinks for an event yet. This patch allows the event to
> >> use the default sinks on the ETMs where they are scheduled as
> >> long as the sinks are of the same type.
> >>
> >> e.g, if we have 1x1 topology with per-CPU ETRs, the event can
> >> use the per-CPU ETR for the session. However, if the sinks
> >> are of different type, e.g TMC-ETR on one and a custom sink
> >> on another, the event will only trace on the first detected
> >> sink (just like we have today).
> >>
> >> Cc: Linu Cherian <lcherian at marvell.com>
> >> Cc: Mathieu Poirier <mathieu.poirier at linaro.org>
> >> Cc: Mike Leach <mike.leach at linaro.org>
> >> Signed-off-by: Suzuki K Poulose <suzuki.poulose at arm.com>
> >> ---
> >> .../hwtracing/coresight/coresight-etm-perf.c | 69 +++++++++++++------
> >> 1 file changed, 49 insertions(+), 20 deletions(-)
> >>
> >> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c
> >> b/drivers/hwtracing/coresight/coresight-etm-perf.c
> >> index c2c9b127d074..19fe38010474 100644
> >> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> >> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> >> @@ -204,14 +204,28 @@ static void etm_free_aux(void *data)
> >> schedule_work(&event_data->work);
> >> }
> >>
> >> +/*
> >> + * When an event could be scheduled on more than one CPUs, we have to make
> >> + * sure that the sinks are of the same type, so that the sink_buffer could
> >> + * be reused.
> >> + */
> >> +static bool sinks_match(struct coresight_device *a, struct coresight_device *b)
> >> +{
> >> + if (!a || !b)
> >> + return false;
> >> + return (sink_ops(a) == sink_ops(b)) &&
> >> + (a->subtype.sink_subtype == b->subtype.sink_subtype);
> >> +}
> >> +
> >> static void *etm_setup_aux(struct perf_event *event, void **pages,
> >> int nr_pages, bool overwrite)
> >> {
> >> u32 id;
> >> int cpu = event->cpu;
> >> cpumask_t *mask;
> >> - struct coresight_device *sink;
> >> + struct coresight_device *sink = NULL;
> >> struct etm_event_data *event_data = NULL;
> >> + bool sink_forced = false;
> >>
> >> event_data = alloc_event_data(cpu);
> >> if (!event_data)
> >> @@ -222,6 +236,7 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
> >> if (event->attr.config2) {
> >> id = (u32)event->attr.config2;
> >> sink = coresight_get_sink_by_id(id);
> >> + sink_forced = true;
> >> }
> >>
> >> mask = &event_data->mask;
> >> @@ -235,7 +250,7 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
> >> */
> >> for_each_cpu(cpu, mask) {
> >> struct list_head *path;
> >> - struct coresight_device *csdev;
> >> + struct coresight_device *csdev, *cpu_sink;
> >>
> >> csdev = per_cpu(csdev_src, cpu);
> >> /*
> >> @@ -243,33 +258,42 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
> >> * the mask and continue with the rest. If ever we try to trace
> >> * on this CPU, we handle it accordingly.
> >> */
> >> - if (!csdev) {
> >> - cpumask_clear_cpu(cpu, mask);
> >> - continue;
> >> - }
> >> -
> >> + if (!csdev)
> >> + goto clear_cpu;
> >> /*
> >> - * No sink provided - look for a default sink for one of the
> >> - * devices. At present we only support topology where all CPUs
> >> - * use the same sink [N:1], so only need to find one sink. The
> >> - * coresight_build_path later will remove any CPU that does not
> >> - * attach to the sink, or if we have not found a sink.
> >> + * No sink provided - look for a default sink for all the devices.
> >> + * We only support multiple sinks, only if all the default sinks
> >> + * are of the same type, so that the sink buffer can be shared
> >> + * as the event moves around. As earlier, we don't trace on a
> >> + * CPU, if we can't find a suitable sink.
> >> */
> >> - if (!sink)
> >> - sink = coresight_find_default_sink(csdev);
> >> + if (!sink_forced) {
> >> + cpu_sink = coresight_find_default_sink(csdev);
> >> + if (!cpu_sink)
> >> + goto clear_cpu;
> >> + /* First sink for this event */
> >> + if (!sink) {
> >> + sink = cpu_sink;
> >> + } else if (!sinks_match(cpu_sink, sink)) {
> >> + goto clear_cpu;
> >> + }
> >> +
> >> + } else {
> >> + cpu_sink = sink;
> >> + }
> >>
> >> /*
> >> * Building a path doesn't enable it, it simply builds a
> >> * list of devices from source to sink that can be
> >> * referenced later when the path is actually needed.
> >> */
> >> - path = coresight_build_path(csdev, sink);
> >> - if (IS_ERR(path)) {
> >> - cpumask_clear_cpu(cpu, mask);
> >> + path = coresight_build_path(csdev, cpu_sink);
> >> + if (!IS_ERR(path)) {
> >> + *etm_event_cpu_path_ptr(event_data, cpu) = path;
> >> continue;
> >> }
> >> -
> >> - *etm_event_cpu_path_ptr(event_data, cpu) = path;
> >> +clear_cpu:
> >> + cpumask_clear_cpu(cpu, mask);
> >> }
> >>
> >> /* no sink found for any CPU - cannot trace */
> >> @@ -284,7 +308,12 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
> >> if (!sink_ops(sink)->alloc_buffer || !sink_ops(sink)->free_buffer)
> >> goto err;
> >>
> >> - /* Allocate the sink buffer for this session */
> >> + /*
> >> + * Allocate the sink buffer for this session. All the sinks
> >> + * where this event can be scheduled are ensured to be of the
> >> + * same type. Thus the same sink configuration is used by the
> >> + * sinks.
> >> + */
> >> event_data->snk_config =
> >> sink_ops(sink)->alloc_buffer(sink, event, pages,
> >> nr_pages, overwrite);
> >> --
> >> 2.24.1
> >>
>
More information about the linux-arm-kernel
mailing list