[PATCH v4 0/2] Make sysFS functional on topologies with per core sink
Linu Cherian
linuc.decode at gmail.com
Thu Nov 12 03:57:52 EST 2020
Hi Suzuki,
On Tue, Nov 10, 2020 at 8:27 PM Suzuki K Poulose <suzuki.poulose at arm.com> wrote:
>
> Hi Linu
>
> On 11/10/20 12:57 PM, Linu Cherian wrote:
> > Hi Suzuki,
> >
> ...
>
> >> We are facing some issues while trying out perf. This doesn't appear
> >> to be related to your patch though. Will share the details once we
> >> do some initial analysis on it.
> >>
> >> Thanks.
> >
> > # ./perf record -vvv -e cs_etm// --per-thread uname -a
> > Using CPUID 0x00000000430f0b40
> > Attempting to add event pmu 'cs_etm' with '' that may result in non-fatal errors
> > nr_cblocks: 0
> > affinity: SYS
> > mmap flush: 1
> > comp level: 0
> > maps__set_modules_path_dir: cannot open
> > /lib/modules/5.9.0-rc5-00116-g91c9ea890e1a dir
> > Problems setting modules path maps, continuing anyway...
> > ------------------------------------------------------------
> > perf_event_attr:
> > type 8
> > size 120
> > { sample_period, sample_freq } 1
> > sample_type IP|TID|IDENTIFIER
> > read_format ID
> > disabled 1
> > enable_on_exec 1
> > sample_id_all 1
> > exclude_guest 1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 3333 cpu -1 group_fd -1 flags 0x8 = 5
> > ------------------------------------------------------------
> > perf_event_attr:
> > type 1
> > size 120
> > config 0x9
> > { sample_period, sample_freq } 1
> > sample_type IP|TID|IDENTIFIER
> > read_format ID
> > disabled 1
> > exclude_kernel 1
> > exclude_hv 1
> > mmap 1
> > comm 1
> > enable_on_exec 1
> > task 1
> > sample_id_all 1
> > exclude_guest 1
> > mmap2 1
> > comm_exec 1
> > context_switch 1
> > ksymbol 1
> > bpf_event 1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 3333 cpu -1 group_fd -1 flags 0x8 = 6
> > mmap size 589824B
> > AUX area mmap length 4194304
> > ------------------------------------------------------------
> > perf_event_attr:
> > type 1
> > size 120
> > config 0x9
> > watermark 1
> > sample_id_all 1
> > bpf_event 1
> > { wakeup_events, wakeup_watermark } 1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid -1 cpu -1 group_fd -1 flags 0x8
> > sys_perf_event_open failed, error -22
> > switching off bpf_event
> > ------------------------------------------------------------
> > perf_event_attr:
> > type 1
> > size 120
> > config 0x9
> > watermark 1
> > sample_id_all 1
> > { wakeup_events, wakeup_watermark } 1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid -1 cpu -1 group_fd -1 flags 0x8
> > sys_perf_event_open failed, error -22
> > switching off cloexec flag
> > ------------------------------------------------------------
> > perf_event_attr:
> > type 1
> > size 120
> > config 0x9
> > watermark 1
> > sample_id_all 1
> > { wakeup_events, wakeup_watermark } 1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid -1 cpu -1 group_fd -1 flags 0
> > sys_perf_event_open failed, error -22
> > switching off sample_id_all
> > ------------------------------------------------------------
> > perf_event_attr:
> > type 1
> > size 120
> > config 0x9
> > watermark 1
> > { wakeup_events, wakeup_watermark } 1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid -1 cpu -1 group_fd -1 flags 0
> > sys_perf_event_open failed, error -22
> > Couldn't start the BPF side band thread:
> > BPF programs starting from now on won't be annotatable
> > Synthesizing auxtrace information
> > cannot find cgroup mount point
> > Couldn't synthesize cgroup events.
> > Control descriptor is not initialized
> > Linux marvell 5.9.0-rc5-00116-g91c9ea890e1a #823 SMP PREEMPT Tue Nov
> > 10 10:49:15 IST 2020 aarch64 aarch64 aarch64 GNU/Linux
>
> > auxtrace idx 0 old 0 head 0xdd50 diff 0xdd50
>
> I haven't seen this in the normal verbose output.
>
> > [ perf record: Woken up 1 times to write data ]
> > symbol:init_start file:(null) line:0 offset:0 return:0 lazy:(null)
> > snip ..
> > symbol:memory_mallopt file:(null) line:0 offset:0 return:0 lazy:(null)
> > failed to write feature CPUDESC
> > failed to write feature MEM_TOPOLOGY
> > failed to write feature CPU_PMU_CAPS
> > [ perf record: Captured and wrote 0.056 MB perf.data ]
> >
> > # ./perf report
> > 0x368 [0x50]: failed to process type: 1 [Cannot allocate memory]
> > Error:
> > failed to process sample
>
> I have no clue about it. Are you able to run it under GDB ? (Looks like
> you have built the perf, so if you have sources, it may be a good idea
> to run under the GDB and figure out where that error is coming from).
>
Yeah gdb helped figuring out the issue.
The issue is in the opencsd, where it doesn't seem to support multiple streams
when the formatter is not enabled. .
Note:Our Silicon has formatter disabled and we already had changes in perf tool
to take care of the formatter status.
The below hack helped.
diff --git a/decoder/source/ocsd_dcd_tree.cpp b/decoder/source/ocsd_dcd_tree.cpp
index be15e36..0210dec 100644
--- a/decoder/source/ocsd_dcd_tree.cpp
+++ b/decoder/source/ocsd_dcd_tree.cpp
@@ -401,7 +401,7 @@ ocsd_err_t DecodeTree::createDecoder(const
std::string &decoderName, const int c
int crtFlags = createFlags;
uint8_t CSID = 0; // default for single stream decoder (no
deformatter) - we ignore the ID
- if(usingFormatter())
+ //if(usingFormatter())
{
CSID = pConfig->getTraceID();
crtFlags |= OCSD_CREATE_FLG_INST_ID;
Not sure if this is the right fix though.
This is how i tested,
1. # taskset 0x2 ./perf record -e cs_etm//u -F 10 --per-thread ping -c
30 127.0.0.1
2. # Ctrl-Z // Put the process in background
3. # taskset -p 0x4 <pid of ping process> // Move the ping process to core 2
4. # fg // Get the process to foreground
5. ./perf report
snip ...
# Samples: 66K of event 'branches:uH'
# Event count (approx.): 66953
#
# Children Self Command Shared Object Symbol
# ........ ........ ....... .....................
........................................
#
15.94% 15.94% ping ld-2.31.so [.] _dl_lookup_symbol_x
14.93% 14.93% ping ld-2.31.so [.] do_lookup_x
10.68% 10.68% ping libc-2.31.so [.] _dl_addr
9.87% 9.87% ping ld-2.31.so [.] _dl_relocate_object
6.75% 6.75% ping ld-2.31.so [.] strcmp
3.62% 3.62% ping ld-2.31.so [.] check_match
2.72% 2.72% ping libc-2.31.so [.] __vfprintf_internal
1.90% 1.90% ping libc-2.31.so [.] _int_malloc
1.29% 1.29% ping libc-2.31.so [.] getenv
1.28% 1.28% ping libc-2.31.so [.] strcmp
1.17% 1.17% ping libc-2.31.so [.]
_IO_file_xsputn@@GLIBC_2.17
1.16% 1.16% ping ld-2.31.so [.] _dl_name_match_p
snip ...
Also i could verify using prints in the tmc-etr-driver that the trace
buffer gets reused across cores
as well.
> Also, what is
>
> perf --version ?
perf version 5.9.0-rc5
>
>
> > # To display the perf.data header info, please use --header/--header-only option
> >
> > ============================================================================
> >
> > Appreciate your help on getting some debug hints on what is going wrong.
>
>
> >
> > One strange thing noted here is sys_perf_event_open, passing cpu = -1
> > and pid = -1,
> > which doesnt appear to be valid as per tools/perf/design.txt
>
> I see that on my Juno, but it still works. I believe that is for the
> generic PMU (pmu.type == 1) and not the coresight PMU, which I believe
> is (type == 8) in your case (the first event).
>
> Suzuki
More information about the linux-arm-kernel
mailing list