[PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
Ganapatrao Kulkarni
gankulkarni at os.amperecomputing.com
Tue Jul 23 23:38:37 PDT 2024
On 23-07-2024 09:16 pm, James Clark wrote:
>
>
> On 23/07/2024 4:26 pm, Ganapatrao Kulkarni wrote:
>>
>>
>> On 23-07-2024 06:40 pm, James Clark wrote:
>>>
>>>
>>> On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
>>>>
>>>> Hi James,
>>>>
>>>> On 19-07-2024 08:09 pm, James Clark wrote:
>>>>>
>>>>>
>>>>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
>>>>>> To generate the instruction tracing, script uses 2 contiguous packets
>>>>>> address range. If there a continuity brake due to discontiguous
>>>>>> branch
>>>>>> address, it is required to reset the tracing and start tracing
>>>>>> with the
>>>>>> new set of contiguous packets.
>>>>>>
>>>>>> Adding change to identify the break and complete the remaining
>>>>>> tracing
>>>>>> of current packets and restart tracing from new set of packets, if
>>>>>> continuity is established.
>>>>>>
>>>>>
>>>>> Hi Ganapatrao,
>>>>>
>>>>> Can you add a before and after example of what's changed to the
>>>>> commit message? It wasn't immediately obvious to me if this is
>>>>> adding missing output, or it was correcting the tail end of the
>>>>> output that was previously wrong.
>>>>
>>>> It is adding tail end of the trace as well avoiding the segfault of
>>>> the perf application. With out this change the perf segfaults with
>>>> as below log
>>>>
>>>>
>>>> ./perf script
>>>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d
>>>> objdump -k ../../vmlinux -v $* > dump
>>>> objdump: error: the stop address should be after the start address
>>>> Traceback (most recent call last):
>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 271, in
>>>> process_event
>>>> print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 105, in
>>>> print_disam
>>>> for line in read_disam(dso_fname, dso_start, start_addr,
>>>> stop_addr):
>>>>
>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> File "./scripts/python/arm-cs-trace-disasm.py", line 99, in
>>>> read_disam
>>>> disasm_output = check_output(disasm).decode('utf-8').split('\n')
>>>> ^^^^^^^^^^^^^^^^^^^^
>>>> File "/usr/lib64/python3.12/subprocess.py", line 466, in
>>>> check_output
>>>> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>>>> raise CalledProcessError(retcode, process.args,
>>>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z',
>>>> '--start-address=0xffff80008125b758',
>>>> '--stop-address=0xffff80008125a934', '../../vmlinux']' returned
>>>> non-zero exit status 1.
>>>> Fatal Python error: handler_call_die: problem in Python trace event
>>>> handler
>>>> Python runtime state: initialized
>>>>
>>>> Current thread 0x0000ffffb05054e0 (most recent call first):
>>>> <no Python frame>
>>>>
>>>> Extension modules: perf_trace_context, systemd._journal,
>>>> systemd._reader, systemd.id128, report._py3report, _dbus_bindings,
>>>> problem._py3abrt (total: 7)
>>>> Aborted (core dumped)
>>>>
>>>>>
>>>>>> Signed-off-by: Ganapatrao Kulkarni
>>>>>> <gankulkarni at os.amperecomputing.com>
>>>>>> ---
>>>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 10 ++++++++++
>>>>>> 1 file changed, 10 insertions(+)
>>>>>>
>>>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>> b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>> index d973c2baed1c..ad10cee2c35e 100755
>>>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
>>>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>>>> return
>>>>>> + if (cpu_data.get(str(cpu) + 'ip') == None):
>>>>>> + cpu_data[str(cpu) + 'ip'] = ip
>>>>>> +
>>>>>
>>>>> Do you need to write into the global cpu_data here? Doesn't it get
>>>>> overwritten after you load it back into 'prev_ip'
>>>>
>>>> No, the logic is same as holding the addr of previous packet.
>>>> Saving the previous packet saved ip in to prev_ip before overwriting
>>>> with the current packet.
>>>
>>> It's not exactly the same logic as holding the addr of the previous
>>> sample. For addr, we return on the first None, with your change we
>>> now "pretend" that the second one is also the previous one:
>>>
>>> if (cpu_data.get(str(cpu) + 'addr') == None):
>>> cpu_data[str(cpu) + 'addr'] = addr
>>> return <----------------------------sample 0 return
>>>
>>> if (cpu_data.get(str(cpu) + 'ip') == None):
>>> cpu_data[str(cpu) + 'ip'] = ip <---- sample 1 save but no return
>>>
>>> Then for sample 1 'prev_ip' is actually now the 'current' IP:
>>
>> Yes, it is dummy for first packet. Added anticipating that we wont hit
>> the discontinuity for the first packet itself.
>>
>> Can this be changed to more intuitive like below?
>>
>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>> b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>> index d973c2baed1c..d49f5090059f 100755
>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>> @@ -198,6 +198,8 @@ def process_event(param_dict):
>> cpu_data[str(cpu) + 'addr'] = addr
>> return
>>
>> + if (cpu_data.get(str(cpu) + 'ip') != None):
>> + prev_ip = cpu_data[str(cpu) + 'ip']
>>
>> if (options.verbose == True):
>> print("Event type: %s" % name)
>> @@ -243,12 +245,18 @@ def process_event(param_dict):
>>
>> # Record for previous sample packet
>> cpu_data[str(cpu) + 'addr'] = addr
>> + cpu_data[str(cpu) + 'ip'] = stop_addr
>>
>> # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
>> if (start_addr == 0 and stop_addr == 4):
>> print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
>> return
>>
>> + if (stop_addr < start_addr and prev_ip != 0):
>> + # Continuity of the Packets broken, set start_addr to
>> previous
>> + # packet ip to complete the remaining tracing of the
>> address range.
>> + start_addr = prev_ip
>> +
>> if (start_addr < int(dso_start) or start_addr > int(dso_end)):
>> print("Start address 0x%x is out of range [ 0x%x ..
>> 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
>> return
>>
>> Without this patch below is the failure log(with segfault) for reference.
>>
>> [root at sut01sys-r214 perf]# timeout 4s ./perf record -e cs_etm// -C 1
>> dd if=/dev/zero of=/dev/null
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 1.087 MB perf.data ]
>> [root at sut01sys-r214 perf]# ./perf script
>> --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump
>> -k ../../vmlinux -v $* > dump
>> objdump: error: the stop address should be after the start address
>> Traceback (most recent call last):
>> File "./scripts/python/arm-cs-trace-disasm.py", line 271, in
>> process_event
>> print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>> File "./scripts/python/arm-cs-trace-disasm.py", line 105, in
>> print_disam
>> for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
>> disasm_output = check_output(disasm).decode('utf-8').split('\n')
>> ^^^^^^^^^^^^^^^^^^^^
>> File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
>> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>> raise CalledProcessError(retcode, process.args,
>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z',
>> '--start-address=0xffff80008125b758',
>> '--stop-address=0xffff80008125a934', '../../vmlinux']' returned
>> non-zero exit status 1.
>> Fatal Python error: handler_call_die: problem in Python trace event
>> handler
>> Python runtime state: initialized
>>
>> Current thread 0x0000ffffb90d54e0 (most recent call first):
>> <no Python frame>
>>
>> Extension modules: perf_trace_context, systemd._journal,
>> systemd._reader, systemd.id128, report._py3report, _dbus_bindings,
>> problem._py3abrt (total: 7)
>> Aborted (core dumped)
>>
>>
>> dump snippet:
>> ============
>> Event type: branches
>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr:
>> 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720
>> period: 1 time: 5986372298040 }
>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>> ffff800080313f04: 36100094 tbz w20, #2,
>> ffff800080313f14 <__perf_event_header__init_id+0x5c>
>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
>> ffff800080313f0c: d63f0000 blr x0
>> perf 12720/12720 [0001] 5986.372298040
>> __perf_event_header__init_id+0x54
>> .../coresight/linux/kernel/events/core.c 586 return
>> event->clock();
>> Event type: branches
>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr:
>> 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720
>> period: 1 time: 5986372298040 }
>> ffff80008030cb00 <local_clock>:
>> ffff80008030cb00: d503233f paciasp
>> ffff80008030cb04: a9bf7bfd stp x29, x30,
>> [sp, #-16]!
>> ffff80008030cb08: 910003fd mov x29, sp
>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8
>> <sched_clock>
>> perf 12720/12720 [0001] 5986.372298040
>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64
>> return sched_clock();
>> Event type: branches
>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr:
>> 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720
>> period: 1 time: 5986372298040 }
>> ffff8000801bb4a8 <sched_clock>:
>> ffff8000801bb4a8: d503233f paciasp
>> ffff8000801bb4ac: a9be7bfd stp x29, x30,
>> [sp, #-32]!
>> ffff8000801bb4b0: 910003fd mov x29, sp
>> ffff8000801bb4b4: a90153f3 stp x19, x20,
>> [sp, #16]
>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8
>> <sched_clock_noinstr>
>> perf 12720/12720 [0001] 5986.372298040
>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns =
>> sched_clock_noinstr();
>> Event type: branches
>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr:
>> 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720
>> period: 1 time: 5986372298040 }
>> ffff80008125a8a8 <sched_clock_noinstr>:
>> ffff80008125a8a8: d503233f paciasp
>> ffff80008125a8ac: a9bc7bfd stp x29, x30,
>> [sp, #-64]!
>> ffff80008125a8b0: 910003fd mov x29, sp
>> ffff80008125a8b4: a90153f3 stp x19, x20,
>> [sp, #16]
>> ffff80008125a8b8: b000e354 adrp x20,
>> ffff800082ec3000 <tick_bc_dev+0x140>
>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
>> ffff80008125a8c0: a90363f7 stp x23, x24,
>> [sp, #48]
>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>> ffff80008125a8c8: 52800518 mov w24, #0x28
>> // #40
>> ffff80008125a8cc: a9025bf5 stp x21, x22,
>> [sp, #32]
>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>> ffff80008125a8e4: d63f0000 blr x0
>> perf 12720/12720 [0001] 5986.372298040
>> sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c
>> 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>> Event type: branches
>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr:
>> 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720
>> period: 1 time: 5986372298040 }
>>
>>
>> With fix:
>> =========
>>
>> Event type: branches
>> Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr:
>> 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720
>> period: 1 time: 5986372298040 }
>> ffff800080313f04 <__perf_event_header__init_id+0x4c>:
>> ffff800080313f04: 36100094 tbz w20, #2,
>> ffff800080313f14 <__perf_event_header__init_id+0x5c>
>> ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
>> ffff800080313f0c: d63f0000 blr x0
>> perf 12720/12720 [0001] 5986.372298040
>> __perf_event_header__init_id+0x54
>> .../coresight/linux/kernel/events/core.c 586 return
>> event->clock();
>> Event type: branches
>> Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr:
>> 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720
>> period: 1 time: 5986372298040 }
>> ffff80008030cb00 <local_clock>:
>> ffff80008030cb00: d503233f paciasp
>> ffff80008030cb04: a9bf7bfd stp x29, x30,
>> [sp, #-16]!
>> ffff80008030cb08: 910003fd mov x29, sp
>> ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8
>> <sched_clock>
>> perf 12720/12720 [0001] 5986.372298040
>> local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64
>> return sched_clock();
>> Event type: branches
>> Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr:
>> 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720
>> period: 1 time: 5986372298040 }
>> ffff8000801bb4a8 <sched_clock>:
>> ffff8000801bb4a8: d503233f paciasp
>> ffff8000801bb4ac: a9be7bfd stp x29, x30,
>> [sp, #-32]!
>> ffff8000801bb4b0: 910003fd mov x29, sp
>> ffff8000801bb4b4: a90153f3 stp x19, x20,
>> [sp, #16]
>> ffff8000801bb4b8: d5384113 mrs x19, sp_el0
>> ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
>> ffff8000801bb4c0: 11000400 add w0, w0, #0x1
>> ffff8000801bb4c4: b9001260 str w0, [x19, #16]
>> ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8
>> <sched_clock_noinstr>
>> perf 12720/12720 [0001] 5986.372298040
>> sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns =
>> sched_clock_noinstr();
>> Event type: branches
>> Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr:
>> 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720
>> period: 1 time: 5986372298040 }
>> ffff80008125a8a8 <sched_clock_noinstr>:
>> ffff80008125a8a8: d503233f paciasp
>> ffff80008125a8ac: a9bc7bfd stp x29, x30,
>> [sp, #-64]!
>> ffff80008125a8b0: 910003fd mov x29, sp
>> ffff80008125a8b4: a90153f3 stp x19, x20,
>> [sp, #16]
>> ffff80008125a8b8: b000e354 adrp x20,
>> ffff800082ec3000 <tick_bc_dev+0x140>
>> ffff80008125a8bc: 910d0294 add x20, x20, #0x340
>> ffff80008125a8c0: a90363f7 stp x23, x24,
>> [sp, #48]
>> ffff80008125a8c4: 91002297 add x23, x20, #0x8
>> ffff80008125a8c8: 52800518 mov w24, #0x28
>> // #40
>> ffff80008125a8cc: a9025bf5 stp x21, x22,
>> [sp, #32]
>> ffff80008125a8d0: b9400296 ldr w22, [x20]
>> ffff80008125a8d4: 120002d5 and w21, w22, #0x1
>> ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
>> ffff80008125a8dc: 8b1502f3 add x19, x23, x21
>> ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
>> ffff80008125a8e4: d63f0000 blr x0
>
> It looks like the disassembly now assumes this BLR wasn't taken. We go
> from ffff80008125a8e4 straight through to ...
>
>> perf 12720/12720 [0001] 5986.372298040
>> sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c
>> 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
>> Event type: branches
>> Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr:
>> 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720
>> period: 1 time: 5986372298040 }
>> ffff80008125a8e8 <sched_clock_noinstr+0x40>:
>> ffff80008125a8e8: f8756ae3 ldr x3, [x23, x21]
>
> ffff80008125a8e4 which is just the previous one +4. Isn't your issue
> actually a decode issue in Perf itself? Why is there a discontinuity
> without branch samples being generated where either the source or
> destination address is 0?
>
> What are your record options to create this issue? As I mentioned in the
> previous reply I haven't been able to reproduce it.
I am using below perf record command.
timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
>
Thanks,
Ganapat
More information about the linux-arm-kernel
mailing list