[PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken
Ganapatrao Kulkarni
gankulkarni at os.amperecomputing.com
Tue Jul 23 08:26:37 PDT 2024
On 23-07-2024 06:40 pm, James Clark wrote:
>
>
> On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:
>>
>> Hi James,
>>
>> On 19-07-2024 08:09 pm, James Clark wrote:
>>>
>>>
>>> On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
>>>> To generate the instruction tracing, script uses 2 contiguous packets
>>>> address range. If there a continuity brake due to discontiguous branch
>>>> address, it is required to reset the tracing and start tracing with the
>>>> new set of contiguous packets.
>>>>
>>>> Adding change to identify the break and complete the remaining tracing
>>>> of current packets and restart tracing from new set of packets, if
>>>> continuity is established.
>>>>
>>>
>>> Hi Ganapatrao,
>>>
>>> Can you add a before and after example of what's changed to the
>>> commit message? It wasn't immediately obvious to me if this is adding
>>> missing output, or it was correcting the tail end of the output that
>>> was previously wrong.
>>
>> It is adding tail end of the trace as well avoiding the segfault of
>> the perf application. With out this change the perf segfaults with as
>> below log
>>
>>
>> ./perf script --script=python:./scripts/python/arm-cs-trace-disasm.py
>> -- -d objdump -k ../../vmlinux -v $* > dump
>> objdump: error: the stop address should be after the start address
>> Traceback (most recent call last):
>> File "./scripts/python/arm-cs-trace-disasm.py", line 271, in
>> process_event
>> print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
>> File "./scripts/python/arm-cs-trace-disasm.py", line 105, in
>> print_disam
>> for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
>> disasm_output = check_output(disasm).decode('utf-8').split('\n')
>> ^^^^^^^^^^^^^^^^^^^^
>> File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
>> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> File "/usr/lib64/python3.12/subprocess.py", line 571, in run
>> raise CalledProcessError(retcode, process.args,
>> subprocess.CalledProcessError: Command '['objdump', '-d', '-z',
>> '--start-address=0xffff80008125b758',
>> '--stop-address=0xffff80008125a934', '../../vmlinux']' returned
>> non-zero exit status 1.
>> Fatal Python error: handler_call_die: problem in Python trace event
>> handler
>> Python runtime state: initialized
>>
>> Current thread 0x0000ffffb05054e0 (most recent call first):
>> <no Python frame>
>>
>> Extension modules: perf_trace_context, systemd._journal,
>> systemd._reader, systemd.id128, report._py3report, _dbus_bindings,
>> problem._py3abrt (total: 7)
>> Aborted (core dumped)
>>
>>>
>>>> Signed-off-by: Ganapatrao Kulkarni <gankulkarni at os.amperecomputing.com>
>>>> ---
>>>> tools/perf/scripts/python/arm-cs-trace-disasm.py | 10 ++++++++++
>>>> 1 file changed, 10 insertions(+)
>>>>
>>>> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>> b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>> index d973c2baed1c..ad10cee2c35e 100755
>>>> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
>>>> @@ -198,6 +198,10 @@ def process_event(param_dict):
>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>> return
>>>> + if (cpu_data.get(str(cpu) + 'ip') == None):
>>>> + cpu_data[str(cpu) + 'ip'] = ip
>>>> +
>>>
>>> Do you need to write into the global cpu_data here? Doesn't it get
>>> overwritten after you load it back into 'prev_ip'
>>
>> No, the logic is same as holding the addr of previous packet.
>> Saving the previous packet saved ip in to prev_ip before overwriting
>> with the current packet.
>
> It's not exactly the same logic as holding the addr of the previous
> sample. For addr, we return on the first None, with your change we now
> "pretend" that the second one is also the previous one:
>
> if (cpu_data.get(str(cpu) + 'addr') == None):
> cpu_data[str(cpu) + 'addr'] = addr
> return <----------------------------sample 0 return
>
> if (cpu_data.get(str(cpu) + 'ip') == None):
> cpu_data[str(cpu) + 'ip'] = ip <---- sample 1 save but no return
>
> Then for sample 1 'prev_ip' is actually now the 'current' IP:
Yes, it is dummy for first packet. Added anticipating that we wont hit
the discontinuity for the first packet itself.
Can this be changed to more intuitive like below?
diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py
b/tools/perf/scripts/python/arm-cs-trace-disasm.py
index d973c2baed1c..d49f5090059f 100755
--- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
+++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
@@ -198,6 +198,8 @@ def process_event(param_dict):
cpu_data[str(cpu) + 'addr'] = addr
return
+ if (cpu_data.get(str(cpu) + 'ip') != None):
+ prev_ip = cpu_data[str(cpu) + 'ip']
if (options.verbose == True):
print("Event type: %s" % name)
@@ -243,12 +245,18 @@ def process_event(param_dict):
# Record for previous sample packet
cpu_data[str(cpu) + 'addr'] = addr
+ cpu_data[str(cpu) + 'ip'] = stop_addr
# Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
if (start_addr == 0 and stop_addr == 4):
print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
return
+ if (stop_addr < start_addr and prev_ip != 0):
+ # Continuity of the Packets broken, set start_addr to
previous
+ # packet ip to complete the remaining tracing of the
address range.
+ start_addr = prev_ip
+
if (start_addr < int(dso_start) or start_addr > int(dso_end)):
print("Start address 0x%x is out of range [ 0x%x ..
0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
return
Without this patch below is the failure log(with segfault) for reference.
[root at sut01sys-r214 perf]# timeout 4s ./perf record -e cs_etm// -C 1 dd
if=/dev/zero of=/dev/null
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.087 MB perf.data ]
[root at sut01sys-r214 perf]# ./perf script
--script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k
../../vmlinux -v $* > dump
objdump: error: the stop address should be after the start address
Traceback (most recent call last):
File "./scripts/python/arm-cs-trace-disasm.py", line 271, in
process_event
print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
disasm_output = check_output(disasm).decode('utf-8').split('\n')
^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.12/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['objdump', '-d', '-z',
'--start-address=0xffff80008125b758',
'--stop-address=0xffff80008125a934', '../../vmlinux']' returned non-zero
exit status 1.
Fatal Python error: handler_call_die: problem in Python trace event handler
Python runtime state: initialized
Current thread 0x0000ffffb90d54e0 (most recent call first):
<no Python frame>
Extension modules: perf_trace_context, systemd._journal,
systemd._reader, systemd.id128, report._py3report, _dbus_bindings,
problem._py3abrt (total: 7)
Aborted (core dumped)
dump snippet:
============
Event type: branches
Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr:
0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period:
1 time: 5986372298040 }
ffff800080313f04 <__perf_event_header__init_id+0x4c>:
ffff800080313f04: 36100094 tbz w20, #2,
ffff800080313f14 <__perf_event_header__init_id+0x5c>
ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
ffff800080313f0c: d63f0000 blr x0
perf 12720/12720 [0001] 5986.372298040
__perf_event_header__init_id+0x54
.../coresight/linux/kernel/events/core.c 586 return event->clock();
Event type: branches
Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr:
0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period:
1 time: 5986372298040 }
ffff80008030cb00 <local_clock>:
ffff80008030cb00: d503233f paciasp
ffff80008030cb04: a9bf7bfd stp x29, x30, [sp,
#-16]!
ffff80008030cb08: 910003fd mov x29, sp
ffff80008030cb0c: 97faba67 bl
ffff8000801bb4a8 <sched_clock>
perf 12720/12720 [0001] 5986.372298040
local_clock+0xc
...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
Event type: branches
Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr:
0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period:
1 time: 5986372298040 }
ffff8000801bb4a8 <sched_clock>:
ffff8000801bb4a8: d503233f paciasp
ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp,
#-32]!
ffff8000801bb4b0: 910003fd mov x29, sp
ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
ffff8000801bb4b8: d5384113 mrs x19, sp_el0
ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
ffff8000801bb4c0: 11000400 add w0, w0, #0x1
ffff8000801bb4c4: b9001260 str w0, [x19, #16]
ffff8000801bb4c8: 94427cf8 bl
ffff80008125a8a8 <sched_clock_noinstr>
perf 12720/12720 [0001] 5986.372298040
sched_clock+0x20
...sight/linux/kernel/time/sched_clock.c 105 ns =
sched_clock_noinstr();
Event type: branches
Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr:
0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period:
1 time: 5986372298040 }
ffff80008125a8a8 <sched_clock_noinstr>:
ffff80008125a8a8: d503233f paciasp
ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp,
#-64]!
ffff80008125a8b0: 910003fd mov x29, sp
ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
ffff80008125a8b8: b000e354 adrp x20,
ffff800082ec3000 <tick_bc_dev+0x140>
ffff80008125a8bc: 910d0294 add x20, x20, #0x340
ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
ffff80008125a8c4: 91002297 add x23, x20, #0x8
ffff80008125a8c8: 52800518 mov w24, #0x28
// #40
ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
ffff80008125a8d0: b9400296 ldr w22, [x20]
ffff80008125a8d4: 120002d5 and w21, w22, #0x1
ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
ffff80008125a8dc: 8b1502f3 add x19, x23, x21
ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
ffff80008125a8e4: d63f0000 blr x0
perf 12720/12720 [0001] 5986.372298040
sched_clock_noinstr+0x3c
...sight/linux/kernel/time/sched_clock.c 93 cyc =
(rd->read_sched_clock() - rd->epoch_cyc) &
Event type: branches
Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr:
0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period:
1 time: 5986372298040 }
With fix:
=========
Event type: branches
Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr:
0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period:
1 time: 5986372298040 }
ffff800080313f04 <__perf_event_header__init_id+0x4c>:
ffff800080313f04: 36100094 tbz w20, #2,
ffff800080313f14 <__perf_event_header__init_id+0x5c>
ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
ffff800080313f0c: d63f0000 blr x0
perf 12720/12720 [0001] 5986.372298040
__perf_event_header__init_id+0x54
.../coresight/linux/kernel/events/core.c 586 return event->clock();
Event type: branches
Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr:
0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period:
1 time: 5986372298040 }
ffff80008030cb00 <local_clock>:
ffff80008030cb00: d503233f paciasp
ffff80008030cb04: a9bf7bfd stp x29, x30, [sp,
#-16]!
ffff80008030cb08: 910003fd mov x29, sp
ffff80008030cb0c: 97faba67 bl
ffff8000801bb4a8 <sched_clock>
perf 12720/12720 [0001] 5986.372298040
local_clock+0xc
...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
Event type: branches
Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr:
0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period:
1 time: 5986372298040 }
ffff8000801bb4a8 <sched_clock>:
ffff8000801bb4a8: d503233f paciasp
ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp,
#-32]!
ffff8000801bb4b0: 910003fd mov x29, sp
ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
ffff8000801bb4b8: d5384113 mrs x19, sp_el0
ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
ffff8000801bb4c0: 11000400 add w0, w0, #0x1
ffff8000801bb4c4: b9001260 str w0, [x19, #16]
ffff8000801bb4c8: 94427cf8 bl
ffff80008125a8a8 <sched_clock_noinstr>
perf 12720/12720 [0001] 5986.372298040
sched_clock+0x20
...sight/linux/kernel/time/sched_clock.c 105 ns =
sched_clock_noinstr();
Event type: branches
Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr:
0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period:
1 time: 5986372298040 }
ffff80008125a8a8 <sched_clock_noinstr>:
ffff80008125a8a8: d503233f paciasp
ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp,
#-64]!
ffff80008125a8b0: 910003fd mov x29, sp
ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
ffff80008125a8b8: b000e354 adrp x20,
ffff800082ec3000 <tick_bc_dev+0x140>
ffff80008125a8bc: 910d0294 add x20, x20, #0x340
ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
ffff80008125a8c4: 91002297 add x23, x20, #0x8
ffff80008125a8c8: 52800518 mov w24, #0x28
// #40
ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
ffff80008125a8d0: b9400296 ldr w22, [x20]
ffff80008125a8d4: 120002d5 and w21, w22, #0x1
ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
ffff80008125a8dc: 8b1502f3 add x19, x23, x21
ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
ffff80008125a8e4: d63f0000 blr x0
perf 12720/12720 [0001] 5986.372298040
sched_clock_noinstr+0x3c
...sight/linux/kernel/time/sched_clock.c 93 cyc =
(rd->read_sched_clock() - rd->epoch_cyc) &
Event type: branches
Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr:
0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period:
1 time: 5986372298040 }
ffff80008125a8e8 <sched_clock_noinstr+0x40>:
ffff80008125a8e8: f8756ae3 ldr x3, [x23, x21]
ffff80008125a8ec: a9409666 ldp x6, x5, [x19, #8]
ffff80008125a8f0: 29441261 ldp w1, w4, [x19, #32]
ffff80008125a8f4: d50339bf dmb ishld
ffff80008125a8f8: b9400282 ldr w2, [x20]
ffff80008125a8fc: 6b16005f cmp w2, w22
ffff80008125a900: 54fffe81 b.ne
ffff80008125a8d0 <sched_clock_noinstr+0x28> // b.any
ffff80008125a904: cb060000 sub x0, x0, x6
ffff80008125a908: 2a0103e1 mov w1, w1
ffff80008125a90c: 8a050000 and x0, x0, x5
ffff80008125a910: a94153f3 ldp x19, x20, [sp, #16]
ffff80008125a914: 9b017c00 mul x0, x0, x1
ffff80008125a918: a9425bf5 ldp x21, x22, [sp, #32]
ffff80008125a91c: a94363f7 ldp x23, x24, [sp, #48]
ffff80008125a920: 9ac42400 lsr x0, x0, x4
ffff80008125a924: a8c47bfd ldp x29, x30, [sp], #64
ffff80008125a928: d50323bf autiasp
ffff80008125a92c: 8b030000 add x0, x0, x3
ffff80008125a930: d65f03c0 ret
perf 12720/12720 [0001] 5986.372298040
sched_clock_noinstr+0x88
...sight/linux/kernel/time/sched_clock.c 99 }
Event type: branches
Sample = { cpu: 0001 addr: 0xffff8000801bb4ec phys_addr:
0x0000000000000000 ip: 0xffff8000801bb4e4 pid: 12720 tid: 12720 period:
1 time: 5986372298040 }
ffff8000801bb4cc <sched_clock+0x24>:
ffff8000801bb4cc: aa0003f4 mov x20, x0
ffff8000801bb4d0: f9400a61 ldr x1, [x19, #16]
ffff8000801bb4d4: d1000421 sub x1, x1, #0x1
ffff8000801bb4d8: b9001261 str w1, [x19, #16]
ffff8000801bb4dc: b4000061 cbz x1,
ffff8000801bb4e8 <sched_clock+0x40>
ffff8000801bb4e0: f9400a60 ldr x0, [x19, #16]
ffff8000801bb4e4: b5000040 cbnz x0,
ffff8000801bb4ec <sched_clock+0x44>
perf 12720/12720 [0001] 5986.372298040
sched_clock+0x3c
...ux/./arch/arm64/include/asm/preempt.h 74 return !pc ||
!READ_ONCE(ti->preempt_count);
Event type: branches
Sample = { cpu: 0001 addr: 0xffff80008030cb10 phys_addr:
0x0000000000000000 ip: 0xffff8000801bb4fc pid: 12720 tid: 12720 period:
1 time: 5986372298040 }
ffff8000801bb4ec <sched_clock+0x44>:
ffff8000801bb4ec: aa1403e0 mov x0, x20
ffff8000801bb4f0: a94153f3 ldp x19, x20, [sp, #16]
ffff8000801bb4f4: a8c27bfd ldp x29, x30, [sp], #32
ffff8000801bb4f8: d50323bf autiasp
ffff8000801bb4fc: d65f03c0 ret
perf 12720/12720 [0001] 5986.372298040
sched_clock+0x54
...sight/linux/kernel/time/sched_clock.c 108 }
Still we miss tracing of 0xffff80008125b758, however seg-fault is avoided.
>
> prev_ip = cpu_data[str(cpu) + 'ip']
>
> This means that prev_ip is sometimes the previous sample's IP only
> sometimes (samples following 1), otherwise it's the current IP. Does
> your fix actually require this bit? Because we already save the 'real'
> previous one:
>
> cpu_data[str(cpu) + 'ip'] = stop_addr
>
> Also normally we save ip + 4 (stop_addr), where as you save ip. It's not
> clear why there is no need to add the 4?
>
>
>>>
>>> prev_ip = cpu_data[str(cpu) + 'ip']
>>>
>>> ... then ...
>>>
>>> # Record for previous sample packet
>>> cpu_data[str(cpu) + 'addr'] = addr
>>> cpu_data[str(cpu) + 'ip'] = stop_addr
>>>
>>> Would a local variable not accomplish the same thing?
>>
>> No, We need global to hold the ip of previous packet.
>>>
>>>> + prev_ip = cpu_data[str(cpu) + 'ip']
>>>> if (options.verbose == True):
>>>> print("Event type: %s" % name)
>>>> @@ -243,12 +247,18 @@ def process_event(param_dict):
>>>> # Record for previous sample packet
>>>> cpu_data[str(cpu) + 'addr'] = addr
>>>> + cpu_data[str(cpu) + 'ip'] = stop_addr
>>>> # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
>>>> if (start_addr == 0 and stop_addr == 4):
>>>> print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
>>>> return
>>>> + if (stop_addr < start_addr):
>>>> + # Continuity of the Packets broken, set start_addr to previous
>>>> + # packet ip to complete the remaining tracing of the
>>>> address range.
>
> After looking a bit more I'm also not sure why stop_addr < start_addr
> signifies a discontinuity. What if the discontinuity ends up with
> stop_addr > start_addr? There's no reason it can't jump forwards as well
> as backwards.
>
> Can you share the 3 samples from the --verbose output to the script that
> cause the issue?
>
> I see discontinuities as having the branch source (ip) set to 0 which is
> what we do at the start:
>
> Sample = { cpu: 0000 addr: 0x0000ffffb807adac phys_addr:
> 0x0000000000000000 ip: 0x0000000000000000 pid: 28388 }
>
> Then the ending one has the branch target (addr) set to 0:
>
> Sample = { cpu: 0000 addr: 0x0000000000000000 phys_addr:
> 0x0000000000000000 ip: 0x0000ffffb7eee168 pid: 28388 }
>
>
> And it doesn't hit objdump because of the range check:
>
> Start address 0x0 is out of range ...
>
> So I don't see any missing disassembly or crashes for this.
>
>>>> + start_addr = prev_ip
>>>> +
>>>> if (start_addr < int(dso_start) or start_addr > int(dso_end)):
>>>> print("Start address 0x%x is out of range [ 0x%x .. 0x%x ]
>>>> for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
>>>> return
>>
>> Thanks,
>> Ganapat
Thanks,
Ganapat
More information about the linux-arm-kernel
mailing list