[PATCH v2 00/10] makedumpfile: parallel processing
Atsushi Kumagai
ats-kumagai at wm.jp.nec.com
Wed Jul 8 01:10:52 PDT 2015
>Hello Atsushi Kumagai,
>
>I have tried a lot, and I think the big performance degradation only
>occurs on certain CPUs.
>I have two possible explanations in mind, and I need your help to
>confirm which is the real one.
>
>The following tests can also be run against a dumpfile instead of /proc/vmcore.
>
>Test 1: determine whether the degradation is caused by multi-threading.
>apply patch: test1
>command1: ./makedumpfile -c /proc/vmcore vmcore --num-threads 1
>command2: ./makedumpfile -c /proc/vmcore vmcore --num-threads 8
>
>It is better to run some tests with -l, too.
>command1: ./makedumpfile -l /proc/vmcore vmcore
>command2: ./makedumpfile -l /proc/vmcore vmcore --num-threads 1
>command3: ./makedumpfile -l /proc/vmcore vmcore --num-threads 8
>
>Test 2: determine whether the degradation is caused by doing compression in a thread.
>2.1:
> apply patch: test2.1
> command: ./makedumpfile -c /proc/vmcore vmcore --num-threads 1
>2.2:
> apply patch: test2.2
> command: ./makedumpfile -c /proc/vmcore vmcore --num-threads 1
>
>Thanks a lot.
>
>BTW, could you show me the CPU name, zlib version, and glibc version?
Here is my environment; this time I tested in the 1st kernel on a
physical machine:
CPU: Intel(R) Xeon(R) CPU E7- 8870 @ 2.40GHz (80 cores)
zlib: 1.2.3-29
glibc: 2.12-1.149
Here is a summary of the results; the full logs are attached:
Test1 (-c):
| threads | compress time (s) | exec time (s) |
|       1 |             82.75 |         95.21 |
|       8 |         34.1~34.8 |         40.32 |
Test1 (-l, lzo):
| threads | compress time (s) | exec time (s) |
|       - |                 - |         30.33 |
|       1 |                 - |         33.10 |
|       8 |                 - |         28.77 |
Test2-1:
| threads | compress time (s) | exec time (s) |
|       1 |             76.12 |         82.13 |
Test2-2:
| threads | compress time (s) | exec time (s) |
|       1 |             41.97 |         51.46 |
These results are better than those on virtual machines, but still worse
than without creating threads.
Additionally, I collected performance profiles with perf for debugging,
shown below:
Test2-1:
38.93% makedumpfile-2- libz.so.1.2.3 [.] 0x0000000000006e30 // compress2(?)
5.16% makedumpfile-2- [kernel.kallsyms] [k] page_fault
5.02% makedumpfile-2- [kernel.kallsyms] [k] clear_page_c
4.96% makedumpfile-2- libc-2.12.so [.] __memset_sse2
4.64% makedumpfile-2- [kernel.kallsyms] [k] tick_program_event
3.77% makedumpfile-2- libz.so.1.2.3 [.] adler32
2.08% makedumpfile-2- libc-2.12.so [.] memcpy
1.95% makedumpfile-2- [kernel.kallsyms] [k] __alloc_pages_nodemask
1.54% makedumpfile-2- [kernel.kallsyms] [k] get_page_from_freelist
1.53% makedumpfile-2- [kernel.kallsyms] [k] tick_dev_program_event
1.32% makedumpfile-2- [kernel.kallsyms] [k] __mem_cgroup_commit_charge
1.08% makedumpfile-2- [kernel.kallsyms] [k] hrtimer_interrupt
0.94% makedumpfile-2- [kernel.kallsyms] [k] release_pages
0.85% makedumpfile-2- [kernel.kallsyms] [k] alloc_pages_vma
...
Test2-2:
66.46% makedumpfile-2- libz.so.1.2.3 [.] 0x000000000000564e // compress2(?)
6.71% makedumpfile-2- libz.so.1.2.3 [.] adler32
5.56% makedumpfile-2- libc-2.12.so [.] __memset_sse2
3.44% makedumpfile-2- libc-2.12.so [.] memcpy
2.85% makedumpfile-2- [kernel.kallsyms] [k] tick_dev_program_event
1.30% makedumpfile-2- libz.so.1.2.3 [.] _tr_flush_block
0.95% makedumpfile-2- [kernel.kallsyms] [k] tick_program_event
0.92% makedumpfile-2- libz.so.1.2.3 [.] _tr_init
0.69% makedumpfile-2- [kernel.kallsyms] [k] hrtimer_interrupt
0.63% makedumpfile-2- [kernel.kallsyms] [k] page_fault
0.34% makedumpfile-2- libpthread-2.12.so [.] __lll_unlock_wake
0.33% makedumpfile-2- libc-2.12.so [.] _int_malloc
0.33% makedumpfile-2- libc-2.12.so [.] _int_free
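(For reference, a flat profile like the above can be reproduced with perf;
the exact options are not recorded here, so treat this invocation as an
assumption rather than the precise command used:)
command1: perf record -o perf.data ./makedumpfile -c /proc/vmcore vmcore --num-threads 1
command2: perf report -i perf.data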
In the case of Test2-1 (using pthread_create), more CPU time was spent in
kernel functions such as page_fault() than in Test2-2 (calling
kdump_thread_function_cyclic() directly); this may be one of the causes of
the degradation. I haven't found the reason yet, so this is just for your
information.
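As a side note: if the extra page_fault() time comes from first-touch
faults on the new thread's stack and its freshly allocated buffers,
pre-touching the per-thread buffer once before the compression loop should
move that cost out of the measured path. A minimal sketch of the idea
(worker, buf, and BUF_SIZE are illustrative names, not makedumpfile's
actual code):

#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

#define BUF_SIZE (4 * 1024 * 1024)  /* hypothetical per-thread buffer size */

static void *worker(void *arg)
{
    unsigned char *buf = malloc(BUF_SIZE);
    long page = sysconf(_SC_PAGESIZE);
    size_t off;

    (void)arg;
    if (!buf)
        return NULL;
    /* First touch: fault in every page of the buffer up front, so the
     * page faults are not charged to the compression loop below. */
    for (off = 0; off < BUF_SIZE; off += page)
        buf[off] = 0;
    /* ... the per-page compression loop would run here ... */
    free(buf);
    return NULL;
}

int main(void)
{
    pthread_t tid;

    if (pthread_create(&tid, NULL, worker, NULL))
        return 1;
    pthread_join(tid, NULL);
    return 0;
}

Whether this accounts for the remaining gap would need to be measured.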
Thanks
Atsushi Kumagai
>--
>Thanks
>Zhou Wenjian
>
>On 06/30/2015 05:06 PM, Atsushi Kumagai wrote:
>>> On 06/26/2015 03:49 PM, Atsushi Kumagai wrote:
>>>> I attached 5 processors to the VM and confirmed that all threads
>>>> consumed full CPU time, as shown by top(1) on the host:
>>>>
>>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>>>> 17614 qemu 20 0 5792m 4.9g 5652 R 435.1 72.8 29:02.17 qemu-kvm
>>>>
>>>> So I think the performance must be improved...
>>>
>>> Since I can't reproduce that result on all the machines here, could you
>>> test it with the attached patch (time) and show me the output?
>>> Using "./makedumpfile -c --num-threads 4 /proc/vmcore dumpfile" is OK.
>>>
>>> The attachment is the time patch.
>>
>> Here is the result:
>>
>> / # makedumpfile -c --num-threads 4 /proc/vmcore /mnt/dumpfile
>> Copying data : [100.0 %] |
>> Copying data : [100.0 %] \
>>
>> The dumpfile is saved to /mnt/dumpfile.
>>
>> makedumpfile Completed.
>> lock time: 310s935500us
>> write time: 3s970037us
>> hit time: 6s316043us
>> find time: 317s926654us
>> loop_time: 37s321800us
>> thread consume_time: 0s0us
>> thread timea: 0s0us
>> thread timeb: 0s0us
>> read_time[0]: 8s637011us
>> lock_current_time[0]: 0s284428us
>> found_time[0]: 60s366795us
>> lock_consumed_time[0]: 2s782596us
>> compress_time[0]: 301s427073us
>> read_time[1]: 8s435914us
>> lock_current_time[1]: 0s271680us
>> found_time[1]: 60s329026us
>> lock_consumed_time[1]: 2s849061us
>> compress_time[1]: 302s98620us
>> read_time[2]: 8s380550us
>> lock_current_time[2]: 0s270388us
>> found_time[2]: 60s209376us
>> lock_consumed_time[2]: 3s297574us
>> compress_time[2]: 301s486768us
>> read_time[3]: 8s550662us
>> lock_current_time[3]: 0s278997us
>> found_time[3]: 60s476702us
>> lock_consumed_time[3]: 3s49184us
>> compress_time[3]: 301s718390us
>> count1: 172
>> count2: 70921401
>> count3: 0
>> count4: 0
>> count5: 0
>> count6: 0
>> count7: 0
>> exec time: 380s125494us
>>
>>
>> BTW, I fixed a small mistake before testing (the write time was being
>> overwritten on each iteration instead of accumulated):
>>
>> - write_time = (write2.tv_sec - write1.tv_sec) * 1000000 + (write2.tv_usec - write1.tv_usec);
>> + write_time += (write2.tv_sec - write1.tv_sec) * 1000000 + (write2.tv_usec - write1.tv_usec);
>>
-------------- next part --------------
Non-text attachments were scrubbed by the list archive:
- test1.log (application/octet-stream, 8195 bytes): <http://lists.infradead.org/pipermail/kexec/attachments/20150708/53d4328c/attachment-0004.obj>
- test1-lzo.log (application/octet-stream, 6548 bytes): <http://lists.infradead.org/pipermail/kexec/attachments/20150708/53d4328c/attachment-0005.obj>
- test2-1.log (application/octet-stream, 404 bytes): <http://lists.infradead.org/pipermail/kexec/attachments/20150708/53d4328c/attachment-0006.obj>
- test2-2.log (application/octet-stream, 404 bytes): <http://lists.infradead.org/pipermail/kexec/attachments/20150708/53d4328c/attachment-0007.obj>