[PATCH RFC 00/11] makedumpfile: parallel processing
Chao Fan
cfan at redhat.com
Thu Dec 10 02:54:28 PST 2015
----- Original Message -----
> From: "Wenjian Zhou/周文剑" <zhouwj-fnst at cn.fujitsu.com>
> To: "Chao Fan" <cfan at redhat.com>
> Cc: "Atsushi Kumagai" <ats-kumagai at wm.jp.nec.com>, kexec at lists.infradead.org
> Sent: Thursday, December 10, 2015 6:32:32 PM
> Subject: Re: [PATCH RFC 00/11] makedumpfile: parallel processing
>
> On 12/10/2015 05:58 PM, Chao Fan wrote:
> >
> >
> > ----- Original Message -----
> >> From: "Wenjian Zhou/周文剑" <zhouwj-fnst at cn.fujitsu.com>
> >> To: "Atsushi Kumagai" <ats-kumagai at wm.jp.nec.com>
> >> Cc: kexec at lists.infradead.org
> >> Sent: Thursday, December 10, 2015 5:36:47 PM
> >> Subject: Re: [PATCH RFC 00/11] makedumpfile: parallel processing
> >>
> >> On 12/10/2015 04:14 PM, Atsushi Kumagai wrote:
> >>>> Hello Kumagai,
> >>>>
> >>>> On 12/04/2015 10:30 AM, Atsushi Kumagai wrote:
> >>>>> Hello, Zhou
> >>>>>
> >>>>>> On 12/02/2015 03:24 PM, Dave Young wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> On 12/02/15 at 01:29pm, "Zhou, Wenjian/周文剑" wrote:
> >>>>>>>> I think there is no problem if other test results are as expected.
> >>>>>>>>
> >>>>>>>> --num-threads mainly reduces the time spent compressing.
> >>>>>>>> So for lzo, it doesn't help much most of the time.
> >>>>>>>
> >>>>>>> It seems the help text for --num-threads does not say that explicitly:
> >>>>>>>
> >>>>>>>   [--num-threads THREADNUM]:
> >>>>>>>       Using multiple threads to read and compress the data of each
> >>>>>>>       page in parallel, which reduces the time for saving DUMPFILE.
> >>>>>>>       This feature only supports creating DUMPFILE in kdump-compressed
> >>>>>>>       format from VMCORE in kdump-compressed format or elf format.
> >>>>>>>
> >>>>>>> Lzo is also a compression method; it should be mentioned that
> >>>>>>> --num-threads only supports zlib-compressed vmcores.
> >>>>>>>
> >>>>>>
> >>>>>> Sorry, it seems what I said was not clear.
> >>>>>> lzo is also supported. Since lzo compresses data at high speed,
> >>>>>> the performance improvement is not so obvious most of the time.
> >>>>>>
> >>>>>>> It is also worth mentioning the recommended -d value for this
> >>>>>>> feature.
> >>>>>>>
> >>>>>>
> >>>>>> Yes, I think it's worth mentioning. I forgot it.
> >>>>>
> >>>>> I saw your patch, but I think I should confirm what the problem is
> >>>>> first.
> >>>>>
> >>>>>> However, when "-d 31" is specified, it will be worse.
> >>>>>> Fewer than 50 buffers are used to cache the compressed pages,
> >>>>>> and even a page that has been filtered takes a buffer.
> >>>>>> So if "-d 31" is specified, the filtered pages will use a lot
> >>>>>> of the buffers, and the pages that need to be compressed can't
> >>>>>> be compressed in parallel.
> >>>>>
> >>>>> Could you explain in more detail why compression will not be parallel?
> >>>>> Using the buffers also for filtered pages does sound inefficient,
> >>>>> but I don't understand why it prevents parallel compression.
> >>>>>
> >>>>
> >>>> Think about this: with a huge memory, most of the pages will be filtered,
> >>>> and we have 5 buffers.
> >>>>
> >>>> page1       page2     page3     page4     page5     page6       page7 ...
> >>>> [buffer1]   [2]       [3]       [4]       [5]
> >>>> unfiltered  filtered  filtered  filtered  filtered  unfiltered  filtered
> >>>>
> >>>> Since a filtered page also takes a buffer, page6 can't be compressed
> >>>> at the same time as page1.
> >>>> That is why it prevents parallel compression.
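
(As an aside, a minimal model of the behaviour described above might look like
the following C sketch; all names are hypothetical and are not taken from the
makedumpfile sources.)

/*
 * Illustrative model only: every page, filtered or not, claims the next
 * cache slot, and slots are recycled strictly in page order.
 */
#include <stdbool.h>

#define NR_BUFFERS 5

struct cache_slot {
        unsigned long pfn;
        bool filtered;          /* page will not be written at all   */
        bool ready;             /* compression (if any) has finished */
};

static struct cache_slot ring[NR_BUFFERS];

/*
 * Producer: assign page 'pfn' to a slot whether or not it is filtered.
 * With -d 31 almost every slot ends up holding a filtered page, so two
 * unfiltered pages (page1 and page6 in the example above) are rarely
 * resident in the ring at the same time, and the compressor threads have
 * at most one real page to work on.
 */
static struct cache_slot *claim_slot(unsigned long pfn, bool filtered)
{
        struct cache_slot *slot = &ring[pfn % NR_BUFFERS];

        slot->pfn = pfn;
        slot->filtered = filtered;
        slot->ready = filtered; /* nothing to compress for a filtered page */
        return slot;
}
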
> >>>
> >>> Thanks for your explanation, I understand.
> >>> This is just an issue of the current implementation; there is no
> >>> reason to accept this restriction.
> >>>
> >>>>> Further, according to Chao's benchmark, there is a big performance
> >>>>> degradation even when the number of threads is 1 (58s vs 240s).
> >>>>> The current implementation seems to have some problems; we should
> >>>>> solve them.
> >>>>>
> >>>>
> >>>> If "-d 31" is specified, on the one hand we can't save time by
> >>>> compressing
> >>>> parallel, on the other hand we will introduce some extra work by adding
> >>>> "--num-threads". So it is obvious that it will have a performance
> >>>> degradation.
> >>>
> >>> Sure, there must be some overhead due to "some extra work" (e.g. exclusive
> >>> locking), but "--num-threads=1 is 4 times slower than --num-threads=0"
> >>> still sounds too slow; the degradation is too big to be called "some
> >>> extra work".
> >>>
> >>> Both --num-threads=0 and --num-threads=1 are serial processing, so the
> >>> above "buffer fairness issue" cannot be related to this degradation.
> >>> What do you think causes this degradation?
> >>>
> >>
> >> I can't reproduce such a result at the moment, so I can't do any further
> >> investigation right now. I guess it may be caused by the underlying
> >> pthread implementation.
> >> I reviewed the test results of patch v2 and found that the results
> >> differ quite a lot between machines.
> >
> > Hi Zhou Wenjian,
> >
> > I have done more tests on another machine with 128G of memory, and got
> > the following results:
> >
> > With "-d 31", the size of the vmcore is 300M:
> > makedumpfile -l --message-level 1 -d 31:
> > time: 8.6s page-faults: 2272
> >
> > makedumpfile -l --num-threads 1 --message-level 1 -d 31:
> > time: 28.1s page-faults: 2359
> >
> >
> > With "-d 0", the size of the vmcore is 2.6G.
> > On this machine, I get the same result as yours:
> >
> >
> > makedumpfile -c --message-level 1 -d 0:
> > time: 597s page-faults: 2287
> >
> > makedumpfile -c --num-threads 1 --message-level 1 -d 0:
> > time: 602s page-faults: 2361
> >
> > makedumpfile -c --num-threads 2 --message-level 1 -d 0:
> > time: 337s page-faults: 2397
> >
> > makedumpfile -c --num-threads 4 --message-level 1 -d 0:
> > time: 175s page-faults: 2461
> >
> > makedumpfile -c --num-threads 8 --message-level 1 -d 0:
> > time: 103s page-faults: 2611
> >
> >
> > But the machine from my first test is not under my control; should I wait
> > for that machine to do more tests?
> > If there are still problems in my tests, please tell me.
> >
>
> Thanks a lot for your tests; it seems that there is nothing wrong.
> And I haven't come up with any ideas for further tests...
>
> Could you provide the information about your CPU?
> I will do some further investigation later.
>
OK, of course. Here is the CPU information:
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 1
Core(s) per socket: 6
Socket(s): 8
NUMA node(s): 8
Vendor ID: AuthenticAMD
CPU family: 16
Model: 8
Model name: Six-Core AMD Opteron(tm) Processor 8439 SE
Stepping: 0
CPU MHz: 2793.040
BogoMIPS: 5586.22
Virtualization: AMD-V
L1d cache: 64K
L1i cache: 64K
L2 cache: 512K
L3 cache: 5118K
NUMA node0 CPU(s): 0,8,16,24,32,40
NUMA node1 CPU(s): 1,9,17,25,33,41
NUMA node2 CPU(s): 2,10,18,26,34,42
NUMA node3 CPU(s): 3,11,19,27,35,43
NUMA node4 CPU(s): 4,12,20,28,36,44
NUMA node5 CPU(s): 5,13,21,29,37,45
NUMA node6 CPU(s): 6,14,22,30,38,46
NUMA node7 CPU(s): 7,15,23,31,39,47
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate npt lbrv svm_lock nrip_save pausefilter vmmcall
> But I still believe it's better not to use "-l -d 31" and "--num-threads"
> at the same time, though it's very strange that the performance
> degradation is so big.
>
> --
> Thanks
> Zhou
>
> > Thanks,
> > Chao Fan
> >
> >
> >>
> >> It seems that I can get almost the same result as Chao on the "PRIMEQUEST
> >> 1800E".
> >>
> >> ###################################
> >> - System: PRIMERGY RX300 S6
> >> - CPU: Intel(R) Xeon(R) CPU x5660
> >> - memory: 16GB
> >> ###################################
> >> ************ makedumpfile -d 7 ******************
> >>                   core-data     0    256
> >> threads-num
> >>     -l
> >>      0                         10    144
> >>      4                          5    110
> >>      8                          5    111
> >>     12                          6    111
> >>
> >> ************ makedumpfile -d 31 ******************
> >>                   core-data     0    256
> >> threads-num
> >>     -l
> >>      0                          0      0
> >>      4                          2      2
> >>      8                          2      3
> >>     12                          2      3
> >>
> >> ###################################
> >> - System: PRIMEQUEST 1800E
> >> - CPU: Intel(R) Xeon(R) CPU E7540
> >> - memory: 32GB
> >> ###################################
> >> ************ makedumpfile -d 7 ******************
> >>                   core-data     0    256
> >> threads-num
> >>     -l
> >>      0                         34    270
> >>      4                         63    154
> >>      8                         64    131
> >>     12                         65    159
> >>
> >> ************ makedumpfile -d 31 ******************
> >>                   core-data     0    256
> >> threads-num
> >>     -l
> >>      0                          2      1
> >>      4                         48     48
> >>      8                         48     49
> >>     12                         49     50
> >>
> >>>> I'm not so sure whether such a big performance degradation is a problem.
> >>>> But I think that if it works as expected in the other cases, this won't
> >>>> be a problem (or a problem that needs to be fixed), since the
> >>>> performance degradation is expected in theory.
> >>>>
> >>>> Or the current implementation could be replaced by a new algorithm.
> >>>> For example:
> >>>> We could add an array to record whether each page is filtered or not,
> >>>> and only unfiltered pages would take a buffer.
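
(A rough sketch of that idea in C, with hypothetical names and no claim to be
an actual patch, could look like this.)

/*
 * First pass: record the filter decision for a block of pages up front.
 * The compression workers would then request one of the limited buffers
 * only when keep[i] is true, so a long run of filtered pages no longer
 * occupies buffers and blocks parallel compression.
 */
#include <stdbool.h>
#include <stdlib.h>

static bool *mark_unfiltered(unsigned long start_pfn, unsigned long nr_pages,
                             bool (*is_dumpable)(unsigned long pfn))
{
        bool *keep = calloc(nr_pages, sizeof(*keep));

        if (!keep)
                return NULL;
        for (unsigned long i = 0; i < nr_pages; i++)
                keep[i] = is_dumpable(start_pfn + i);
        return keep;
}
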
> >>>
> >>> We should discuss how to implement the new mechanism; I'll get back to
> >>> this later.
> >>>
> >>>> But I'm not sure whether it is worth it.
> >>>> Since "-l -d 31" is already fast enough, the new algorithm can't help
> >>>> much there either.
> >>>
> >>> Basically, the faster the better. There is no obvious target time.
> >>> If there is room for improvement, we should do it.
> >>>
> >>
> >> Maybe we can improve the performance of "-c -d 31" in some cases.
> >>
> >> BTW, we can easily get the theoretical performance by using "--split".
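
(For reference, --split writes several DUMPFILEs in one run, e.g. an
invocation along the lines of "makedumpfile --split -d 31 /proc/vmcore
dump1 dump2 dump3"; the file names here are only illustrative.)
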
> >>
> >> --
> >> Thanks
> >> Zhou
> >>
> >>
> >>
> >>
>
>
>
>
> _______________________________________________
> kexec mailing list
> kexec at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
>