[PATCH v3 00/10] makedumpfile: parallel processing

Wed Aug 5 19:46:35 PDT 2015

>On 07/31/2015 05:35 PM, "Zhou, Wenjian/周文剑" wrote:
>> On 07/31/2015 04:27 PM, Atsushi Kumagai wrote:
>>>> On 07/23/2015 02:20 PM, Atsushi Kumagai wrote:
>>>>>> Hello Kumagai,
>>>
>>> I assume that we are facing the known issue of glibc:
>>>
>>>    https://sourceware.org/ml/libc-alpha/2015-03/msg00270.html
>>>
>>> According to the thread above, a per-thread arena is grown and trimmed
>>> more readily than the main arena.
>>> compress2() calls malloc() and free() every time it is called, so each
>>> compression will cause page faults.
>>> Moreover, I confirmed that many madvise(MADV_DONTNEED) calls are made only
>>> when compress2() is called in a thread.
>>>
>>> OTOH, in the lzo case, the working buffer is allocated on the caller
>>> side, which reduces the number of malloc()/free() pairs.
>>> (But I'm not sure why snappy doesn't hit this issue. Its compression
>>> buffer may be smaller than the trim threshold.)
>>>
>>> Anyway, it's basically hard to avoid this issue for zlib on the application
>>> side, so it seems that we have to accept the performance degradation it causes.
>>> Unfortunately, the main target of this multi-thread feature is zlib, as you
>>> measured, so we should resolve this issue somehow.
>>>
>>> Nevertheless, even now we can get some benefit from parallel processing,
>>> so let's start discussing the implementation of the parallel processing
>>> feature so that this patch can be accepted. I have some comments:
>>>
>>>    - read_pfn_parallel() doesn't use the cache feature (cache.c). Is that
>>>      intentional?
>>>
>>
>> Yes. Since the data is read one page at a time here, the cache feature
>> doesn't seem to be needed.

OK, I see.

>>
>>>    - --num-buffers is now tunable, but neither the man page description nor
>>>      your benchmark mentions what the benefit of this parameter is.
>>>
>>
>> The default value of num-buffers is 50. Originally the value had a great influence
>> on the performance. But since we changed the logic in the 2nd version of the
>> patch set, more buffers bring little improvement (1000 buffers may give about 1%).
>> I'm considering whether the option should be removed. What do you think?

I think this option should be removed; most users wouldn't use it.

>> BTW, the code (mlockall) added in the 3rd version works well on several machines here.
>> Should I keep it?
>> With that code, madvise(MADV_DONTNEED) fails in compress2() and the performance
>> is as expected on these machines.

That kludge isn't reasonable; it just changes the memory allocation pattern.
If you can't explain in theory why it works well, you should get rid of it.
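
For reference, the difference between the zlib and lzo cases quoted above
(allocation inside the compressor on every call versus a working buffer owned
by the caller) can be sketched roughly as follows. This is only an
illustration, not code from the patch set: struct worker_ctx,
worker_ctx_init(), worker_ctx_free() and toy_compress_page() are hypothetical
names, and a toy run-length encoder stands in for the real compressor. The
point is only that the per-page path never calls malloc()/free(), so the
thread's arena has nothing to grow or trim per page.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE_SKETCH 4096   /* assumed page size, for illustration only */

/* Hypothetical per-thread context: allocated once, reused for every page. */
struct worker_ctx {
    unsigned char *out;   /* output buffer owned by the caller */
    size_t out_cap;       /* capacity of the output buffer in bytes */
};

/* Allocate the buffer once, before the thread starts processing pages. */
int worker_ctx_init(struct worker_ctx *ctx)
{
    /* Worst case for the toy encoder: 2 output bytes per input byte. */
    ctx->out_cap = 2 * PAGE_SIZE_SKETCH;
    ctx->out = malloc(ctx->out_cap);
    return ctx->out ? 0 : -1;
}

/* Release the buffer once, after the thread has finished all pages. */
void worker_ctx_free(struct worker_ctx *ctx)
{
    free(ctx->out);
    ctx->out = NULL;
    ctx->out_cap = 0;
}

/*
 * Toy run-length encoder standing in for the real compressor.
 * Emits (count, byte) pairs into ctx->out and performs no allocation.
 * Returns the encoded length, or 0 if the output would not fit.
 */
size_t toy_compress_page(struct worker_ctx *ctx,
                         const unsigned char *page, size_t len)
{
    size_t in = 0, out = 0;

    while (in < len) {
        unsigned char b = page[in];
        size_t run = 1;

        /* Extend the run, capped at 255 so the count fits in one byte. */
        while (in + run < len && page[in + run] == b && run < 255)
            run++;

        if (out + 2 > ctx->out_cap)
            return 0;
        ctx->out[out++] = (unsigned char)run;
        ctx->out[out++] = b;
        in += run;
    }
    return out;
}
```

Each thread would call worker_ctx_init() once, toy_compress_page() for every
page, and worker_ctx_free() at the end. With zlib, the comparable approach
might be to keep one z_stream per thread and reuse it with deflateReset()
instead of calling compress2() each time, but whether that actually avoids
the trimming would have to be measured.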

Thanks
Atsushi Kumagai
