[makedumpfile PATCH] Allow PFN_EXCLUDED to be tunable via command line option --exclude-threshold

Eric DeVolder eric.devolder at oracle.com
Tue Jul 11 12:42:12 PDT 2017

Please see response below!

On 07/11/2017 02:43 AM, Atsushi Kumagai wrote:
> Hello Eric,
>>> On 07/07/2017 04:09 AM, Atsushi Kumagai wrote:
>>>>> The PFN_EXCLUDED value is used to control at which point a run of
>>>>> zeros in the bitmap (zeros denote excluded pages) is large enough
>>>>> to warrant truncating the current output segment and to create a
>>>>> new output segment (containing non-excluded pages), in an ELF dump.
>>>>> If the run is smaller than PFN_EXCLUDED, then those excluded pages
>>>>> are still output in the ELF dump, for the current output segment.
>>>>> By using smaller values of PFN_EXCLUDED, the resulting dump file
>>>>> size can be made smaller by actually removing more excluded pages
>>>>> from the resulting dump file.
>>>>> This patch adds the command line option --exclude-threshold=<value>
>>>>> to indicate the threshold. The default is 256, the legacy value
>>>>> of PFN_EXCLUDED. The smallest value permitted is 1.
>>>>> Using an existing vmcore, this was tested by the following:
>>>>> % makedumpfile -E -d31 --exclude-threshold=256 -x vmlinux vmcore
>>>>> newvmcore256
>>>>> % makedumpfile -E -d31 --exclude-threshold=4 -x vmlinux vmcore
>>>>> newvmcore4
>>>>> I utilize -d31 in order to exclude as many page types as possible,
>>>>> resulting in [significantly] smaller file sizes than the original
>>>>> vmcore.
>>>>> -rwxrwx--- 1 edevolde edevolde 4034564096 Jun 27 10:24 vmcore
>>>>> -rw------- 1 edevolde edevolde 119808156 Jul  6 13:01 newvmcore256
>>>>> -rw------- 1 edevolde edevolde 100811276 Jul  6 13:08 newvmcore4
>>>>> The use of a smaller value of PFN_EXCLUDED increases the number of
>>>>> output segments (the 'Number of program headers' in the readelf
>>>>> output) in the ELF dump file.
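
For illustration, here is a rough sketch of how the threshold governs
segment splitting. This is not the actual makedumpfile code; is_dumpable()
and count_loads() are illustrative stand-ins for the real second-bitmap
lookup and PT_LOAD accounting.

#include <stdint.h>

/* Illustrative stand-in for the second-bitmap lookup. */
extern int is_dumpable(uint64_t pfn);

/*
 * Count the PT_LOAD entries needed for one memory range when any run of
 * at least 'threshold' consecutive excluded pages closes the current
 * segment and a new segment starts at the next dumpable page.  Runs
 * shorter than 'threshold' stay inside the current segment, so those
 * excluded pages are still written to the dump.
 */
static uint64_t count_loads(uint64_t start_pfn, uint64_t end_pfn,
                            uint64_t threshold)
{
        uint64_t num_loads = 1;
        uint64_t excluded_run = 0;
        uint64_t pfn;

        for (pfn = start_pfn; pfn < end_pfn; pfn++) {
                if (!is_dumpable(pfn)) {
                        excluded_run++;
                        continue;
                }
                /* A dumpable page ends the run; if the run reached the
                 * threshold, the previous segment was closed and this
                 * page begins a new one. */
                if (excluded_run >= threshold)
                        num_loads++;
                excluded_run = 0;
        }
        return num_loads;
}
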
>>>> How will you tune the value? I'm not sure what the benefit of a
>>>> tunable PFN_EXCLUDED is. If there is no regression caused by too many
>>>> PT_LOAD entries, I think we can settle on a concrete PFN_EXCLUDED value.
>>> Allow me to note two things before addressing the question.
>>> Note that the value for PFN_EXCLUDED really should be in the range:
>>>    1 <= PFN_EXCLUDED <= NUM_PAGES(largest segment)
>>> but that values larger than NUM_PAGES(largest segment) behave the same
>>> as NUM_PAGES(largest segment) and simply prevent makedumpfile from ever
>>> omitting excluded pages from the dump file.
>>> Also note that the ELF header allows for a 16-bit e_phnum value for the
>>> number of segments in the dump file. As it stands today, I doubt that
>>> anybody has come close to reaching 65535 segments, but with the
>>> combination of ever larger memories and the work we (Oracle) are doing
>>> to further enhance the capabilities of makedumpfile, I believe we will
>>> start to challenge this 65535 limit.
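
As a reminder of where that limit comes from, here is an illustrative
snippet (not makedumpfile code; phnum_fits() is a made-up helper):

#include <stdint.h>

/* e_phnum in Elf64_Ehdr is an Elf64_Half (16 bits), so an ELF dump
 * written this way can describe at most 65535 program headers. */
static int phnum_fits(uint64_t num_loads)
{
        return num_loads <= UINT16_MAX;   /* 65535 */
}
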
> I overlooked the limitation on the number of segments, so I had considered
> only "The first benefit" you mention below.
>>> The ability to tune PFN_EXCLUDED allows one to minimize file size while
>>> still staying within ELF boundaries.
>>> There are two ways in which having PFN_EXCLUDED as a tunable parameter
>>> benefits the user.
>>> The first benefit is that, when PFN_EXCLUDED is made smaller, makedumpfile
>>> has more opportunities NOT to write excluded pages to the resulting dump
>>> file, thus obtaining a smaller overall dump file size. And since a
>>> PT_LOAD header is smaller than a page, the penalty of the additional
>>> segments is always outweighed and the net file size still shrinks. (In the
>>> example I cite, the dump file was 18MB smaller with a PFN_EXCLUDED value
>>> of 4 than with the default of 256, in spite of the number of segments
>>> increasing from 6 to 244.)
>>> The second benefit is that allowing PFN_EXCLUDED to become larger lets
>>> makedumpfile continue to generate valid ELF dump files on systems with
>>> ever larger memory, since a larger threshold produces fewer segments and
>>> keeps the dump within the 16-bit e_phnum limit. Generally speaking, the
>>> goal is to minimize the size of dump files by excluding uninteresting
>>> pages (i.e. zero, free, etc.), especially as memory sizes continue to
>>> grow. As memory grows, there are more and more of these uninteresting
>>> pages, and more opportunities for makedumpfile to omit them (even at the
>>> current PFN_EXCLUDED value of 256). Furthermore, we are working on
>>> additional page exclusion strategies that will create still more
>>> opportunities for makedumpfile to omit pages from the dump file. And the
>>> more pages makedumpfile omits from the dump file, the more segments are
>>> needed.
>>> By enabling a user to tune the value of PFN_EXCLUDED, we provide an
>>> additional mechanism to balance the size of the ELF dump file with
>>> respect to the size of memory.
>> It occurred to me that offering the option "--exclude-threshold=auto",
>> which would perform a binary search on the second bitmap in the function
>> get_loads_dumpfile_cyclic() to determine the optimum value of
>> PFN_EXCLUDED (optimum here meaning the smallest possible value that still
>> stays within 65535 segments, and thus yields the smallest possible dump
>> file size for the given constraints), would be an excellent feature to
>> have.
> I think the "auto" is necessary for --exclude-threshold, the optimum
> value should be calculated automatically. Otherwise, it imposes trial-and-error
> on users every time, it doesn't sound practical. IOW, this patch is
> unacceptable if there is no mechanism to support users.
> So now, my only concern for this option is the processing time of the
> binary search.

OK, so the idea of "tuning" the value of PFN_EXCLUDED is agreeable,
great! I will work on the binary search and report back with
measurements on the processing time of 'crash'. From there we can
determine if the benefit is worthwhile.
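
For reference, the search I have in mind looks roughly like this. It is
only a sketch: count_loads_for_threshold() is a hypothetical stand-in for
re-running the PT_LOAD count (as get_loads_dumpfile_cyclic() does) with a
given threshold, and it relies on the assumption that the segment count
never increases as the threshold grows.

#include <stdint.h>

/* Hypothetical helper: number of PT_LOAD entries the dump would need
 * if the exclude threshold were set to 'threshold'. */
extern uint64_t count_loads_for_threshold(uint64_t threshold);

#define MAX_PHNUM 65535ULL

/*
 * Binary-search for the smallest threshold whose segment count still
 * fits in the 16-bit e_phnum field.  Smaller thresholds drop more
 * excluded pages, so the smallest threshold that fits gives the
 * smallest dump file under this constraint.
 */
static uint64_t pick_threshold(uint64_t max_threshold)
{
        uint64_t lo = 1, hi = max_threshold, best = max_threshold;

        while (lo <= hi) {
                uint64_t mid = lo + (hi - lo) / 2;

                if (count_loads_for_threshold(mid) <= MAX_PHNUM) {
                        best = mid;        /* fits: try a smaller threshold */
                        hi = mid - 1;
                } else {
                        lo = mid + 1;      /* too many segments: go larger */
                }
        }
        return best;
}
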


> [snip]
>>>>> And with a larger number of segments, loading both vmcore and newvmcore4
>>>>> into 'crash' resulted in identical outputs when run with the dmesg, ps,
>>>>> files, mount, and net sub-commands.
>>>> What about the processing speed of crash? Is there no slowdown?
>>> I did not observe a noticeable change in the processing speed of crash.
> Good, though it would be better to back this up with actual measured results.
> Thanks,
> Atsushi Kumagai
