[RFC PATCH 00/10] Support free page filtering looking up mem_map array
Atsushi Kumagai
kumagai-atsushi at mxc.nes.nec.co.jp
Fri Jul 13 01:23:03 EDT 2012
Hello,
On Fri, 29 Jun 2012 15:23:37 +0900
Atsushi Kumagai <kumagai-atsushi at mxc.nes.nec.co.jp> wrote:
> Hello HATAYAMA-san,
>
> On Fri, 29 Jun 2012 02:37:57 +0900
> HATAYAMA Daisuke <d.hatayama at jp.fujitsu.com> wrote:
>
> > Sorry for the late posting. I made an RFC patch set for free page
> > filtering by looking up the mem_map array. Unlike the existing method
> > of looking up the free page list, this is done in constant space.
> >
> > I intend this patch set to be merged with Kumagai-san's cyclic patch
> > set, so I have marked it RFC. See the TODO below. Also, I have yet to
> > test the logic for old kernels from v2.6.15 to v2.6.17.
> >
> > This new free page filtering needs the following values:
> >
> > - OFFSET(page._mapcount)
> > - OFFSET(page.private)
> > - SIZE(pageflags)
> > - NUMBER(PG_buddy)
> > - NUMBER(PG_slab)
> > - NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)
> >
> > Unfortunately, OFFSET(_mapcount) and OFFSET(private) of the page
> > structure cannot be obtained from VMLINUX using the existing library
> > in makedumpfile, since the two members are anonymous components of
> > union types. We need a new interface for them.
> >
> > To try this patch set, it's handy to pass a manually edited
> > VMCOREINFO file via the -i option.
> >
> > TODO
> >
> > 1. Add the new values to VMCOREINFO in the upstream kernel.
> >
> > 2. Decide when to use this logic instead of the existing free list
> >    logic. Options are 1) introduce a new dump level or 2) use it
> >    automatically if --cyclic is specified. This patch set chooses 1),
> >    for RFC use only.
> >
> > 3. Consider how to deal with old kernels to which we cannot add the
> >    values in VMCOREINFO. Options are 1) force users to use VMLINUX,
> >    2) cover them fully by hard coding, or 3) give up supporting the
> >    full range of kernel versions.
>
> Thank you as always for your work.
>
> I will review your patches and measure the execution time with the v2
> patches of cyclic processing. If your patches are effective, I will
> consider the TODO items above.
> Please wait for a while.
I measured performance with the v2 patches of cyclic processing plus HATAYAMA-san's patches.
I also fixed the v2 patches to reduce wasteful processing; please see the end of this mail.
How to measure:
- The source data is a vmcore saved on disk; its size is 5,099,292,912 bytes.
- makedumpfile writes the dumpfile to the same disk as the source data.
- I measured the execution time with time(1) and took the average of 5 runs.
Test Cases:
- _mapcount:
    This logic is implemented by HATAYAMA-san.
    It looks up members of the page structure instead of the free_list
    to filter out free pages (a simplified sketch of the check appears
    after the results below).
- free_list:
    The v2 patches choose this logic.
    It looks up the whole free_list every cycle to filter out free pages.
- upstream (v1.4.4):
    This logic is NOT CYCLIC; it uses a temporary bitmap file as usual.
Example:
- _mapcount:
    $ time makedumpfile --cyclic -d32 -i vmcoreinfo vmcore dumpfile.d32
- free_list:
    $ time makedumpfile --cyclic -d16 -i vmcoreinfo vmcore dumpfile.d16
- upstream:
    $ time makedumpfile -d16 -i vmcoreinfo vmcore dumpfile.d16
(The _mapcount cases use -d32/-d47 because the RFC patches introduce a
new dump level bit, DL_EXCLUDE_FREE_CONST, for the new logic; -d16/-d31
are the corresponding levels with the existing free bit.)
Result:

a) exclude only free pages

   BUFSIZE_CYCLIC |               |               execution time [sec]
           [byte] | num of cycles | _mapcount (-d32) | free_list (-d16) | upstream (-d16)
------------------+---------------+------------------+------------------+----------------
             1024 |           152 |          20.5204 |          28.8028 |               -
        1024 * 10 |            16 |          14.7460 |          18.7904 |               -
       1024 * 100 |             2 |          14.3962 |          17.9356 |               -
       1024 * 200 |             1 |          14.3166 |          17.8762 |         17.7928
b) exclude all unnecessary pages

   BUFSIZE_CYCLIC |               |               execution time [sec]
           [byte] | num of cycles | _mapcount (-d47) | free_list (-d31) | upstream (-d31)
------------------+---------------+------------------+------------------+----------------
             1024 |           152 |          11.5086 |          27.2906 |               -
        1024 * 10 |            16 |           6.0740 |          10.9998 |               -
       1024 * 100 |             2 |           5.7928 |           9.1534 |               -
       1024 * 200 |             1 |           5.6378 |           8.9924 |          5.0516
I expected the difference in execution time to grow with the number of
cycles, because repeatedly scanning the free_list is costly, and the
results bear this out.
The _mapcount logic can be expected to perform well in almost all cases
when --cyclic is specified, so I think it's worth the effort to resolve
the TODO items.
However, more consideration is needed to decide whether to choose the
_mapcount logic when --cyclic isn't specified.
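
For reference, the core of the _mapcount logic is a constant-space,
per-page test along the lines of the sketch below. This is only an
illustration, not HATAYAMA-san's actual patch code; the struct and
function names are hypothetical, and the flags/_mapcount values are
assumed to have been read from the dump already.

    #include <stdint.h>

    /* Example marker value; the real one must come from
     * NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE) in VMCOREINFO. */
    #define PAGE_BUDDY_MAPCOUNT_VALUE  (-128)

    struct page_fields {        /* values read out of the dump */
            uint64_t flags;     /* page.flags     */
            int32_t  mapcount;  /* page._mapcount */
    };

    /* A page in the buddy allocator is recognized either by the
     * PG_buddy flag (kernels that still have it) or by
     * _mapcount == PAGE_BUDDY_MAPCOUNT_VALUE (v2.6.38 and later),
     * so the free_list never has to be walked. */
    static int
    page_is_free(const struct page_fields *p, long pg_buddy_bit)
    {
            if (pg_buddy_bit >= 0)  /* NUMBER(PG_buddy) was found */
                    return (p->flags >> pg_buddy_bit) & 1;

            return p->mapcount == PAGE_BUDDY_MAPCOUNT_VALUE;
    }

Each pfn costs one mem_map read regardless of how much free memory
there is, which is why the advantage over free_list grows with the
number of cycles.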
> TODO
>
> 1. Add the new values to VMCOREINFO in the upstream kernel.
I will send this request to the upstream kernel.
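For what it's worth, I expect the kernel-side change to be roughly the
following addition to crash_save_vmcoreinfo_init(). This is only a
sketch of the proposal; the exact entry list (e.g. how to export
SIZE(pageflags)) still needs discussion, and PG_buddy no longer exists
on current kernels, so only PAGE_BUDDY_MAPCOUNT_VALUE is shown:

	VMCOREINFO_OFFSET(page, _mapcount);
	VMCOREINFO_OFFSET(page, private);
	VMCOREINFO_NUMBER(PG_slab);
	VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);

These use the existing VMCOREINFO_* helpers, so old kernels would still
need the fallback discussed in 3. below.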
> 2. Decide when to use this logic instead of the existing free list
>    logic. Options are 1) introduce a new dump level or 2) use it
>    automatically if --cyclic is specified. This patch set chooses 1),
>    for RFC use only.
From the results, I think 2) is reasonable.
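Concretely, 2) would mean a selection along these lines (a hypothetical
sketch only; none of these names exist in the current source):

	/* Hypothetical: choose the free page logic automatically. */
	if (info->flag_cyclic && mapcount_info_available())
		info->free_page_logic = FREE_BY_MEM_MAP;   /* _mapcount */
	else
		info->free_page_logic = FREE_BY_FREE_LIST; /* free_list */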
> 3. Consider how to deal with old kernels to which we cannot add the
>    values in VMCOREINFO. Options are 1) force users to use VMLINUX,
>    2) cover them fully by hard coding, or 3) give up supporting the
>    full range of kernel versions.
I don't want to choose 2); I think it's inefficient.
Instead, I want to require that users prepare a VMCOREINFO file from
vmlinux with the -g option when cyclic processing is needed.
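For illustration, that workflow would look something like this (the
entry names follow HATAYAMA-san's list above; the numeric values are
placeholders and must be taken from the actual kernel):

    # Generate VMCOREINFO from a vmlinux with debug information:
    $ makedumpfile -g vmcoreinfo -x vmlinux

    # Append the new entries needed by the _mapcount logic
    # (placeholder values):
    OFFSET(page._mapcount)=12
    OFFSET(page.private)=32
    SIZE(pageflags)=4
    NUMBER(PG_buddy)=19
    NUMBER(PG_slab)=7
    NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)=-128

    # Pass the file at dump time:
    $ makedumpfile --cyclic -d47 -i vmcoreinfo /proc/vmcore dumpfile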
Do you have any comments?
Thanks
Atsushi Kumagai
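
Below is the fix to the v2 patches mentioned above:
__exclude_unnecessary_pages() now skips pfns outside the current cycle,
and the mem_map scan is skipped entirely unless the dump level actually
requests one of the page exclusions.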
diff --git a/makedumpfile.c b/makedumpfile.c
index 0e4660f..981d72a 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -3824,6 +3824,9 @@ __exclude_unnecessary_pages(unsigned long mem_map,

 	for (pfn = pfn_start; pfn < pfn_end; pfn++, mem_map += SIZE(page)) {

+		if (info->flag_cyclic && !is_cyclic_region(pfn))
+			continue;
+
 		/*
 		 * Exclude the memory hole.
 		 */
@@ -3960,17 +3963,25 @@ exclude_unnecessary_pages_cyclic(void)
 		if (!exclude_free_page())
 			return FALSE;

-	for (mm = 0; mm < info->num_mem_map; mm++) {
+	/*
+	 * Exclude cache pages, cache private pages, user data pages, and free pages.
+	 */
+	if (info->dump_level & DL_EXCLUDE_CACHE ||
+	    info->dump_level & DL_EXCLUDE_CACHE_PRI ||
+	    info->dump_level & DL_EXCLUDE_USER_DATA ||
+	    info->dump_level & DL_EXCLUDE_FREE_CONST) {
+		for (mm = 0; mm < info->num_mem_map; mm++) {

-		mmd = &info->mem_map_data[mm];
+			mmd = &info->mem_map_data[mm];

-		if (mmd->mem_map == NOT_MEMMAP_ADDR)
-			continue;
+			if (mmd->mem_map == NOT_MEMMAP_ADDR)
+				continue;

-		if (mmd->pfn_end >= info->cyclic_start_pfn || mmd->pfn_start <= info->cyclic_end_pfn) {
-			if (!__exclude_unnecessary_pages(mmd->mem_map,
-							 mmd->pfn_start, mmd->pfn_end))
-				return FALSE;
+			if (mmd->pfn_end >= info->cyclic_start_pfn || mmd->pfn_start <= info->cyclic_end_pfn) {
+				if (!__exclude_unnecessary_pages(mmd->mem_map,
+								 mmd->pfn_start, mmd->pfn_end))
+					return FALSE;
+			}
 		}
 	}