[RFC PATCH 00/10] Support free page filtering looking up mem_map array

Atsushi Kumagai kumagai-atsushi at mxc.nes.nec.co.jp
Fri Jul 13 01:23:03 EDT 2012


Hello,

On Fri, 29 Jun 2012 15:23:37 +0900
Atsushi Kumagai <kumagai-atsushi at mxc.nes.nec.co.jp> wrote:

> Hello HATAYAMA-san,
> 
> On Fri, 29 Jun 2012 02:37:57 +0900
> HATAYAMA Daisuke <d.hatayama at jp.fujitsu.com> wrote:
> 
> > Sorry for the late posting. I made an RFC patch set for free page
> > filtering that looks up the mem_map array. Unlike the existing method
> > of looking up the free page list, this is done in constant space.
> > 
> > I intend this patch set to be merged with Kumagai-san's cyclic patch
> > set, so I mark it as RFC. See the TODO below. Also, I have yet to
> > test the logic on old kernels from v2.6.15 to v2.6.17.
> > 
> > This new free page filtering needs the following values.
> > 
> >   - OFFSET(page._mapcount)
> >   - OFFSET(page.private)
> >   - SIZE(pageflags)
> >   - NUMBER(PG_buddy)
> >   - NUMBER(PG_slab)
> >   - NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)
> > 
> > Unfortunately, the OFFSET(_mapcount) and OFFSET(private) fields of the
> > page structure cannot be obtained from VMLINUX using the existing
> > library in makedumpfile, since the two members are anonymous components
> > of union types. We need a new interface for them.
> > 
> > To try this patch set, it is handy to pass a manually edited
> > VMCOREINFO file via the -i option.
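
For reference, such a hand-edited VMCOREINFO file for the -i option
would contain lines like the ones below. The numeric values here are
placeholders for illustration only; take the real values from the
actual vmlinux:

    OFFSET(page._mapcount)=24
    OFFSET(page.private)=48
    SIZE(pageflags)=4
    NUMBER(PG_buddy)=19
    NUMBER(PG_slab)=7
    NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)=-128
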
> > 
> > TODO
> > 
> >   1. Add the new values to VMCOREINFO in the upstream kernel.
> > 
> >   2. Decide when to use this logic instead of the existing free list
> >   logic. The options are 1) introduce a new dump level, or 2) use it
> >   automatically if --cyclic is specified. This patch chooses 1), only
> >   for RFC use.
> > 
> >   3. Consider how to deal with old kernels on which we cannot add the
> >   values to VMCOREINFO. The options are 1) force users to use VMLINUX,
> >   2) cover them by fully hard-coding the values, or 3) give up
> >   supporting the full range of kernel versions.
> 
> Thank you always for your work.
> 
> I will review your patches and measure the execution time with the v2
> patches of cyclic processing. If your patches are effective, I will
> then consider the TODO above.
> Please wait for a while.

I measured the performance with the v2 patches of cyclic processing plus
HATAYAMA-san's patches. I also fixed the v2 patches to reduce wasteful
processing; please see the patch at the end of this mail.


How to measure:

  - The source data is a vmcore saved on disk; its size is 5,099,292,912 bytes.
  - makedumpfile writes the dumpfile to the same disk as the source data.
  - I measured the execution time with time(1) and took the average of 5 runs.

Test Cases:

  - _mapcount:
    This logic is implemented by HATAYAMA-san's patches.
    It looks up members of the page structure instead of the free_list
    to filter out free pages (see the sketch after this list).

  - free_list:
    The v2 patches choose this logic.
    It looks up the whole free_list to filter out free pages in every cycle.

  - upstream (v1.4.4):
    This logic is NOT cyclic; it uses a temporary bitmap file as usual.
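
To illustrate the idea behind the _mapcount logic, here is a rough sketch
in C. This is NOT the actual patch code: is_buddy_page() is a hypothetical
helper, and the real patches read the page structure with makedumpfile's
readmem() using the offsets and numbers from VMCOREINFO (the special
handling for old kernels v2.6.15 - v2.6.17 mentioned above is omitted).

    /*
     * Hypothetical sketch.  A page sitting in the buddy allocator can
     * be recognized from the page structure itself: old kernels set
     * the PG_buddy flag in page.flags, newer kernels store
     * PAGE_BUDDY_MAPCOUNT_VALUE in page._mapcount.  NUMBER() and
     * NOT_FOUND_NUMBER are makedumpfile's usual VMCOREINFO accessors.
     */
    static int
    is_buddy_page(unsigned long flags, int _mapcount)
    {
            if (NUMBER(PG_buddy) != NOT_FOUND_NUMBER)
                    /* old kernels: test the PG_buddy flag */
                    return !!(flags & (1UL << NUMBER(PG_buddy)));

            /* newer kernels: PG_buddy is gone, test the _mapcount marker */
            return _mapcount == (int)NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
    }

The caller reads page.flags and page._mapcount for each entry of the
mem_map array and skips the page when this returns TRUE, so the memory
needed stays constant no matter how many pages are free.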

Example:

  - _mapcount:
    $ time makedumpfile --cyclic -d32 -i vmcoreinfo vmcore dumpfile.d32

  - free_list:
    $ time makedumpfile --cyclic -d16 -i vmcoreinfo vmcore dumpfile.d16

  - upstream:
    $ time makedumpfile -d16 -i vmcoreinfo vmcore dumpfile.d16

Result:

  a) exclude only free pages

     BUFSIZE_CYCLIC  |                |                  execution time [sec]
        [bytes]      | num of cycles  | _mapcount (-d32) | free_list (-d16) | upstream (-d16)
   ------------------+----------------+------------------+------------------+-----------------
       1024          |      152       |      20.5204     |      28.8028     |        -
       1024 * 10     |       16       |      14.7460     |      18.7904     |        -
       1024 * 100    |        2       |      14.3962     |      17.9356     |        -
       1024 * 200    |        1       |      14.3166     |      17.8762     |     17.7928

  b) exclude all unnecessary pages

     BUFSIZE_CYCLIC  |                |                  execution time [sec]
        [bytes]      | num of cycles  | _mapcount (-d47) | free_list (-d31) | upstream (-d31)
   ------------------+----------------+------------------+------------------+-----------------
       1024          |      152       |      11.5086     |      27.2906     |        -
       1024 * 10     |       16       |       6.0740     |      10.9998     |        -
       1024 * 100    |        2       |       5.7928     |       9.1534     |        -
       1024 * 200    |        1       |       5.6378     |       8.9924     |      5.0516


I expected the difference in execution time to grow with the number of cycles,
because repeatedly scanning the free_list is costly: the free_list logic walks
the whole free_list once per cycle, while the _mapcount logic inspects only the
page structures within the current cycle's range. The results seem to confirm
this.

The _mapcount logic can be expected to perform well in almost every case where
--cyclic is specified. So I think it is worth the effort for us to resolve the
TODO items.

However, I think more consideration is needed before deciding whether to choose
the _mapcount logic when --cyclic isn't specified.


> TODO
> 
>   1. Add the new values to VMCOREINFO in the upstream kernel.

I will send this request to the upstream kernel.

>   2. Decide when to use this logic instead of the existing free list
>   logic. The options are 1) introduce a new dump level, or 2) use it
>   automatically if --cyclic is specified. This patch chooses 1), only
>   for RFC use.

From the results, I think 2) is reasonable.

>   3. Consider how to deal with old kernels on which we cannot add the
>   values to VMCOREINFO. The options are 1) force users to use VMLINUX,
>   2) cover them by fully hard-coding the values, or 3) give up
>   supporting the full range of kernel versions.

I don't want to choose 2); I think it's inefficient.
Instead, I want to require that users prepare a VMCOREINFO file from vmlinux
with the -g option if cyclic processing is needed.
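
For example (the file name "vmcoreinfo" below is arbitrary):

  - generate a VMCOREINFO file from the first kernel's vmlinux:
    $ makedumpfile -g vmcoreinfo -x vmlinux

  - pass it at dump time, when vmlinux is not available:
    $ makedumpfile --cyclic -d31 -i vmcoreinfo vmcore dumpfile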


Do you have any comments?


Thanks
Atsushi Kumagai

diff --git a/makedumpfile.c b/makedumpfile.c
index 0e4660f..981d72a 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -3824,6 +3824,9 @@ __exclude_unnecessary_pages(unsigned long mem_map,
 
 	for (pfn = pfn_start; pfn < pfn_end; pfn++, mem_map += SIZE(page)) {
 
+		if (info->flag_cyclic && !is_cyclic_region(pfn))
+			continue;
+
 		/*
 		 * Exclude the memory hole.
 		 */
@@ -3960,17 +3963,25 @@ exclude_unnecessary_pages_cyclic(void)
 		if (!exclude_free_page())
 			return FALSE;
 
-	for (mm = 0; mm < info->num_mem_map; mm++) {
+	/*
+	 * Exclude cache pages, cache private pages, user data pages, and free pages.
+	 */
+	if (info->dump_level & DL_EXCLUDE_CACHE ||
+	    info->dump_level & DL_EXCLUDE_CACHE_PRI ||
+	    info->dump_level & DL_EXCLUDE_USER_DATA ||
+	    info->dump_level & DL_EXCLUDE_FREE_CONST) {
+		for (mm = 0; mm < info->num_mem_map; mm++) {
 
-		mmd = &info->mem_map_data[mm];
+			mmd = &info->mem_map_data[mm];
 
-		if (mmd->mem_map == NOT_MEMMAP_ADDR)
-			continue;
+			if (mmd->mem_map == NOT_MEMMAP_ADDR)
+				continue;
 
-		if (mmd->pfn_end >= info->cyclic_start_pfn || mmd->pfn_start <= info->cyclic_end_pfn) {
-			if (!__exclude_unnecessary_pages(mmd->mem_map,
-							 mmd->pfn_start, mmd->pfn_end))
-				return FALSE;
+			if (mmd->pfn_end >= info->cyclic_start_pfn || mmd->pfn_start <= info->cyclic_end_pfn) {
+				if (!__exclude_unnecessary_pages(mmd->mem_map,
+								 mmd->pfn_start, mmd->pfn_end))
+					return FALSE;
+			}
 		}
 	}

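For reference, a minimal sketch of the is_cyclic_region() check assumed by
the hunk above (a hypothetical body; the actual helper is defined by the
cyclic patch set):

    /* Hypothetical sketch: TRUE if pfn falls within the window covered
     * by the current cycle.  The real helper belongs to the cyclic
     * (v2) patch set. */
    static int
    is_cyclic_region(unsigned long long pfn)
    {
            return (info->cyclic_start_pfn <= pfn
                    && pfn < info->cyclic_end_pfn);
    }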

