makedumpfile memory usage grows with system memory size

HATAYAMA Daisuke d.hatayama at jp.fujitsu.com
Thu Apr 5 21:12:12 EDT 2012


From: Vivek Goyal <vgoyal at redhat.com>
Subject: Re: makedumpfile memory usage grows with system memory size
Date: Thu, 5 Apr 2012 10:34:39 -0400

> On Thu, Apr 05, 2012 at 03:52:11PM +0900, HATAYAMA Daisuke wrote:
> 
> [..]
>>   * Only free pages are processed slowly. Cache, cache private, user and
>>     zero pages are processed per range of memory with good performance.
> 
> Hi Daisuke-san,
> 

Hello Vivek,

> I am wondering why can't we walk through the memmap array and look into
> struct page for figuring out if page is free or not. Looks like that
> in the past we used to have PG_buddy flag and same information possibly
> could be retrieved by looking at page->_count field. 
> 
> So I am just curious that why do we walk through free pages list to figure
> out free pages instead of looking at "struct page".

Thanks. To be honest, I have only just begun reading this part of the
code and learned about PG_buddy just now. I did a quick check of this on
2.6.18 with the patch at the bottom of this mail, and the free pages found
from free_list coincide with those found by the PG_buddy check.
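
A minimal sketch of that cross-check, in plain C rather than makedumpfile
code. It assumes the flag is set only on the first page of each free block,
so it compares the block start pfns from free_list with the pfns whose flags
word has bit 19 set (the PG_buddy value used in the patch at the bottom of
this mail). The function name and the flat arrays are hypothetical stand-ins
for data that would really be read from the dump.

```c
#include <stddef.h>

/* Bit number of PG_buddy on 2.6.18, as defined in the patch in this mail. */
#define PG_BUDDY_BIT 19

/* Linear search: is pfn one of the block start pfns taken from free_list? */
static int pfn_in_list(size_t pfn, const size_t *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (list[i] == pfn)
            return 1;
    return 0;
}

/*
 * Hypothetical helper: returns 1 when the pfns whose flags word has the
 * PG_buddy bit set are exactly the block start pfns found on free_list.
 * Assumes free_starts holds no duplicates and only pfns below nr_pages.
 */
int buddy_flag_matches_free_list(const unsigned long *page_flags,
                                 size_t nr_pages,
                                 const size_t *free_starts,
                                 size_t nr_blocks)
{
    size_t flagged = 0;

    for (size_t pfn = 0; pfn < nr_pages; pfn++) {
        if (!((page_flags[pfn] >> PG_BUDDY_BIT) & 1UL))
            continue;               /* not marked as a free block head */
        if (!pfn_in_list(pfn, free_starts, nr_blocks))
            return 0;               /* flagged, but not on free_list */
        flagged++;
    }
    /* Every flagged pfn is on the list; equal counts mean equal sets. */
    return flagged == nr_blocks;
}
```

The linear search keeps the sketch short; a real comparison over a large
memmap would want a bitmap or a sorted list instead.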

As Vivek says, more recent kernels have changed how PG_buddy is handled,
and the following commit says we should check _mapcount instead; I have yet
to check this.

Author: Andrea Arcangeli <aarcange at redhat.com>
Date:   Thu Jan 13 15:47:00 2011 -0800

    thp: remove PG_buddy

    PG_buddy can be converted to _mapcount == -2.  So the PG_compound_lock can
    be added to page->flags without overflowing (because of the sparse section
    bits increasing) with CONFIG_X86_PAE=y and CONFIG_X86_PAT=y.  This also
    has to move the memory hotplug code from _mapcount to lru.next to avoid
    any risk of clashes.  We can't use lru.next for PG_buddy removal, but
    memory hotplug can use lru.next even more easily than the mapcount
    instead.

    Signed-off-by: Andrea Arcangeli <aarcange at redhat.com>
    Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>

$ git describe 5f24ce5fd34c3ca1b3d10d30da754732da64d5c0
v2.6.37-7012-g5f24ce5

So now we can walk the memmap array for free pages too, just as for other
kinds of memory. The question I have now is why the current implementation
was chosen. Is there any difference between the two approaches?
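
To make the idea concrete, here is a rough sketch of such a walk in plain C.
It is not makedumpfile code: struct toy_page, exclude_free_pages() and the
bitmap layout are hypothetical, and reading struct page out of the dump is
left out entirely. It assumes that a free block is marked on its first page
by _mapcount == -2 (the value named in the commit message above; other
kernel versions may use a different value) and that the block's order is
stored alongside it on that page.

```c
#include <stddef.h>
#include <stdint.h>

/* Marker value taken from the commit message quoted above. */
#define BUDDY_MAPCOUNT (-2)

/* Hypothetical, simplified stand-in for the fields read from struct page. */
struct toy_page {
    int mapcount;          /* stands in for page->_mapcount */
    unsigned int order;    /* assumed: order of the free block, head page only */
};

/* Clear bit pfn in a dump bitmap (1 = keep the page, 0 = exclude it). */
static void clear_pfn(uint8_t *bitmap, size_t pfn)
{
    bitmap[pfn / 8] &= (uint8_t)~(1u << (pfn % 8));
}

/*
 * Walk the memmap once. For each page marked as the head of a free block,
 * exclude all 1 << order pages of that block and skip past them.
 * Returns the number of pages excluded.
 */
size_t exclude_free_pages(const struct toy_page *memmap, size_t nr_pages,
                          uint8_t *bitmap)
{
    size_t pfn = 0, excluded = 0;

    while (pfn < nr_pages) {
        if (memmap[pfn].mapcount != BUDDY_MAPCOUNT) {
            pfn++;
            continue;
        }
        size_t n = (size_t)1 << memmap[pfn].order;
        if (n > nr_pages - pfn)     /* do not run past the end of the map */
            n = nr_pages - pfn;
        for (size_t i = 0; i < n; i++)
            clear_pfn(bitmap, pfn + i);
        excluded += n;
        pfn += n;
    }
    return excluded;
}
```

Compared with following free_list, this touches each struct page once in
pfn order, which is the same access pattern already used for cache, user
and zero pages.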

Subject: [PATCH] Add free pages message

---
 makedumpfile.c |    9 +++++++++
 makedumpfile.h |    1 +
 print_info.h   |    2 +-
 3 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/makedumpfile.c b/makedumpfile.c
index c843567..bd770b1 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -3198,6 +3198,9 @@ reset_bitmap_of_free_pages(unsigned long node_zones)
                                        retcd = ANALYSIS_FAILED;
                                        return FALSE;
                                }
+
+                               FREEPAGE_MSG("order: %d migrate_type: %d pfn: %llu\n", order, migrate_type, start_pfn);
+
                                for (i = 0; i < (1<<order); i++) {
                                        pfn = start_pfn + i;
                                        clear_bit_on_2nd_bitmap_for_kernel(pfn);
@@ -3399,6 +3402,7 @@ _exclude_free_page(void)
                        }
                        if (!spanned_pages)
                                continue;
+                       FREEPAGE_MSG("NR_ZONE: %d\n", i);
                        if (!reset_bitmap_of_free_pages(zone))
                                return FALSE;
                }
@@ -3688,6 +3692,11 @@ __exclude_unnecessary_pages(unsigned long mem_map,
                _count  = UINT(pcache + OFFSET(page._count));
                mapping = ULONG(pcache + OFFSET(page.mapping));

+               if ((info->dump_level & DL_EXCLUDE_FREE)
+                   && (flags & (1UL << PG_buddy))) {
+                       FREEPAGE_MSG("PG_buddy: flags: %#016lx pfn %llu\n", flags, pfn);
+               }
+
                /*
                 * Exclude the cache page without the private page.
                 */
diff --git a/makedumpfile.h b/makedumpfile.h
index ed1e9de..1faef47 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -67,6 +67,7 @@ int get_mem_type(void);
 #define PG_lru_ORIGINAL                (5)
 #define PG_private_ORIGINAL    (11)    /* Has something at ->private */
 #define PG_swapcache_ORIGINAL  (15)    /* Swap page: swp_entry_t in private */
+#define PG_buddy               (19)

 #define PAGE_MAPPING_ANON      (1)

diff --git a/print_info.h b/print_info.h
index 94968ca..44415d3 100644
--- a/print_info.h
+++ b/print_info.h
@@ -42,7 +42,7 @@ void print_execution_time(char *step_name, struct timeval *tv_start);
  * Message Level
  */
 #define MIN_MSG_LEVEL          (0)
-#define MAX_MSG_LEVEL          (31)
+#define MAX_MSG_LEVEL          (31+0x20)
 #define DEFAULT_MSG_LEVEL      (7)     /* Print the progress indicator, the
                                           common message, the error message */
 #define ML_PRINT_PROGRESS      (0x001) /* Print the progress indicator */
--
1.7.4.4

Thanks,
HATAYAMA, Daisuke



