makedumpfile memory usage grows with system memory size
HATAYAMA Daisuke
d.hatayama at jp.fujitsu.com
Thu Apr 5 21:12:12 EDT 2012
From: Vivek Goyal <vgoyal at redhat.com>
Subject: Re: makedumpfile memory usage grows with system memory size
Date: Thu, 5 Apr 2012 10:34:39 -0400
> On Thu, Apr 05, 2012 at 03:52:11PM +0900, HATAYAMA Daisuke wrote:
>
> [..]
>> * Only free pages are processed with bad performance. Cache, cache
>> private, user and zero pages are processed per range of memory with
>> good performance.
>
> Hi Daisuke-san,
>
Hello Vivek,
> I am wondering why can't we walk through the memmap array and look into
> struct page for figuring out if page is free or not. Looks like that
> in the past we used to have PG_buddy flag and same information possibly
> could be retrieved by looking at page->_count field.
>
> So I am just curious that why do we walk through free pages list to figure
> out free pages instead of looking at "struct page".
Thanks. To be honest, I have only just started reading this part of the
code and learned about PG_buddy just now. I did a quick check on 2.6.18
with the patch at the bottom of this mail, and the free pages found from
free_list coincide with those found by the PG_buddy check.
As Vivek says, more recent kernels have changed how PG_buddy is handled,
and the commit below says we should check _mapcount instead; I have yet
to check this.
Author: Andrea Arcangeli <aarcange at redhat.com>
Date: Thu Jan 13 15:47:00 2011 -0800
thp: remove PG_buddy
PG_buddy can be converted to _mapcount == -2. So the PG_compound_lock can
be added to page->flags without overflowing (because of the sparse section
bits increasing) with CONFIG_X86_PAE=y and CONFIG_X86_PAT=y. This also
has to move the memory hotplug code from _mapcount to lru.next to avoid
any risk of clashes. We can't use lru.next for PG_buddy removal, but
memory hotplug can use lru.next even more easily than the mapcount
instead.
Signed-off-by: Andrea Arcangeli <aarcange at redhat.com>
Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
$ git describe 5f24ce5fd34c3ca1b3d10d30da754732da64d5c0
v2.6.37-7012-g5f24ce5
So now we can also walk the memmap array for free pages, as we do for
other kinds of memory. The question I have now is why the current
implementation was chosen. Is there any difference between the two ways?
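To make the memmap-walk idea concrete, here is a minimal standalone sketch of what I have in mind. It is not makedumpfile code: struct page_info, page_is_buddy(), mark_free_pages() and the SKETCH_* constants are hypothetical names for illustration only. It also assumes that the buddy marker is set only on the first page of a free block and that page.private holds the order of that block; I have not verified this on every kernel version.

```c
#include <assert.h>

/* Hypothetical, simplified copy of the struct page fields of interest. */
struct page_info {
    unsigned long flags;    /* page.flags */
    int mapcount;           /* page._mapcount */
    unsigned long private_; /* page.private: assumed to hold the block order */
};

#define SKETCH_PG_BUDDY       19   /* bit used for 2.6.18 in the patch below */
#define SKETCH_BUDDY_MAPCOUNT (-2) /* value named in the commit quoted above */
#define SKETCH_MAX_ORDER      11   /* assumed upper bound on the order */

/* Nonzero if the page looks like the head of a free buddy block. */
static int page_is_buddy(const struct page_info *p, int has_pg_buddy)
{
    if (has_pg_buddy)   /* older kernels: dedicated page flag */
        return (p->flags & (1UL << SKETCH_PG_BUDDY)) != 0;
    return p->mapcount == SKETCH_BUDDY_MAPCOUNT; /* after PG_buddy removal */
}

/*
 * Walk npages descriptors in pfn order. For each buddy head page, mark the
 * whole block of (1 << order) pages in free_map and skip past it.
 * Returns the number of pages marked as free.
 */
static unsigned long mark_free_pages(const struct page_info *map,
                                     unsigned long npages, int has_pg_buddy,
                                     unsigned char *free_map)
{
    unsigned long pfn = 0, nr_free = 0;

    assert(map && free_map);
    while (pfn < npages) {
        unsigned long i, nr;

        /* Not a buddy head, or an implausible order: move to the next page. */
        if (!page_is_buddy(&map[pfn], has_pg_buddy) ||
            map[pfn].private_ >= SKETCH_MAX_ORDER) {
            pfn++;
            continue;
        }
        nr = 1UL << map[pfn].private_;
        for (i = 0; i < nr && pfn + i < npages; i++) {
            free_map[pfn + i] = 1;
            nr_free++;
        }
        pfn += i;
    }
    return nr_free;
}
```

With this shape, free pages would be handled in the same per-range pass as cache and user pages, and the has_pg_buddy argument selects between the flag check and the _mapcount check depending on the kernel version.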
Subject: [PATCH] Add free pages message
---
makedumpfile.c | 9 +++++++++
makedumpfile.h | 1 +
print_info.h | 2 +-
3 files changed, 11 insertions(+), 1 deletions(-)
diff --git a/makedumpfile.c b/makedumpfile.c
index c843567..bd770b1 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -3198,6 +3198,9 @@ reset_bitmap_of_free_pages(unsigned long node_zones)
retcd = ANALYSIS_FAILED;
return FALSE;
}
+
+ FREEPAGE_MSG("order: %d migrate_type: %d pfn: %llu\n", order, migrate_type, start_pfn);
+
for (i = 0; i < (1<<order); i++) {
pfn = start_pfn + i;
clear_bit_on_2nd_bitmap_for_kernel(pfn);
@@ -3399,6 +3402,7 @@ _exclude_free_page(void)
}
if (!spanned_pages)
continue;
+ FREEPAGE_MSG("NR_ZONE: %d\n", i);
if (!reset_bitmap_of_free_pages(zone))
return FALSE;
}
@@ -3688,6 +3692,11 @@ __exclude_unnecessary_pages(unsigned long mem_map,
_count = UINT(pcache + OFFSET(page._count));
mapping = ULONG(pcache + OFFSET(page.mapping));
+ if ((info->dump_level & DL_EXCLUDE_FREE)
+ && (flags & (1UL << PG_buddy))) {
+ FREEPAGE_MSG("PG_buddy: flags: %#016lx pfn %llu\n", flags, pfn);
+ }
+
/*
* Exclude the cache page without the private page.
*/
diff --git a/makedumpfile.h b/makedumpfile.h
index ed1e9de..1faef47 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -67,6 +67,7 @@ int get_mem_type(void);
#define PG_lru_ORIGINAL (5)
#define PG_private_ORIGINAL (11) /* Has something at ->private */
#define PG_swapcache_ORIGINAL (15) /* Swap page: swp_entry_t in private */
+#define PG_buddy (19)
#define PAGE_MAPPING_ANON (1)
diff --git a/print_info.h b/print_info.h
index 94968ca..44415d3 100644
--- a/print_info.h
+++ b/print_info.h
@@ -42,7 +42,7 @@ void print_execution_time(char *step_name, struct timeval *tv_start);
* Message Level
*/
#define MIN_MSG_LEVEL (0)
-#define MAX_MSG_LEVEL (31)
+#define MAX_MSG_LEVEL (31+0x20)
#define DEFAULT_MSG_LEVEL (7) /* Print the progress indicator, the
common message, the error message */
#define ML_PRINT_PROGRESS (0x001) /* Print the progress indicator */
--
1.7.4.4
Thanks,
HATAYAMA, Daisuke