makedumpfile memory usage grows with system memory size
d.hatayama at jp.fujitsu.com
Thu Apr 12 03:47:14 EDT 2012
From: Atsushi Kumagai <kumagai-atsushi at mxc.nes.nec.co.jp>
Subject: Re: makedumpfile memory usage grows with system memory size
Date: Thu, 12 Apr 2012 12:40:40 +0900
> On Tue, 10 Apr 2012 08:52:05 -0400
> Vivek Goyal <vgoyal at redhat.com> wrote:
>> On Tue, Apr 10, 2012 at 08:58:24AM +0900, HATAYAMA Daisuke wrote:
>> > From: Vivek Goyal <vgoyal at redhat.com>
>> > Subject: Re: makedumpfile memory usage grows with system memory size
>> > Date: Mon, 9 Apr 2012 14:57:28 -0400
>> > > On Fri, Apr 06, 2012 at 06:29:40PM +0900, HATAYAMA Daisuke wrote:
>> > >
>> > > [..]
>> > >> I agree. On the other hand, there is one more thing to consider. The
>> > >> value of order is stored in the private member of the page descriptor.
>> > >> Currently VMCOREINFO has no information about the private member. If we
>> > >> choose this method and delete the current one, it will be necessary to
>> > >> prepare a vmlinux file for old kernels.
>> > >
>> > > What information do you need to access the "private" member of "struct page"?
>> > > Its offset? Can't we extend VMCOREINFO to export this info too?
>> > >
>> > Yes, I mean the offset of the private member in the page structure. The
>> > member contains the order of the buddy. Extending VMCOREINFO is easy, but
>> > we cannot do that for old kernels, for which vmlinux is needed
>> > separately.
>> > This might be the same as what Kumagai-san meant when he said he doesn't
>> > want to change behaviour depending on the kernel version.
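The fallback described here (read the offset from VMCOREINFO when the kernel exports it, otherwise take it from a separately supplied vmlinux) can be sketched in Python as below. This is a hypothetical illustration, not makedumpfile code: the `OFFSET(page.private)` key assumes the kernel would export the offset in the same `OFFSET(struct.member)=value` form VMCOREINFO uses for other members, the function names are invented, and the numbers in the usage are made up.

```python
import re
from typing import Dict, Optional

# Matches numeric VMCOREINFO entries such as "OFFSET(page.flags)=0" or
# "SIZE(page)=56". SYMBOL(...) entries (hex addresses) are ignored here.
_ENTRY = re.compile(r"^(OFFSET|SIZE|NUMBER)\(([\w.]+)\)=(-?\d+)$")


def parse_vmcoreinfo(text: str) -> Dict[str, int]:
    """Collect OFFSET/SIZE/NUMBER entries, keyed as they appear, e.g. 'OFFSET(page.private)'."""
    entries: Dict[str, int] = {}
    for line in text.splitlines():
        m = _ENTRY.match(line.strip())
        if m:
            kind, name, value = m.groups()
            entries[f"{kind}({name})"] = int(value)
    return entries


def page_private_offset(info: Dict[str, int], vmlinux_offset: Optional[int] = None) -> int:
    """Prefer the exported offset; otherwise use one obtained from vmlinux debug info."""
    key = "OFFSET(page.private)"  # assumed name of the new export
    if key in info:
        return info[key]          # newer kernel: exported directly
    if vmlinux_offset is not None:
        return vmlinux_offset     # older kernel: value taken from vmlinux
    raise LookupError("page.private offset unknown: this kernel needs vmlinux")
```

For example, `page_private_offset(parse_vmcoreinfo("OFFSET(page.private)=48"))` returns 48, while the same call on VMCOREINFO text without that entry raises `LookupError` unless a vmlinux-derived offset is passed in.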
>> We can retain both mechanisms. For newer kernels which export the
>> page->private offset, we can walk through the memmap array, preparing the
>> bitmap one chunk at a time and discarding each chunk after use. For older
>> kernels we can continue to walk through the free page lists and prepare a
>> big bitmap in userspace.
>> It is desirable to keep the mechanism the same across kernel versions, but
>> change is unavoidable as things evolve in newer kernels. So at most
>> we can provide backward compatibility with old kernels.
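A minimal Python model of the chunked approach: the bitmap is built for one window of PFNs at a time and dropped after use, so its size depends on the window rather than on total memory. The `is_free` callback stands in for whatever per-page test is read from the memmap; the function name, its parameters and the window size are hypothetical.

```python
from typing import Callable, Iterator, Tuple


def cyclic_free_bitmaps(max_pfn: int, chunk_pfns: int,
                        is_free: Callable[[int], bool]) -> Iterator[Tuple[int, bytearray]]:
    """Yield (start_pfn, bitmap) for each window of chunk_pfns PFNs.

    Bit i of the bitmap is set when PFN start_pfn + i is free. A new bitmap is
    allocated per window, so a consumer that processes and drops each one only
    keeps a single window's bitmap alive at a time.
    """
    for start in range(0, max_pfn, chunk_pfns):
        end = min(start + chunk_pfns, max_pfn)
        bitmap = bytearray((end - start + 7) // 8)
        for pfn in range(start, end):
            if is_free(pfn):
                off = pfn - start
                bitmap[off // 8] |= 1 << (off % 8)
        yield start, bitmap
```

A dump writer would loop over the generator, write out the non-free pages of each window and let the bitmap go; the older free-list path would instead build one bitmap covering all of `max_pfn` up front.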
> I said I want to avoid changing behavior based on kernel versions,
> but it seems difficult, as Vivek said. So I will accept the change
> if it is necessary.
> Now I will make two prototypes to compare methods of identifying
> free pages:
> - a prototype based on _count
> - a prototype based on PG_buddy (or _mapcount)
> If the prototypes work, then we can select one of the methods.
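The two candidate tests can be compared on a toy memmap. This is a simplified Python model, not kernel or makedumpfile code: `Page` carries only the three fields discussed in this thread, and `PG_BUDDY` is an arbitrary placeholder bit (the real flag bit, or the `_mapcount` marker some kernels use instead, would have to come from VMCOREINFO or vmlinux). It also shows why the order in page->private matters for the PG_buddy variant: only the head page of a free block is marked, so the walk must skip 2**order pages, whereas the _count test looks at each page independently. The model assumes every page of a free block has _count == 0 and that no other page does; a real memmap walk has more cases to handle.

```python
from dataclasses import dataclass
from typing import List, Set

PG_BUDDY = 1 << 19  # placeholder bit, NOT the real PG_buddy value


@dataclass
class Page:
    count: int = 1    # models page->_count
    flags: int = 0    # models page->flags
    private: int = 0  # models page->private (buddy order on the head page)


def free_by_count(memmap: List[Page]) -> Set[int]:
    """Prototype 1: a page is free when its reference count is zero."""
    return {pfn for pfn, page in enumerate(memmap) if page.count == 0}


def free_by_buddy(memmap: List[Page]) -> Set[int]:
    """Prototype 2: follow buddy-marked head pages and expand each by its order."""
    free: Set[int] = set()
    pfn = 0
    while pfn < len(memmap):
        page = memmap[pfn]
        if page.flags & PG_BUDDY:
            block = 1 << page.private  # free block of 2**order pages starts here
            free.update(range(pfn, min(pfn + block, len(memmap))))
            pfn += block
        else:
            pfn += 1
    return free
```

On a consistent snapshot both functions should return the same set; they differ in what they need from the dump (only the _count offset, versus the flags or _mapcount marker plus the private offset).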
I think the first one would work well, and it is more accurate with
respect to what a free page means.
Although this might not be a problem in practice, the new method, which
walks the page descriptors (the memmap array), can give a different result
from the current one, which looks up free_list. Looking at __free_pages(),
it first decrements page->_count and then adds the page to free_list;
looking at __alloc_pages(), it first takes a page off free_list and then
sets page->_count to 1. So a page caught between those two steps has
page->_count == 0 while not being on any free_list, and the two methods
classify it differently.
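The ordering described above can be modelled with two-step generators to expose the window in which the two methods disagree. This is a hypothetical Python model of the ordering only (`Mem`, `free_steps`, `alloc_steps` and `views` are invented names); it does not reproduce __free_pages() or __alloc_pages() themselves.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Set, Tuple


@dataclass
class Mem:
    count: Dict[int, int] = field(default_factory=dict)  # pfn -> page->_count
    free_list: Set[int] = field(default_factory=set)      # pfns on a free list


def free_steps(mem: Mem, pfn: int) -> Iterator[None]:
    """Freeing in the order described: drop _count first, then link into free_list."""
    mem.count[pfn] -= 1
    yield  # window: _count may now be 0, but the page is not yet on free_list
    mem.free_list.add(pfn)
    yield


def alloc_steps(mem: Mem, pfn: int) -> Iterator[None]:
    """Allocation in the order described: unlink from free_list, then set _count to 1."""
    mem.free_list.discard(pfn)
    yield  # window: page is off free_list but _count is still 0
    mem.count[pfn] = 1
    yield


def views(mem: Mem, pfn: int) -> Tuple[bool, bool]:
    """Return (free according to _count, free according to free_list)."""
    return mem.count[pfn] == 0, pfn in mem.free_list
```

Stepping a page with one reference through `free_steps` once gives `views(...) == (True, False)`; a dump taken while a CPU sits in either window would show exactly that disagreement.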