[PATCH 2/2] makedumpfile: exclude unused vmemmap pages > (Cliff Wickman)
anderson at redhat.com
Thu Jan 2 14:00:34 EST 2014
----- Original Message -----
> On Thu, Jan 02, 2014 at 11:50:14AM -0500, Dave Anderson wrote:
> > ----- Original Message -----
> > > Date: Tue, 31 Dec 2013 17:36:02 -0600
> > > From: Cliff Wickman <cpw at sgi.com>
> > > To: kexec at lists.infradead.org, d.hatayama at jp.fujitsu.com,
> > > kumagai-atsushi at mxc.nes.nec.co.jp
> > > Subject: [PATCH 2/2] makedumpfile: exclude unused vmemmap pages
> > > Message-ID: <20131231233602.GB18522 at sgi.com>
> > > Content-Type: text/plain; charset=us-ascii
> > >
> > > On Tue, Dec 31, 2013 at 05:30:01PM -0600, cpw wrote:
> > >
> > > Exclude kernel pages that contain nothing but page structures for pages
> > > that are not being included in the dump.
> > > These can amount to 3.67 million pages per terabyte of system memory!
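The 3.67 million figure can be reproduced with a little arithmetic. This is a rough sketch that assumes 4 KiB pages, a 56-byte struct page, and a "terabyte" of 2**40 bytes. None of these is stated in the patch, and sizeof(struct page) depends on kernel version and configuration.

```python
# Rough check of "3.67 million pages per terabyte".
# Assumptions (not from the patch): 4 KiB pages, 56-byte struct page,
# and one terabyte = 2**40 bytes.
PAGE_SIZE = 4096          # bytes per page frame
STRUCT_PAGE_SIZE = 56     # assumed sizeof(struct page); config dependent
TIB = 2**40


def vmemmap_pages_for(mem_bytes, page_size=PAGE_SIZE,
                      struct_size=STRUCT_PAGE_SIZE):
    """Pages needed to hold the struct page array for mem_bytes of RAM."""
    frames = mem_bytes // page_size           # one struct page per frame
    array_bytes = frames * struct_size        # size of the vmemmap array
    return -(-array_bytes // page_size)       # ceiling division to whole pages


if __name__ == "__main__":
    print(vmemmap_pages_for(TIB))                  # 3670016
    print(vmemmap_pages_for(TIB, struct_size=64))  # 4194304
```

With 56-byte structures this comes to exactly 14 GiB of page structures per TiB of memory. A 64-byte struct page would give about 4.19 million pages, so the quoted figure is consistent with the 56-byte size.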
> > >
> > > The kernel's page table is searched, starting at virtual address
> > > 0xffffea0000000000 (the base of the vmemmap region), to find the
> > > actual pages containing the vmemmap page structures.
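The sketch below relates a pfn to the vmemmap virtual page that holds its struct page, and back again. It reuses the 0xffffea0000000000 base from the text. The 4 KiB page size and 56-byte struct page are assumptions. The actual page-table walk is not modeled; that walk finds the physical page backing each vmemmap virtual page and needs the dump's page tables.

```python
# Relate a pfn to the vmemmap page holding its struct page, and back.
# Assumptions: x86_64 vmemmap base from the patch text, 4 KiB pages,
# 56-byte struct page. The page-table walk that finds which physical
# page backs each vmemmap virtual page is not modeled here.
VMEMMAP_START = 0xffffea0000000000
PAGE_SIZE = 4096
STRUCT_PAGE_SIZE = 56


def struct_page_vaddr(pfn):
    """Virtual address of the struct page that describes pfn."""
    return VMEMMAP_START + pfn * STRUCT_PAGE_SIZE


def vmemmap_page_index(pfn):
    """Index (from VMEMMAP_START) of the vmemmap page where pfn's struct page begins."""
    return (struct_page_vaddr(pfn) - VMEMMAP_START) // PAGE_SIZE


def pfns_overlapping(index):
    """(first, last) pfns whose struct page has at least one byte in vmemmap page `index`."""
    first_byte = index * PAGE_SIZE
    last_byte = first_byte + PAGE_SIZE - 1
    return first_byte // STRUCT_PAGE_SIZE, last_byte // STRUCT_PAGE_SIZE


if __name__ == "__main__":
    print(hex(struct_page_vaddr(1)))   # 0xffffea0000000038
    print(pfns_overlapping(0))         # (0, 73)
    print(pfns_overlapping(1))         # (73, 146)
```

Because 4096 is not a multiple of 56, the struct page for pfn 73 straddles vmemmap pages 0 and 1. A vmemmap page can therefore be dropped only if every pfn overlapping it is excluded.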
> > >
> > > Bitmap1 is a map of dumpable (i.e. existing) pages. Bitmap2 is a map
> > > of pages not to be excluded.
> > > To speed the search of the bitmaps, only whole 64-bit words that are
> > > all 1's in bitmap1 and all 0's in bitmap2 (64 existing pages, none of
> > > which is being dumped) are tested for vmemmap pages that can be
> > > excluded.
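Here is a minimal sketch of the whole-word test, written for illustration and not taken from the patch. The bitmap layout (a list of 64-bit ints, bit i of word w standing for pfn w*64 + i) is an assumption, and so is the 56-byte struct page. The sketch also shows one possible way to turn a run of excluded pfns into the vmemmap pages that lie wholly inside that run's page structures.

```python
# Whole-word shortcut over the two bitmaps. Assumed layout (illustrative):
# each bitmap is a list of 64-bit ints, bit i of word w standing for
# pfn w*64 + i. bitmap1 = page exists, bitmap2 = page is kept in the dump.
WORD_BITS = 64
ALL_ONES = (1 << WORD_BITS) - 1


def excluded_runs(bitmap1, bitmap2):
    """Return (first_pfn, last_pfn) runs made of words that are all 1s in
    bitmap1 and all 0s in bitmap2. Partially filled words are skipped."""
    runs = []
    start = None
    n = min(len(bitmap1), len(bitmap2))
    for w in range(n):
        if bitmap1[w] == ALL_ONES and bitmap2[w] == 0:
            if start is None:
                start = w * WORD_BITS
        elif start is not None:
            runs.append((start, w * WORD_BITS - 1))
            start = None
    if start is not None:
        runs.append((start, n * WORD_BITS - 1))
    return runs


def whole_vmemmap_pages(first_pfn, last_pfn, page_size=4096, struct_size=56):
    """vmemmap page indices lying entirely inside the struct pages of a run
    (56-byte struct page assumed)."""
    lo = first_pfn * struct_size              # first byte of the run's structs
    hi = (last_pfn + 1) * struct_size         # one past the last byte
    first_page = -(-lo // page_size)          # round up to a page boundary
    last_page = hi // page_size - 1           # last page that ends by hi
    return list(range(first_page, last_page + 1))


if __name__ == "__main__":
    b1 = [ALL_ONES, ALL_ONES, ALL_ONES, 0b101]
    b2 = [0, 0, 1, 0]
    print(excluded_runs(b1, b2))        # [(0, 127)]
    print(whole_vmemmap_pages(0, 127))  # [0]
```

Testing whole words only lets the scan skip bit-by-bit work. The cost is that some excludable vmemmap pages at the edges of partially filled words are missed.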
> > >
> > > The list of vmemmap pfns to be excluded is written to a small file in
> > > order to conserve crash kernel memory.
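One way to keep such a list compact is to store runs of pfns rather than single entries, as in this sketch. The record layout, and the choice of runs over individual pfns, are assumptions for illustration. The message does not describe the patch's file format.

```python
import struct
import tempfile

# Hypothetical on-disk record: (first_pfn, count) as two little-endian
# 64-bit integers. The patch's real file format is not described.
RECORD = struct.Struct("<QQ")


def write_runs(f, runs):
    """Append inclusive (first_pfn, last_pfn) runs to an open binary file."""
    for first, last in runs:
        f.write(RECORD.pack(first, last - first + 1))


def read_runs(f):
    """Read back every run from the start of the file."""
    f.seek(0)
    runs = []
    while True:
        chunk = f.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break
        first, count = RECORD.unpack(chunk)
        runs.append((first, first + count - 1))
    return runs


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        write_runs(f, [(0x1000, 0x10ff), (0x8000, 0x8000)])
        print(read_runs(f))   # [(4096, 4351), (32768, 32768)]
```

Each record is 16 bytes regardless of run length. Long runs of contiguous vmemmap pfns therefore cost no more file space than a single pfn.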
> > >
> > > In practice, this whole procedure only takes about 10 seconds on a
> > > 16TB machine.
> > >
> > > Omitting unused page structures from the dump has only one minimal
> > > side effect that I can find: the crash command "kmem -f" will fail
> > > when attempting to walk through free pages. This seems to me to be
> > > a trivial negative when weighed against making dumps of large
> > > systems possible, and faster.
> > >
> > > This patch includes -e and -N options to exclude or include unneeded
> > > vmemmap pages regardless of system size (see flag_includevm and
> > > flag_excludvm). By default, such pages are excluded only on systems
> > > with a terabyte or more of memory.
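The option handling described above can be summarized as a small decision function. This is a sketch, not the patch's code. The parameters only model -e (flag_excludvm) and -N (flag_includevm). Treating "a terabyte" as 2**40 bytes and rejecting the two options together are assumptions.

```python
TIB = 2**40   # "a terabyte" taken as 2**40 bytes (assumption)


def exclude_vmemmap(mem_bytes, exclude_opt=False, include_opt=False):
    """Decide whether unused vmemmap pages are dropped.

    exclude_opt models -e (flag_excludvm in the patch), include_opt models
    -N (flag_includevm). Treating both together as an error is an
    assumption of this sketch.
    """
    if exclude_opt and include_opt:
        raise ValueError("-e and -N cannot be combined")
    if exclude_opt:
        return True
    if include_opt:
        return False
    return mem_bytes >= TIB   # default described in the patch: 1TB or more
```

Dave's request later in the thread is to copy the full set of pages unless the user asks otherwise. That amounts to replacing the last line with `return False`.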
> > Hi Cliff,
> > I understand the reason behind this, but the default exclusion
> > (even at 1TB) makes me a little nervous.
> > Although I'm sure you tested this, I find it amazing: is
> > "kmem -[fF]" really the only command option that is affected?
> Hi Dave,
> Maybe I missed some kmem options that walk free page lists.
> If a crash command is walking a page freelist, it would use the
> list_head named 'lru', would it not? I only find lru references
> in crash's memory.c and unwind.c, and in gdb-7.6/sim/frv/cache.c and
> gdb-7.6/bfd/cache.c.
> I didn't do extensive tests of crash, but the kmem command was
> all I found.
Right, but look at all of the other page struct offsets, in addition to
page.lru, that are used. The page.flags usage comes to mind. For
example, what would "kmem -p" display for the missing pages?
Or "kmem <address>"? And would "kmem -i" display invalid data?
I'm just speculating off the top of my head, but the page structure is
such a fundamental data structure, with several of its fields in use.
Just enter "help -o page_" to see all of its potential member usages.
> > If I'm not mistaken, this would be the first time that legitimate
> > kernel data would be excluded from the dump, and the user would
> > have no obvious way of knowing that it had been done, correct?
> > If it were encoded in the header somewhere, at least a
> > warning message could be printed during crash initialization.
> Agreed, it is legitimate kernel data. But it is data that represents
> memory that we are not capturing, so it would seem to me to be of
> little use. On the other hand, if we do capture that data, the dump
> would take so long as to make the whole notion of doing a dump
> prohibitive.
> (Even with this patch it took 40 minutes to dump a 16TB system.
> Without the patch that might be 5 hours. And soon there will be
> 64TB systems.)
> When kmem -f fails, it does say that a needed page has been excluded
> from the dump.
> But an up-front message would be reasonable.
Perhaps the disk_dump_header.status field could be used? Currently only
the 3 DUMP_DH_COMPRESSED_xxx bits are used.
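If the status field were used, the change could be as small as one extra bit, as in this sketch. The flag's name and value are hypothetical. The compression bits are assumed to be 0x1, 0x2 and 0x4, and the warning text is invented for illustration. Only the idea of a status bit checked during crash initialization comes from the thread.

```python
# Compression bits assumed to be 0x1, 0x2 and 0x4 (the low three bits).
DUMP_DH_COMPRESSED_ZLIB = 0x1
DUMP_DH_COMPRESSED_LZO = 0x2
DUMP_DH_COMPRESSED_SNAPPY = 0x4
COMPRESSION_MASK = 0x7

# Hypothetical flag for this proposal; not an existing makedumpfile define.
VMEMMAP_EXCLUDED_FLAG = 0x8


def mark_vmemmap_excluded(status):
    """makedumpfile side: record the exclusion without touching other bits."""
    return status | VMEMMAP_EXCLUDED_FLAG


def init_warning(status):
    """crash side: message to print at startup, or None if nothing to say."""
    if status & VMEMMAP_EXCLUDED_FLAG:
        return ("WARNING: unused vmemmap pages were excluded from this dump; "
                "page structures of excluded pages are not available")
    return None
```

Using a separate bit leaves the compression bits untouched. A crash build that does not know the new bit would simply ignore it.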
> > In any case, given that this can change traditional behavior,
> > I would prefer that the full set of pages be copied by default,
> > and only be excluded if the user configures it to do so.
> That could be easily done. It's not unreasonable to require the special
> option on very large systems. I just thought that the check of system
> size would be doing the system administrator a favor.
Yeah, I understand, but we don't impose any other kind of restriction
unless it is purposefully specified with the -d argument. IMHO it just
seems to be heading down a slippery slope that presumes makedumpfile
"knows better" than the administrator.