[PATCH v2] makedumpfile: request the kernel do page scans

Hatayama, Daisuke d.hatayama at jp.fujitsu.com
Wed Nov 21 20:43:39 EST 2012


Hello Cliff,

I'm interested in this and I'll test this patch on our machines with 2TB
of memory, though I first need to reserve the environment, so a report
will be posted one or two weeks later.

In my use case there are situations where dump filtering cannot be used.
Then the dumpfile becomes large and copying performance becomes crucial,
not only for page scans but also for the writing path, i.e. copying,
compressing and writing pages. I think such work should probably be done
in the kernel much more than it is now, but I have not started
investigating this yet.

> -----Original Message-----
> From: Cliff Wickman [mailto:cpw at sgi.com]
> Sent: Thursday, November 22, 2012 5:07 AM
> To: Hatayama, Daisuke/畑山 大輔; kumagai-atsushi at mxc.nes.nec.co.jp
> Cc: kexec at lists.infradead.org
> Subject: [PATCH v2] makedumpfile: request the kernel do page scans
> 
> From: Cliff Wickman <cpw at sgi.com>
> 
> I've been experimenting with asking the kernel to scan the page tables
> instead of reading all those page structures through /proc/vmcore.
> The results are rather dramatic.
> On a small, idle UV: about 4 sec. versus about 40 sec.
> On a 8TB UV the unnecessary page scan takes 4 minutes, vs. about 200 min
> through /proc/vmcore.
> 

In the other mail you explained why this performance difference occurs,
but I don't understand the term "units" you used. It might also help to
see some kind of benchmark results, e.g. with perf stat, showing the
difference in seconds, even if the unit itself is not really essential.
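For example, something along these lines would make the comparison
concrete (just a suggested invocation; the event list, dump level and
output path are placeholders):

    # time the page-scan phase with hardware counters
    perf stat -e cycles,instructions,cache-misses \
        makedumpfile -d 31 /proc/vmcore /tmp/dumpfile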

> +/*
> + * limit the size of the pfn list to this many pfn_element structures
> + */
> +#define MAX_PFN_LIST 10000
> +
> +/*
> + * one element in the pfn_list
> + */
> +struct pfn_element {
> +	unsigned long pfn;
> +	unsigned long order;
> +};
> +
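Just to confirm my reading of this structure: one pfn_element describes
a buddy block of 2^order pages, so the makedumpfile side would expand
each element roughly like the sketch below. Here pfn_list[] is whatever
array the reply fills in, and clear_bit_on_2nd_bitmap() only stands in
for whatever actually clears a pfn in the exclusion bitmap:

	/* expand one element: mark all 2^order pages as excludable */
	struct pfn_element *e = &pfn_list[i];
	unsigned long j;

	for (j = 0; j < (1UL << e->order); j++)
		clear_bit_on_2nd_bitmap(e->pfn + j);

Is that the intended meaning of the order field?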
> +/*
> + * a request for finding pfn's that can be excluded from the dump
> + * they may be pages of particular types or free pages
> + */
> +struct pfn_list_request {
> +	int request;		/* PL_REQUEST_FREE PL_REQUEST_EXCLUDE or */
> +				/* PL_REQUEST_MEMMAP */
> +	int debug;
> +	unsigned long paddr;	/* mem_map address for PL_REQUEST_EXCLUDE */
> +	unsigned long pfn_start;/* pfn represented by paddr */
> +	unsigned long pgdat_paddr; /* for PL_REQUEST_FREE */
> +	unsigned long pgdat_vaddr; /* for PL_REQUEST_FREE */
> +	int node;		/* for PL_REQUEST_FREE */
> +	int exclude_bits;	/* for PL_REQUEST_EXCLUDE */
> +	int count;		/* for PL_REQUEST_EXCLUDE */
> +	void *reply_ptr;	/* address of user's pfn_reply, for reply */

How about passing a bitmap instead of lists, and having the kernel side
unset the bits corresponding to unnecessary pages? That would mean less
copying, and the buffer size stays constant.
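Concretely, I imagine an interface along these lines (only a sketch of
my suggestion, not working code; the structure name and fields are made
up):

	/*
	 * Userspace allocates the bitmap once with all bits set; the
	 * kernel clears the bits of excludable pages in place, so no
	 * pfn list has to be copied back and forth per request.
	 */
	struct pfn_bitmap_request {
		unsigned long pfn_start;  /* first pfn covered by bitmap */
		unsigned long pfn_count;  /* number of bits in bitmap */
		int exclude_bits;         /* same meaning as today */
		void *bitmap;             /* 1 bit per pfn, preset to 1 */
	};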

Thanks.
HATAYAMA, Daisuke



