[PATCH 1/2] use raw i/o and root device to use less memory

Atsushi Kumagai kumagai-atsushi at mxc.nes.nec.co.jp
Sun Dec 14 18:33:58 PST 2014


>On Thu, Dec 11, 2014 at 06:34:32AM +0000, Atsushi Kumagai wrote:
>> Hello Cliff,
>>
>> >From: Cliff Wickman <cpw at sgi.com>
>> >
>> >This patch adds a -j option to makedumpfile. With this option it uses direct i/o on the dump
>> >file and the bitmap file, thus enabling makedumpfile to run in a fairly small
>> >crashkernel area without using cyclic mode. It can dump a system with many terabytes of
>> >memory using crashkernel=450M.
>>
>> First, let's separate the problems that you have.
>> (Actually you did it in previous patches.)
>>
>>   1. The cyclic mode is slow.
>>     -> You try to avoid this by using a disk for the bitmap.
>>
>>   2. Page cache uses up the memory for crash kernel.
>>     -> You try to avoid this by using direct i/o.
>>
>> >Without direct i/o the crash kernel will use kernel page cache for the writes.  This
>> >will use up a great deal of the crash kernel's allotted memory.
>>
>> This is the second problem.
>> Actually, we hit an OOM (probably caused by page cache):
>>
>>   http://lists.infradead.org/pipermail/kexec/2014-April/011639.html
>>
>> so direct i/o may be helpful for such small crashkernel environments.
>>
>> >The -j option will also implicitly avoid cyclic mode.  Cyclic mode is slower, and
>> >is not needed if we use direct i/o.
>>
>> This is the first problem.
>> Direct i/o doesn't enable the non-cyclic mode; using a disk does.
>> Anyway, I still think it's enough to change TMPDIR to a disk if you want
>> to choose --non-cyclic. I don't yet understand why you changed the
>> code.
>
>As you noticed in the patch, -j not only causes direct i/o. It
>also causes the bitmaps to use disk instead of tmpfs.
>
>I made the -j option turn off cyclic mode because I'm trading slower
>disk i/o for a smaller crashkernel.  And if the crashkernel stays
>small (even as it builds huge bitmaps) we don't need to use cyclic mode,
>so we save the extra pass through the page structures.

The -j option looks like a set of configurations designed to fit YOUR case.
Direct i/o, non-cyclic mode, and using a disk are independent settings;
they should be implemented as separate options so that we can discuss
the usefulness of each feature.

Moreover, the cyclic mode can already be turned off with the --non-cyclic
option, so please don't add another option for that.
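
To illustrate what I mean by separate options, here is a minimal sketch
using getopt_long(). Only --non-cyclic exists today; --direct-io and
--bitmap-file are hypothetical names I made up for this example, and
struct dump_opts is not a makedumpfile structure.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Hypothetical settings: one field per independent feature. */
struct dump_opts {
	int direct_io;			/* would add O_DIRECT to the open flags */
	int non_cyclic;			/* the existing --non-cyclic behaviour */
	const char *bitmap_file;	/* keep the bitmap on disk at this path */
};

enum { SK_DIRECT_IO = 256, SK_NON_CYCLIC, SK_BITMAP_FILE };

/* Returns the index of the first non-option argument, or -1 on error. */
static int parse_dump_opts(int argc, char *argv[], struct dump_opts *o)
{
	static const struct option longopts[] = {
		{ "direct-io",   no_argument,       NULL, SK_DIRECT_IO },
		{ "non-cyclic",  no_argument,       NULL, SK_NON_CYCLIC },
		{ "bitmap-file", required_argument, NULL, SK_BITMAP_FILE },
		{ NULL, 0, NULL, 0 }
	};
	int opt;

	assert(o != NULL);
	o->direct_io = 0;
	o->non_cyclic = 0;
	o->bitmap_file = NULL;

	optind = 1;	/* restart scanning from the first argument */
	while ((opt = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
		switch (opt) {
		case SK_DIRECT_IO:
			o->direct_io = 1;
			break;
		case SK_NON_CYCLIC:
			o->non_cyclic = 1;
			break;
		case SK_BITMAP_FILE:
			o->bitmap_file = optarg;
			break;
		default:
			return -1;
		}
	}
	return optind;
}
```

With this shape, none of the three settings implies another, so each one
can be measured and reviewed on its own.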

>Cyclic mode works nicely. And the extra pass is hardly noticeable -- until
>you get into terabytes of memory.

Further, I want to get rid of the non-cyclic mode as I said before:

  http://lists.infradead.org/pipermail/kexec/2013-September/009489.html

Now the disadvantage of the cyclic mode is that it has to take a
two-pass approach for filtering, and you want to avoid it, right?
The time penalty of two-pass filtering was about 20 sec per 1 TB in the
case below:
 
  http://lists.infradead.org/pipermail/kexec/2013-March/008300.html

Sure, it would be better if we can remove this extra filtering time.

If the number of cycles is one, there is no need for two-pass
filtering because the whole bitmap can be stored in memory.
The current code always does two-pass filtering regardless of the
number of cycles, but that can be fixed.
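
To make the one-cycle condition concrete, here is a rough sketch. It
assumes one bit per page frame and a hypothetical in-memory bitmap buffer
of buf_bytes; the function names are made up for this mail, and the real
sizing logic in makedumpfile is more involved.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for a bitmap that holds one bit per page frame. */
static uint64_t bitmap_bytes(uint64_t mem_bytes, uint64_t page_size)
{
	uint64_t pages;

	assert(page_size > 0);
	pages = (mem_bytes + page_size - 1) / page_size;
	return (pages + 7) / 8;
}

/* Number of cycles when only buf_bytes of bitmap fit in memory at once. */
static uint64_t num_cycles(uint64_t mem_bytes, uint64_t page_size,
			   uint64_t buf_bytes)
{
	assert(buf_bytes > 0);
	return (bitmap_bytes(mem_bytes, page_size) + buf_bytes - 1) / buf_bytes;
}

/* With a single cycle the whole bitmap is resident, so one pass suffices. */
static int needs_second_pass(uint64_t cycles)
{
	return cycles > 1;
}
```

Under these assumptions, 2 TB of memory with 4 KiB pages needs a 64 MiB
bitmap, so a 64 MiB buffer gives one cycle and a 32 MiB buffer gives two.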

So the remaining issue is the space to store the bitmap. I think we have
to use a disk as you did, but it shouldn't depend on the non-cyclic
mode; it should be designed for the cyclic mode.
So I propose a new option to use a backing file for the bitmap.
If we specify a file path, that file is used as the bitmap file via mmap().
This could be implemented easily, but it's just a rough idea.
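
A minimal sketch of the mmap() idea, only to show its shape. The helper
names (map_bitmap_file, bitmap_set, bitmap_test) are hypothetical and do
not exist in makedumpfile; error handling is reduced to returning NULL.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or truncate) the file at path, size it to len bytes and map it
 * read/write. A freshly extended file reads back as zeroes, so every bit
 * starts out cleared. Returns NULL on failure. */
static unsigned char *map_bitmap_file(const char *path, size_t len)
{
	void *map;
	int fd;

	assert(len > 0);
	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return NULL;
	}
	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after the fd is closed */
	return map == MAP_FAILED ? NULL : (unsigned char *)map;
}

/* Set or clear the bit for one page frame number. */
static void bitmap_set(unsigned char *map, unsigned long long pfn, int val)
{
	if (val)
		map[pfn >> 3] |= 1 << (pfn & 7);
	else
		map[pfn >> 3] &= ~(1 << (pfn & 7));
}

/* Return 1 if the bit for pfn is set, 0 otherwise. */
static int bitmap_test(const unsigned char *map, unsigned long long pfn)
{
	return (map[pfn >> 3] >> (pfn & 7)) & 1;
}
```

Note that dirty pages of a shared file mapping still sit in the page
cache until the kernel writes them back, so this sketch by itself says
nothing about memory pressure in a small crashkernel.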

Anyway, I think "using a disk for one-pass filtering" may be useful, but
it isn't a general feature because there is no disk when a network dump
(-F option) is used, so we should add a specific option to declare that
a disk is to be used.

To sum it up, my idea is:

   1. Get rid of the non-cyclic mode.
   2. Fix the redundant page scanning for the 1-cycle case.
   3. Introduce a new option to specify the backing file for the bitmap.

I think this can also solve your problem, can't it?

By the way, is the non-cyclic mode faster than the cyclic mode in your
environment? Certainly the non-cyclic mode can avoid two-pass
filtering, but it needs disk i/o to read and write the bitmap file
instead. I'm concerned that this disadvantage of the non-cyclic mode
outweighs its advantage.
I think it's better to confirm this before we start the actual work;
otherwise it will be wasted effort.

>> >Direct i/o is of course a bit slower, but not significantly slower when used in this
>> >almost-entirely sequential fashion.
>>
>> If you have a performance comparison between direct i/o and normal
>> file i/o, I'm curious to see it.
>
>Dumping a 2TB system
>- using the -j patch so that bitmaps and dump are using disk
>- everything being equal except the opening of files with or without O_DIRECT
>
>* using the -e patch so that we're not dumping page structures for
>  non-dumped pages:          dump size 570M (compressed)
>Page cached I/O
>  200 seconds   (writing dump file: 100 sec)
>Direct I/O
>  223 seconds   (writing dump file: 103 sec)
>
>* not using the -e patch:    dump size 3625M (compressed)
>Page cached I/O
>  620 seconds   (writing dump file: 525 sec)
>Direct I/O
>  700 seconds   (writing dump file: 590 sec)
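
For reference, the relative cost of direct i/o in the timings quoted
above can be worked out with a small helper. The function name is made up
for this mail; the inputs are just the total times from the two runs.

```c
#include <assert.h>

/* Relative slowdown of direct i/o over page-cached i/o, in percent. */
static double overhead_percent(double cached_sec, double direct_sec)
{
	assert(cached_sec > 0.0);
	return (direct_sec - cached_sec) / cached_sec * 100.0;
}
```

overhead_percent(200, 223) is 11.5 for the run with the -e patch, and
overhead_percent(620, 700) is about 12.9 for the run without it.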

Thanks, the benefit of the -e option is pretty obvious, and now that
this feature is optional and detectable by the flag in the header,
the basic idea sounds good to me. However, I hope this feature will also
be designed for cyclic mode, for the same reason I gave above.


Thanks
Atsushi Kumagai

>-Cliff
>
>> >---
>> > makedumpfile.c |  417 ++++++++++++++++++++++++++++++++++++++++++++++-----------
>> > makedumpfile.h |    6
>> > print_info.c   |    5
>> > 3 files changed, 347 insertions(+), 81 deletions(-)
>> >
>> >Index: makedumpfile-1.5.7/makedumpfile.h
>> >===================================================================
>> >--- makedumpfile-1.5.7.orig/makedumpfile.h
>> >+++ makedumpfile-1.5.7/makedumpfile.h
>> >@@ -18,6 +18,7 @@
>> >
>> > #include <stdio.h>
>> > #include <stdlib.h>
>> >+#define __USE_GNU
>> > #include <fcntl.h>
>> > #include <gelf.h>
>> > #include <sys/stat.h>
>> >@@ -222,6 +223,7 @@ isAnon(unsigned long mapping)
>> > #define FILENAME_BITMAP		"kdump_bitmapXXXXXX"
>> > #define FILENAME_STDOUT		"STDOUT"
>> > #define MAP_REGION		(4096*1024)
>> >+#define DIRECT_ALIGN		(512)
>> >
>> > /*
>> >  * Minimam vmcore has 2 ProgramHeaderTables(PT_NOTE and PT_LOAD).
>> >@@ -892,7 +894,8 @@ struct dump_bitmap {
>> > 	int		fd;
>> > 	int		no_block;
>> > 	char		*file_name;
>> >-	char		buf[BUFSIZE_BITMAP];
>> >+	char		*buf;
>> >+	char		*buf_malloced;
>> > 	off_t		offset;
>> > };
>> >
>> >@@ -900,6 +903,7 @@ struct cache_data {
>> > 	int	fd;
>> > 	char	*file_name;
>> > 	char	*buf;
>> >+	char    *buf_malloced;
>> > 	size_t	buf_size;
>> > 	size_t	cache_size;
>> > 	off_t	offset;
>> >Index: makedumpfile-1.5.7/print_info.c
>> >===================================================================
>> >--- makedumpfile-1.5.7.orig/print_info.c
>> >+++ makedumpfile-1.5.7/print_info.c
>> >@@ -58,7 +58,7 @@ print_usage(void)
>> > 	MSG("\n");
>> > 	MSG("Usage:\n");
>> > 	MSG("  Creating DUMPFILE:\n");
>> >-	MSG("  # makedumpfile    [-c|-l|-p|-E] [-d DL] [-x VMLINUX|-i VMCOREINFO] VMCORE\n");
>> >+	MSG("  # makedumpfile    [-c|-l|-p|-E] [-d DL] [-j] [-x VMLINUX|-i VMCOREINFO] VMCORE\n");
>> > 	MSG("    DUMPFILE\n");
>> > 	MSG("\n");
>> > 	MSG("  Creating DUMPFILE with filtered kernel data specified through filter config\n");
>> >@@ -108,6 +108,9 @@ print_usage(void)
>> > 	MSG("      -E option, because the ELF format does not support compressed data.\n");
>> > 	MSG("      THIS IS ONLY FOR THE CRASH UTILITY.\n");
>> > 	MSG("\n");
>> >+	MSG("  [-j]:\n");
>> >+	MSG("      Use raw (O_DIRECT) i/o on dump and bitmap files to avoid expanding kernel pagecache.\n");
>> >+	MSG("\n");
>> > 	MSG("  [-d DL]:\n");
>> > 	MSG("      Specify the type of unnecessary page for analysis.\n");
>> > 	MSG("      Pages of the specified type are not copied to DUMPFILE. The page type\n");
>> >Index: makedumpfile-1.5.7/makedumpfile.c
>> >===================================================================
>> >--- makedumpfile-1.5.7.orig/makedumpfile.c
>> >+++ makedumpfile-1.5.7/makedumpfile.c
>> >@@ -79,8 +79,11 @@ mdf_pfn_t pfn_free;
>> > mdf_pfn_t pfn_hwpoison;
>> >
>> > mdf_pfn_t num_dumped;
>> >+long blocksize;
>> >
>> > int retcd = FAILED;	/* return code */
>> >+// jflag is rawio on the dumpfile and bitmap file
>> >+int jflag = 0;
>> >
>> > #define INITIALIZE_LONG_TABLE(table, value) \
>> > do { \
>> >@@ -966,10 +969,17 @@ int
>> > open_dump_file(void)
>> > {
>> > 	int fd;
>> >-	int open_flags = O_RDWR|O_CREAT|O_TRUNC;
>> >+	int open_flags;
>> >
>> >+	if (jflag)
>> >+		open_flags = O_RDWR|O_CREAT|O_TRUNC|O_DIRECT;
>> >+	else
>> >+		open_flags = O_RDWR|O_CREAT|O_TRUNC;
>> >+
>> >+#if 0
>> > 	if (!info->flag_force)
>> > 		open_flags |= O_EXCL;
>> >+#endif
>> >
>> > 	if (info->flag_flatten) {
>> > 		fd = STDOUT_FILENO;
>> >@@ -1005,12 +1015,40 @@ check_dump_file(const char *path)
>> > int
>> > open_dump_bitmap(void)
>> > {
>> >-	int i, fd;
>> >-	char *tmpname;
>> >-
>> >-	tmpname = getenv("TMPDIR");
>> >-	if (!tmpname)
>> >-		tmpname = "/tmp";
>> >+	int i, fd, flags;
>> >+	char *tmpname, *cp;
>> >+	char prefix[100];
>> >+	int len;
>> >+
>> >+	/* -j: saving memory by doing direct i/o, so also avoid /tmp for the bit map files
>> >+	 *     because /tmp is using tmpfs */
>> >+	if (!jflag) {
>> >+		tmpname = getenv("TMPDIR");
>> >+		if (!tmpname)
>> >+			tmpname = "/tmp";
>> >+	} else {
>> >+		/* for the crash kernel environment use the prefix of
>> >+ 		   the dump name   e.g. /mnt//var/.... */
>> >+		if (!strchr(info->name_dumpfile,'v')) {
>> >+			printf("no /var found in name_dumpfile %s\n",
>> >+			info->name_dumpfile);
>> >+			exit(1);
>> >+		} else {
>> >+			cp = strchr(info->name_dumpfile,'v');
>> >+			if (strncmp(cp-1, "/var", 4)) {
>> >+				printf("no /var found in name_dumpfile %s\n",
>> >+					info->name_dumpfile);
>> >+				exit(1);
>> >+			}
>> >+		}
>> >+		len = cp - info->name_dumpfile - 1;
>> >+		strncpy(prefix, info->name_dumpfile, len);
>> >+		if (*(prefix + len - 1) == '/')
>> >+			len -= 1;
>> >+		*(prefix + len) = '\0';
>> >+		tmpname = prefix;
>> >+		strcat(tmpname, "/");
>> >+ 	}
>> >
>> > 	if ((info->name_bitmap = (char *)malloc(sizeof(FILENAME_BITMAP) +
>> > 						strlen(tmpname) + 1)) == NULL) {
>> >@@ -1019,9 +1057,12 @@ open_dump_bitmap(void)
>> > 		return FALSE;
>> > 	}
>> > 	strcpy(info->name_bitmap, tmpname);
>> >-	strcat(info->name_bitmap, "/");
>> > 	strcat(info->name_bitmap, FILENAME_BITMAP);
>> >-	if ((fd = mkstemp(info->name_bitmap)) < 0) {
>> >+	if (jflag)
>> >+		flags = O_RDWR|O_CREAT|O_TRUNC|O_DIRECT;
>> >+	else
>> >+		flags = O_RDWR|O_CREAT|O_TRUNC;
>> >+	if ((fd = open(info->name_bitmap, flags)) < 0) {
>> > 		ERRMSG("Can't open the bitmap file(%s). %s\n",
>> > 		    info->name_bitmap, strerror(errno));
>> > 		return FALSE;
>> >@@ -2985,6 +3026,7 @@ initialize_bitmap_memory(void)
>> > 	struct dump_bitmap *bmp;
>> > 	off_t bitmap_offset;
>> > 	off_t bitmap_len, max_sect_len;
>> >+	char *cp;
>> > 	mdf_pfn_t pfn;
>> > 	int i, j;
>> > 	long block_size;
>> >@@ -3006,7 +3048,14 @@ initialize_bitmap_memory(void)
>> > 	bmp->fd        = info->fd_memory;
>> > 	bmp->file_name = info->name_memory;
>> > 	bmp->no_block  = -1;
>> >-	memset(bmp->buf, 0, BUFSIZE_BITMAP);
>> >+	if ((cp = malloc(blocksize + DIRECT_ALIGN)) == NULL) {
>> >+		ERRMSG("Can't allocate memory for the bitmap buffer. %s\n",
>> >+		    strerror(errno));
>> >+		exit(1);
>> >+	}
>> >+	bmp->buf_malloced = cp;
>> >+	bmp->buf = cp - ((unsigned long)cp % DIRECT_ALIGN) + DIRECT_ALIGN;
>> >+	memset(bmp->buf, 0, blocksize);
>> > 	bmp->offset = bitmap_offset + bitmap_len / 2;
>> > 	info->bitmap_memory = bmp;
>> >
>> >@@ -3018,6 +3067,7 @@ initialize_bitmap_memory(void)
>> > 	if (info->valid_pages == NULL) {
>> > 		ERRMSG("Can't allocate memory for the valid_pages. %s\n",
>> > 		    strerror(errno));
>> >+		free(bmp->buf_malloced);
>> > 		free(bmp);
>> > 		return FALSE;
>> > 	}
>> >@@ -3318,9 +3368,18 @@ out:
>> > void
>> > initialize_bitmap(struct dump_bitmap *bitmap)
>> > {
>> >+	char *cp;
>> >+
>> > 	bitmap->fd        = info->fd_bitmap;
>> > 	bitmap->file_name = info->name_bitmap;
>> > 	bitmap->no_block  = -1;
>> >+	if ((cp = malloc(blocksize + DIRECT_ALIGN)) == NULL) {
>> >+		ERRMSG("Can't allocate memory for the bitmap buffer. %s\n",
>> >+		    strerror(errno));
>> >+		exit(1);
>> >+	}
>> >+	bitmap->buf_malloced = cp;
>> >+	bitmap->buf = cp - ((unsigned long)cp % DIRECT_ALIGN) + DIRECT_ALIGN;
>> > 	memset(bitmap->buf, 0, BUFSIZE_BITMAP);
>> > }
>> >
>> >@@ -3385,9 +3444,9 @@ set_bitmap(struct dump_bitmap *bitmap, m
>> > 	byte = (pfn%PFN_BUFBITMAP)>>3;
>> > 	bit  = (pfn%PFN_BUFBITMAP) & 7;
>> > 	if (val)
>> >-		bitmap->buf[byte] |= 1<<bit;
>> >+		*(bitmap->buf + byte) |= 1<<bit;
>> > 	else
>> >-		bitmap->buf[byte] &= ~(1<<bit);
>> >+		*(bitmap->buf + byte) &= ~(1<<bit);
>> >
>> > 	return TRUE;
>> > }
>> >@@ -3570,6 +3629,29 @@ read_cache(struct cache_data *cd)
>> > 	return TRUE;
>> > }
>> >
>> >+void
>> >+fill_to_offset(struct cache_data *cd, int blocksize)
>> >+{
>> >+	off_t current;
>> >+	long num_blocks;
>> >+	long i;
>> >+
>> >+	current = lseek(cd->fd, 0, SEEK_CUR);
>> >+	if ((cd->offset - current) % blocksize) {
>> >+		printf("ERROR: fill area is %#lx\n", cd->offset - current);
>> >+		exit(1);
>> >+	}
>> >+	if (cd->cache_size < blocksize) {
>> >+		printf("ERROR: cache buf is only %ld\n", cd->cache_size);
>> >+		exit(1);
>> >+	}
>> >+	num_blocks = (cd->offset - current) / blocksize;
>> >+	for (i = 0; i < num_blocks; i++) {
>> >+		write(cd->fd, cd->buf, blocksize);
>> >+	}
>> >+	return;
>> >+}
>> >+
>> > int
>> > is_bigendian(void)
>> > {
>> >@@ -3639,6 +3721,14 @@ write_buffer(int fd, off_t offset, void
>> > int
>> > write_cache(struct cache_data *cd, void *buf, size_t size)
>> > {
>> >+	/* sanity check; do not overflow this buffer */
>> >+	/* (it is of cd->cache_size + info->page_size) */
>> >+	if (size > ((cd->cache_size - cd->buf_size) + info->page_size)) {
>> >+		fprintf(stderr, "write_cache buffer overflow! size %#lx\n",
>> >+			size);
>> >+		exit(1);
>> >+	}
>> >+
>> > 	memcpy(cd->buf + cd->buf_size, buf, size);
>> > 	cd->buf_size += size;
>> >
>> >@@ -3651,6 +3741,8 @@ write_cache(struct cache_data *cd, void
>> >
>> > 	cd->buf_size -= cd->cache_size;
>> > 	memcpy(cd->buf, cd->buf + cd->cache_size, cd->buf_size);
>> >+	if (cd->buf_size)
>> >+		memcpy(cd->buf, cd->buf + cd->cache_size, cd->buf_size);
>> > 	cd->offset += cd->cache_size;
>> > 	return TRUE;
>> > }
>> >@@ -3682,6 +3774,21 @@ write_cache_zero(struct cache_data *cd,
>> > 	return write_cache_bufsz(cd);
>> > }
>> >
>> >+/* flush the full cache to the file */
>> >+int
>> >+write_cache_flush(struct cache_data *cd)
>> >+{
>> >+	if (cd->buf_size == 0)
>> >+		return TRUE;
>> >+	if (cd->buf_size < cd->cache_size) {
>> >+		memset(cd->buf + cd->buf_size, 0, cd->cache_size - cd->buf_size);
>> >+	}
>> >+	cd->buf_size = cd->cache_size;
>> >+	if (!write_cache_bufsz(cd))
>> >+		return FALSE;
>> >+	return TRUE;
>> >+}
>> >+
>> > int
>> > read_buf_from_stdin(void *buf, int buf_size)
>> > {
>> >@@ -4414,11 +4521,19 @@ create_1st_bitmap(void)
>> > {
>> > 	int i;
>> > 	unsigned int num_pt_loads = get_num_pt_loads();
>> >- 	char buf[info->page_size];
>> >+ 	char *buf;
>> > 	mdf_pfn_t pfn, pfn_start, pfn_end, pfn_bitmap1;
>> > 	unsigned long long phys_start, phys_end;
>> > 	struct timeval tv_start;
>> > 	off_t offset_page;
>> >+	char *cp;
>> >+
>> >+	if ((cp = malloc(blocksize + DIRECT_ALIGN)) == NULL) {
>> >+		ERRMSG("Can't allocate memory for the bitmap buffer. %s\n",
>> >+		    strerror(errno));
>> >+		exit(1);
>> >+	}
>> >+	buf = cp - ((unsigned long)cp % DIRECT_ALIGN) + DIRECT_ALIGN;
>> >
>> > 	if (info->flag_refiltering)
>> > 		return copy_1st_bitmap_from_memory();
>> >@@ -4429,7 +4544,7 @@ create_1st_bitmap(void)
>> > 	/*
>> > 	 * At first, clear all the bits on the 1st-bitmap.
>> > 	 */
>> >-	memset(buf, 0, sizeof(buf));
>> >+	memset(buf, 0, blocksize);
>> >
>> > 	if (lseek(info->bitmap1->fd, info->bitmap1->offset, SEEK_SET) < 0) {
>> > 		ERRMSG("Can't seek the bitmap(%s). %s\n",
>> >@@ -4975,9 +5090,17 @@ int
>> > copy_bitmap(void)
>> > {
>> > 	off_t offset;
>> >-	unsigned char buf[info->page_size];
>> >+	unsigned char *buf;
>> >+	unsigned char *cp;
>> >  	const off_t failed = (off_t)-1;
>> >
>> >+	if ((cp = malloc(blocksize + DIRECT_ALIGN)) == NULL) {
>> >+		ERRMSG("Can't allocate memory for the bitmap buffer. %s\n",
>> >+		    strerror(errno));
>> >+		exit(1);
>> >+	}
>> >+	buf = cp - ((unsigned long)cp % DIRECT_ALIGN) + DIRECT_ALIGN;
>> >+
>> > 	offset = 0;
>> > 	while (offset < (info->len_bitmap / 2)) {
>> > 		if (lseek(info->bitmap1->fd, info->bitmap1->offset + offset,
>> >@@ -4986,7 +5109,7 @@ copy_bitmap(void)
>> > 			    info->name_bitmap, strerror(errno));
>> > 			return FALSE;
>> > 		}
>> >-		if (read(info->bitmap1->fd, buf, sizeof(buf)) != sizeof(buf)) {
>> >+		if (read(info->bitmap1->fd, buf, blocksize) != blocksize) {
>> > 			ERRMSG("Can't read the dump memory(%s). %s\n",
>> > 			    info->name_memory, strerror(errno));
>> > 			return FALSE;
>> >@@ -4997,12 +5120,12 @@ copy_bitmap(void)
>> > 			    info->name_bitmap, strerror(errno));
>> > 			return FALSE;
>> > 		}
>> >-		if (write(info->bitmap2->fd, buf, sizeof(buf)) != sizeof(buf)) {
>> >+		if (write(info->bitmap2->fd, buf, blocksize) != blocksize) {
>> > 			ERRMSG("Can't write the bitmap(%s). %s\n",
>> > 		    	info->name_bitmap, strerror(errno));
>> > 			return FALSE;
>> > 		}
>> >-		offset += sizeof(buf);
>> >+		offset += blocksize;
>> > 	}
>> >
>> > 	return TRUE;
>> >@@ -5160,6 +5283,8 @@ void
>> > free_bitmap1_buffer(void)
>> > {
>> > 	if (info->bitmap1) {
>> >+		if (info->bitmap1->buf_malloced)
>> >+			free(info->bitmap1->buf_malloced);
>> > 		free(info->bitmap1);
>> > 		info->bitmap1 = NULL;
>> > 	}
>> >@@ -5169,6 +5294,8 @@ void
>> > free_bitmap2_buffer(void)
>> > {
>> > 	if (info->bitmap2) {
>> >+		if (info->bitmap2->buf_malloced)
>> >+			free(info->bitmap2->buf_malloced);
>> > 		free(info->bitmap2);
>> > 		info->bitmap2 = NULL;
>> > 	}
>> >@@ -5287,25 +5414,31 @@ get_loads_dumpfile(void)
>> > int
>> > prepare_cache_data(struct cache_data *cd)
>> > {
>> >+	char *cp;
>> >+
>> > 	cd->fd         = info->fd_dumpfile;
>> > 	cd->file_name  = info->name_dumpfile;
>> > 	cd->cache_size = info->page_size << info->block_order;
>> > 	cd->buf_size   = 0;
>> > 	cd->buf        = NULL;
>> >
>> >-	if ((cd->buf = malloc(cd->cache_size + info->page_size)) == NULL) {
>> >+	if ((cp = malloc(cd->cache_size + info->page_size + DIRECT_ALIGN)) == NULL) {
>> > 		ERRMSG("Can't allocate memory for the data buffer. %s\n",
>> > 		    strerror(errno));
>> > 		return FALSE;
>> > 	}
>> >+	cd->buf_malloced = cp;
>> >+	cd->buf = cp - ((unsigned long)cp % DIRECT_ALIGN) + DIRECT_ALIGN;
>> > 	return TRUE;
>> > }
>> >
>> > void
>> > free_cache_data(struct cache_data *cd)
>> > {
>> >-	free(cd->buf);
>> >+	if (cd->buf_malloced)
>> >+		free(cd->buf_malloced);
>> > 	cd->buf = NULL;
>> >+	cd->buf_malloced = NULL;
>> > }
>> >
>> > int
>> >@@ -5554,19 +5687,21 @@ out:
>> > }
>> >
>> > int
>> >-write_kdump_header(void)
>> >+write_kdump_header(struct cache_data *cd)
>> > {
>> > 	int ret = FALSE;
>> > 	size_t size;
>> > 	off_t offset_note, offset_vmcoreinfo;
>> >-	unsigned long size_note, size_vmcoreinfo;
>> >+	unsigned long size_note, size_vmcoreinfo, remaining_size_note;
>> >+	unsigned long write_size, room;
>> > 	struct disk_dump_header *dh = info->dump_header;
>> > 	struct kdump_sub_header kh;
>> >-	char *buf = NULL;
>> >+	char *buf = NULL, *cp;
>> >
>> > 	if (info->flag_elf_dumpfile)
>> > 		return FALSE;
>> >
>> >+	/* uses reads of /proc/vmcore */
>> > 	get_pt_note(&offset_note, &size_note);
>> >
>> > 	/*
>> >@@ -5583,6 +5718,7 @@ write_kdump_header(void)
>> > 	dh->bitmap_blocks  = divideup(info->len_bitmap, dh->block_size);
>> > 	memcpy(&dh->timestamp, &info->timestamp, sizeof(dh->timestamp));
>> > 	memcpy(&dh->utsname, &info->system_utsname, sizeof(dh->utsname));
>> >+	blocksize = dh->block_size;
>> > 	if (info->flag_compress & DUMP_DH_COMPRESSED_ZLIB)
>> > 		dh->status |= DUMP_DH_COMPRESSED_ZLIB;
>> > #ifdef USELZO
>> >@@ -5595,7 +5731,7 @@ write_kdump_header(void)
>> > #endif
>> >
>> > 	size = sizeof(struct disk_dump_header);
>> >-	if (!write_buffer(info->fd_dumpfile, 0, dh, size, info->name_dumpfile))
>> >+	if (!write_cache(cd, dh, size))
>> > 		return FALSE;
>> >
>> > 	/*
>> >@@ -5651,9 +5787,21 @@ write_kdump_header(void)
>> > 				goto out;
>> > 		}
>> >
>> >-		if (!write_buffer(info->fd_dumpfile, kh.offset_note, buf,
>> >-		    kh.size_note, info->name_dumpfile))
>> >-			goto out;
>> >+		/* the note may be huge, so do this in a loop to not
>> >+		   overflow the cache */
>> >+		remaining_size_note = kh.size_note;
>> >+		cp = buf;
>> >+		do {
>> >+			room = cd->cache_size - cd->buf_size;
>> >+			if (remaining_size_note > room)
>> >+				write_size = room;
>> >+			else
>> >+				write_size = remaining_size_note;
>> >+			if (!write_cache(cd, cp, write_size))
>> >+				goto out;
>> >+			remaining_size_note -= write_size;
>> >+			cp += write_size;
>> >+		} while (remaining_size_note);
>> >
>> > 		if (has_vmcoreinfo()) {
>> > 			get_vmcoreinfo(&offset_vmcoreinfo, &size_vmcoreinfo);
>> >@@ -5669,8 +5817,7 @@ write_kdump_header(void)
>> > 			kh.size_vmcoreinfo = size_vmcoreinfo;
>> > 		}
>> > 	}
>> >-	if (!write_buffer(info->fd_dumpfile, dh->block_size, &kh,
>> >-	    size, info->name_dumpfile))
>> >+	if (!write_cache(cd, &kh, size))
>> > 		goto out;
>> >
>> > 	info->sub_header = kh;
>> >@@ -6267,13 +6414,15 @@ write_elf_pages_cyclic(struct cache_data
>> > }
>> >
>> > int
>> >-write_kdump_pages(struct cache_data *cd_header, struct cache_data *cd_page)
>> >+write_kdump_pages(struct cache_data *cd_descs, struct cache_data *cd_page)
>> > {
>> > 	mdf_pfn_t pfn, per, num_dumpable;
>> > 	mdf_pfn_t start_pfn, end_pfn;
>> > 	unsigned long size_out;
>> >+	long prefix;
>> > 	struct page_desc pd, pd_zero;
>> > 	off_t offset_data = 0;
>> >+	off_t initial_offset_data;
>> > 	struct disk_dump_header *dh = info->dump_header;
>> > 	unsigned char buf[info->page_size], *buf_out = NULL;
>> > 	unsigned long len_buf_out;
>> >@@ -6281,8 +6430,12 @@ write_kdump_pages(struct cache_data *cd_
>> > 	struct timeval tv_start;
>> > 	const off_t failed = (off_t)-1;
>> > 	unsigned long len_buf_out_zlib, len_buf_out_lzo, len_buf_out_snappy;
>> >+	int saved_bytes = 0;
>> >+	int cpysize;
>> >+	char *save_block1, *save_block_cur, *save_block2;
>> >
>> > 	int ret = FALSE;
>> >+	int status;
>> >
>> > 	if (info->flag_elf_dumpfile)
>> > 		return FALSE;
>> >@@ -6324,13 +6477,42 @@ write_kdump_pages(struct cache_data *cd_
>> > 	per = per ? per : 1;
>> >
>> > 	/*
>> >-	 * Calculate the offset of the page data.
>> >+	 * Calculate the offset of the page_desc's and page data.
>> > 	 */
>> >-	cd_header->offset
>> >+	cd_descs->offset
>> > 	    = (DISKDUMP_HEADER_BLOCKS + dh->sub_hdr_size + dh->bitmap_blocks)
>> > 		* dh->block_size;
>> >-	cd_page->offset = cd_header->offset + sizeof(page_desc_t)*num_dumpable;
>> >-	offset_data  = cd_page->offset;
>> >+
>> >+	/* this is already a pagesize multiple, so well-formed for i/o */
>> >+
>> >+	cd_page->offset = cd_descs->offset + (sizeof(page_desc_t) * num_dumpable);
>> >+	offset_data = cd_page->offset;
>> >+
>> >+	/* for i/o, round this page data offset down to a block boundary */
>> >+	prefix = cd_page->offset % blocksize;
>> >+	cd_page->offset -= prefix;
>> >+	initial_offset_data = cd_page->offset;
>> >+	cd_page->buf_size = prefix;
>> >+	memset(cd_page->buf, 0, prefix);
>> >+
>> >+	fill_to_offset(cd_descs, blocksize);
>> >+
>> >+	if ((save_block1 = malloc(blocksize * 2)) == NULL) {
>> >+		ERRMSG("Can't allocate memory for save block. %s\n",
>> >+		       strerror(errno));
>> >+		goto out;
>> >+	}
>> >+	/* put on block address boundary for well-rounded i/o */
>> >+	save_block1 += (blocksize - (unsigned long)save_block1 % blocksize);
>> >+	save_block_cur = save_block1 + prefix;
>> >+	saved_bytes += prefix;
>> >+	if ((save_block2 = malloc(blocksize + DIRECT_ALIGN)) == NULL) {
>> >+		ERRMSG("Can't allocate memory for save block2. %s\n",
>> >+		       strerror(errno));
>> >+		goto out;
>> >+	}
>> >+	/* put on block address boundary for well-rounded i/o */
>> >+	save_block2 += (DIRECT_ALIGN - (unsigned long)save_block2 % DIRECT_ALIGN);
>> >
>> > 	/*
>> > 	 * Set a fileoffset of Physical Address 0x0.
>> >@@ -6354,6 +6536,14 @@ write_kdump_pages(struct cache_data *cd_
>> > 		memset(buf, 0, pd_zero.size);
>> > 		if (!write_cache(cd_page, buf, pd_zero.size))
>> > 			goto out;
>> >+
>> >+		cpysize = pd_zero.size;
>> >+		if ((saved_bytes + cpysize) > blocksize)
>> >+			cpysize = blocksize - saved_bytes;
>> >+		memcpy(save_block_cur, buf, cpysize);
>> >+		saved_bytes += cpysize;
>> >+		save_block_cur += cpysize;
>> >+
>> > 		offset_data  += pd_zero.size;
>> > 	}
>> > 	if (info->flag_split) {
>> >@@ -6387,7 +6577,7 @@ write_kdump_pages(struct cache_data *cd_
>> > 		 */
>> > 		if ((info->dump_level & DL_EXCLUDE_ZERO)
>> > 		    && is_zero_page(buf, info->page_size)) {
>> >-			if (!write_cache(cd_header, &pd_zero, sizeof(page_desc_t)))
>> >+			if (!write_cache(cd_descs, &pd_zero, sizeof(page_desc_t)))
>> > 				goto out;
>> > 			pfn_zero++;
>> > 			continue;
>> >@@ -6435,25 +6625,68 @@ write_kdump_pages(struct cache_data *cd_
>> > 		/*
>> > 		 * Write the page header.
>> > 		 */
>> >-		if (!write_cache(cd_header, &pd, sizeof(page_desc_t)))
>> >+		if (!write_cache(cd_descs, &pd, sizeof(page_desc_t)))
>> > 			goto out;
>> >
>> > 		/*
>> > 		 * Write the page data.
>> > 		 */
>> >+		/* kludge: save the partial block where page desc's and data overlap */
>> >+		/* (this is the second part of the full block (save_block) where
>> >+		    they overlap) */
>> >+		if (saved_bytes < blocksize) {
>> >+			memcpy(save_block_cur, buf, pd.size);
>> >+			saved_bytes += pd.size;
>> >+			save_block_cur += pd.size;
>> >+		}
>> > 		if (!write_cache(cd_page, pd.flags ? buf_out : buf, pd.size))
>> > 			goto out;
>> > 	}
>> >
>> > 	/*
>> >-	 * Write the remainder.
>> >+	 * Write the remainder (well-formed blocks)
>> > 	 */
>> >-	if (!write_cache_bufsz(cd_page))
>> >-		goto out;
>> >-	if (!write_cache_bufsz(cd_header))
>> >+	/* adjust the cd_descs to write out only full blocks beyond the
>> >+	   data in the buffer */
>> >+	if (cd_descs->buf_size % blocksize) {
>> >+		cd_descs->buf_size +=
>> >+			(blocksize - (cd_descs->buf_size % blocksize));
>> >+		cd_descs->cache_size = cd_descs->buf_size;
>> >+	}
>> >+	if (!write_cache_flush(cd_descs))
>> > 		goto out;
>> >
>> > 	/*
>> >+	 * kludge: the page data will overwrite the last block of the page_desc's,
>> >+	 * so re-construct a block from:
>> >+	 *   the last block of the page_desc's (length 'prefix') (will read into
>> >+	 *   save_block2) and the end (4096-prefix) of the page data we saved in
>> >+	 *   save_block1.
>> >+	 */
>> >+	if (!write_cache_flush(cd_page))
>> >+ 		goto out;
>> >+
>> >+	if (lseek(cd_page->fd, initial_offset_data, SEEK_SET) == failed) {
>> >+		printf("kludge: seek to %#lx, fd %d failed errno %d\n",
>> >+			initial_offset_data, cd_page->fd, errno);
>> >+		exit(1);
>> >+	}
>> >+	if (read(cd_page->fd, save_block2, blocksize) != blocksize) {
>> >+		printf("kludge: read block2 failed\n");
>> >+		exit(1);
>> >+	}
>> >+	/* combine the overlapping parts into save_block1 */
>> >+	memcpy(save_block1, save_block2, prefix);
>> >+
>> >+	if (lseek(cd_page->fd, initial_offset_data, SEEK_SET) == failed) {
>> >+		printf("kludge: seek to %#lx, fd %d failed errno %d\n",
>> >+			initial_offset_data, cd_page->fd, errno);
>> >+		exit(1);
>> >+	}
>> >+	status = write(cd_page->fd, save_block1, blocksize);
>> >+	/* end of kludged block */
>> >+
>> >+	/*
>> > 	 * print [100 %]
>> > 	 */
>> > 	print_progress(PROGRESS_COPY, num_dumpable, num_dumpable);
>> >@@ -6462,8 +6695,6 @@ write_kdump_pages(struct cache_data *cd_
>> >
>> > 	ret = TRUE;
>> > out:
>> >-	if (buf_out != NULL)
>> >-		free(buf_out);
>> > #ifdef USELZO
>> > 	if (wrkmem != NULL)
>> > 		free(wrkmem);
>> >@@ -6863,51 +7094,47 @@ write_kdump_eraseinfo(struct cache_data
>> > }
>> >
>> > int
>> >-write_kdump_bitmap(void)
>> >+write_kdump_bitmap(struct cache_data *cd)
>> > {
>> > 	struct cache_data bm;
>> > 	long long buf_size;
>> >-	off_t offset;
>> >+	long write_size;
>> >
>> > 	int ret = FALSE;
>> >
>> > 	if (info->flag_elf_dumpfile)
>> > 		return FALSE;
>> >
>> >+	/* set up to read bit map file in big blocks from the start */
>> > 	bm.fd        = info->fd_bitmap;
>> > 	bm.file_name = info->name_bitmap;
>> > 	bm.offset    = 0;
>> > 	bm.buf       = NULL;
>> >-
>> >-	if ((bm.buf = calloc(1, BUFSIZE_BITMAP)) == NULL) {
>> >-		ERRMSG("Can't allocate memory for dump bitmap buffer. %s\n",
>> >-		    strerror(errno));
>> >-		goto out;
>> >+	bm.cache_size = cd->cache_size;
>> >+	bm.buf = cd->buf; /* use the bitmap cd */
>> >+	/* using the dumpfile cd_bitmap buffer and fd */
>> >+	if (lseek(cd->fd, info->offset_bitmap1, SEEK_SET) < 0) {
>> >+		ERRMSG("Can't seek the dump file(%s). %s\n",
>> >+		       info->name_memory, strerror(errno));
>> >+		return FALSE;
>> > 	}
>> >-	offset = info->offset_bitmap1;
>> > 	buf_size = info->len_bitmap;
>> >-
>> > 	while (buf_size > 0) {
>> >-		if (buf_size >= BUFSIZE_BITMAP)
>> >-			bm.cache_size = BUFSIZE_BITMAP;
>> >-		else
>> >-			bm.cache_size = buf_size;
>> >-
>> > 		if(!read_cache(&bm))
>> > 			goto out;
>> >-
>> >-		if (!write_buffer(info->fd_dumpfile, offset,
>> >-		    bm.buf, bm.cache_size, info->name_dumpfile))
>> >-			goto out;
>> >-
>> >-		offset += bm.cache_size;
>> >-		buf_size -= BUFSIZE_BITMAP;
>> >+		write_size = cd->cache_size;
>> >+		if (buf_size < cd->cache_size) {
>> >+			write_size = buf_size;
>> >+		}
>> >+		if (write(cd->fd, cd->buf, write_size) != write_size) {
>> >+			ERRMSG("Can't write a destination file. %s\n",
>> >+				strerror(errno));
>> >+			exit(1);
>> >+		}
>> >+		buf_size -= bm.cache_size;
>> > 	}
>> > 	ret = TRUE;
>> > out:
>> >-	if (bm.buf != NULL)
>> >-		free(bm.buf);
>> >-
>> > 	return ret;
>> > }
>> >
>> >@@ -7992,7 +8219,7 @@ int
>> > writeout_dumpfile(void)
>> > {
>> > 	int ret = FALSE;
>> >-	struct cache_data cd_header, cd_page;
>> >+	struct cache_data cd_header, cd_page_descs, cd_page, cd_bitmap;
>> >
>> > 	info->flag_nospace = FALSE;
>> >
>> >@@ -8005,11 +8232,20 @@ writeout_dumpfile(void)
>> > 	}
>> > 	if (!prepare_cache_data(&cd_header))
>> > 		return FALSE;
>> >+	cd_header.offset = 0;
>> >
>> > 	if (!prepare_cache_data(&cd_page)) {
>> > 		free_cache_data(&cd_header);
>> > 		return FALSE;
>> > 	}
>> >+	if (!prepare_cache_data(&cd_page_descs)) {
>> >+		free_cache_data(&cd_header);
>> >+		free_cache_data(&cd_page);
>> >+		return FALSE;
>> >+	}
>> >+	if (!prepare_cache_data(&cd_bitmap))
>> >+		return FALSE;
>> >+
>> > 	if (info->flag_elf_dumpfile) {
>> > 		if (!write_elf_header(&cd_header))
>> > 			goto out;
>> >@@ -8023,22 +8259,36 @@ writeout_dumpfile(void)
>> > 		if (!write_elf_eraseinfo(&cd_header))
>> > 			goto out;
>> > 	} else if (info->flag_cyclic) {
>> >-		if (!write_kdump_header())
>> >+		if (!write_kdump_header(&cd_header))
>> > 			goto out;
>> > 		if (!write_kdump_pages_and_bitmap_cyclic(&cd_header, &cd_page))
>> > 			goto out;
>> > 		if (!write_kdump_eraseinfo(&cd_page))
>> > 			goto out;
>> > 	} else {
>> >-		if (!write_kdump_header())
>> >-			goto out;
>> >-		if (!write_kdump_pages(&cd_header, &cd_page))
>> >-			goto out;
>> >-		if (!write_kdump_eraseinfo(&cd_page))
>> >-			goto out;
>> >-		if (!write_kdump_bitmap())
>> >-			goto out;
>> >-	}
>> >+		/*
>> >+		 * Use cd_header for the caching operation up to the bit map.
>> >+		 * Use cd_bitmap for 1-block (4096) operations on the bit map.
>> >+		 * (it fits between the file header and page_desc's, both of
>> >+		 *  which end and start on block boundaries)
>> >+		 * Then use cd_page_descs and cd_page for page headers and
>> >+		 * data (and eraseinfo).
>> >+		 * Then back to cd_header to fill in the bitmap.
>> >+		 */
>> >+
>> >+		if (!write_kdump_header(&cd_header))
>> >+			goto out;
>> >+		write_cache_flush(&cd_header);
>> >+
>> >+		if (!write_kdump_pages(&cd_page_descs, &cd_page))
>> >+ 			goto out;
>> >+ 		if (!write_kdump_eraseinfo(&cd_page))
>> >+ 			goto out;
>> >+
>> >+		cd_bitmap.offset = info->offset_bitmap1;
>> >+		if (!write_kdump_bitmap(&cd_bitmap))
>> >+ 			goto out;
>> >+ 	}
>> > 	if (info->flag_flatten) {
>> > 		if (!write_end_flat_header())
>> > 			goto out;
>> >@@ -8198,11 +8448,17 @@ create_dumpfile(void)
>> > 		if (!get_elf_info(info->fd_memory, info->name_memory))
>> > 			return FALSE;
>> > 	}
>> >+	blocksize = info->page_size;
>> >+	if (!blocksize)
>> >+		blocksize = sysconf(_SC_PAGE_SIZE);
>> > 	if (!initial())
>> > 		return FALSE;
>> >
>> > 	print_vtop();
>> >
>> >+	if (jflag)
>> >+		PROGRESS_MSG("Using O_DIRECT i/o for dump and bitmap.\n");
>> >+
>> > 	num_retry = 0;
>> > retry:
>> > 	if (info->flag_refiltering) {
>> >@@ -9285,7 +9541,6 @@ int show_mem_usage(void)
>> > 		return FALSE;
>> > 	}
>> >
>> >-
>> > 	if (!info->flag_cyclic)
>> > 		info->flag_cyclic = TRUE;
>> >
>> >@@ -9379,7 +9634,7 @@ main(int argc, char *argv[])
>> >
>> > 	info->block_order = DEFAULT_ORDER;
>> > 	message_level = DEFAULT_MSG_LEVEL;
>> >-	while ((opt = getopt_long(argc, argv, "b:cDd:EFfg:hi:lpRvXx:", longopts,
>> >+	while ((opt = getopt_long(argc, argv, "b:cDd:EFfg:hi:jlpRvXx:", longopts,
>> > 	    NULL)) != -1) {
>> > 		switch (opt) {
>> > 		case OPT_BLOCK_ORDER:
>> >@@ -9423,6 +9678,10 @@ main(int argc, char *argv[])
>> > 			info->flag_read_vmcoreinfo = 1;
>> > 			info->name_vmcoreinfo = optarg;
>> > 			break;
>> >+		case 'j':
>> >+			jflag = 1;
>> >+			info->flag_cyclic = FALSE; // saving memory to avoid cyclic
>> >+			break;
>> > 		case OPT_DISKSET:
>> > 			if (!sadump_add_diskset_info(optarg))
>> > 				goto out;
>> >
>> >_______________________________________________
>> >kexec mailing list
>> >kexec at lists.infradead.org
>> >http://lists.infradead.org/mailman/listinfo/kexec
>
>--
>Cliff Wickman
>SGI
>cpw at sgi.com
>(651)683-7524 vnet 207524
>(651)482-9347 home


