[RFC] makedumpfile: Improve reading speed with mmap()

Sat Mar 9 01:08:05 EST 2013

From: Atsushi Kumagai <kumagai-atsushi at mxc.nes.nec.co.jp>
Subject: Re: [RFC] makedumpfile: Improve reading speed with mmap()
Date: Fri, 8 Mar 2013 11:33:32 +0900

> Hello HATAYAMA-san,
> 
> On Fri, 08 Mar 2013 10:45:18 +0900 (JST)
> HATAYAMA Daisuke <d.hatayama at jp.fujitsu.com> wrote:
> 
>> From: HATAYAMA Daisuke <d.hatayama at jp.fujitsu.com>
>> Subject: Re: [RFC] makedumpfile: Improve reading speed with mmap()
>> Date: Wed, 6 Mar 2013 18:13:50 +0900
>> 
>> > From: Atsushi Kumagai <kumagai-atsushi at mxc.nes.nec.co.jp>
>> > Subject: [RFC] makedumpfile: Improve reading speed with mmap()
>> > Date: Wed, 6 Mar 2013 15:48:04 +0900
>> > 
>> >> Hello,
>> >> 
>> >> I made the prototype patch to use mmap() on /proc/vmcore for
>> >> benchmarking.
>> >> 
>> >> This patch simply replaces read(2) with mmap(2), I think we can see
>> >> the pure performance improvement by reducing the number of map/unmap.
>> >> 
>> >>   - When /proc/vmcore supports mmap(), readmem() calls read_with_mmap() 
>> >>     to read /proc/vmcore with mmap() instead of read().
>> >> 
>> >>   - Introduce --map-size <Kbyte> option to specify the map size.
>> >>     This option is necessary to use mmap() in this patch, but just for
>> >>     benchmarking. I'll remove this option in release version and change
>> >>     the map size into suitable constant size to get enough performance
>> >>     improvement.
>> >> 
>> >>   - This patch is based on devel branch:
>> >>     http://makedumpfile.git.sourceforge.net/git/gitweb.cgi?p=makedumpfile/makedumpfile;a=shortlog;h=refs/heads/devel
>> >> 
>> >> Unfortunately, I haven't done test and benchmarking in 2nd kernel yet
>> >> because I can't start up newer kernel as 2nd kernel on my machine.
>> >> (It seems just my environment issue.) 
>> >> 
>> >> At least, this patch works for vmcores saved on local disk,
>> >> so it will work in 2nd kernel too.
>> >> 
>> >> If anyone helps to do benchmarking, it's very helpful for me.
>> >> And any comments for this patch are welcome.
>> > 
>> > Kumagai-san,
>> > 
>> > I think it necessary to compare this generic one with the idea
>> > considering virtual memory mapping, which should affect filtering
>> > performance to some degree.
>> > 
>> > http://lists.infradead.org/pipermail/kexec/2013-February/007982.html
>> > 
>> > I guess implementation can relatively be moduler. I'll post a
>> > prorotype patch for benchmark later.
>> 
>> Sorry, I investigated this around again and now I think this generic
>> one is enough if size of mmap() range is large enough more than 2MB
>> that is page size used for mapping virtual memory mapping.
>> 
>> So, let's benchmark this version.
>> 
>> BTW, I think it useful to prepare a temporary branch for this
>> benchmark for people who help benchmark. It's awkward to manage
>> patches manually.
> 
> Yes, I thought that I should prepare such a branch when the patch for
> benchmark is fixed, and now is the time.
> 
>> 
>> Also, I posted the following patch yesterday. The v2 patch for mmap()
>> on /proc/vmcore needs this since new note type is added in
>> "VMCOREINFO" name.
>> 
>> [PATCH 0/3] makedumpfile, elf: distinguish ELF note types by ELF note names
>> http://lists.infradead.org/pipermail/kexec/2013-March/008136.html
> 
> Now, I can't get the chance to review the patch set above.
> But, anyway, I created the branch "mmap":
> 
> http://makedumpfile.git.sourceforge.net/git/gitweb.cgi?p=makedumpfile/makedumpfile;a=shortlog;h=refs/heads/mmap
> 
> Please use it for benchmark.

Thanks! It's very helpful.

Also, I tested a little the mmap branch code and found a small bug
that max file offset used in calculating mmap()'s position is
wrong. Please see the next patch.

# But sorry, I made this quickly so I didn't consider design enough.

>From 77ef0e836bba4713bfb578949d2785962179d630 Mon Sep 17 00:00:00 2001
From: HATAYAMA Daisuke <d.hatayama at jp.fujitsu.com>
Date: Sat, 9 Mar 2013 13:49:43 +0900
Subject: [PATCH] makedumpfile: fix max offset relative to file

To see file offset of each memory chunk, it's correct to read p_offset
in the corresponing PT_LOAD entrie.

On /proc/vmcore PT_LOAD entries are sorted on p_load values in
increasing order. So, it's sufficient to refer to the last PT_LOAD
entry only. But the code here doesn't assuming that, calculating
maximum one from all the PT_LOAD entries.

Signed-off-by: HATAYAMA Daisuke <d.hatayama at jp.fujitsu.com>
---
 elf_info.c     | 12 ++++++++++++
 elf_info.h     |  2 ++
 makedumpfile.c |  2 +-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/elf_info.c b/elf_info.c
index 9bd8cd0..aa8cacb 100644
--- a/elf_info.c
+++ b/elf_info.c
@@ -45,6 +45,7 @@ struct pt_load_segment {
 };
 
 static int			nr_cpus;             /* number of cpu */
+static off_t			max_file_offset;
 
 /*
  * File information about /proc/vmcore:
@@ -637,6 +638,12 @@ get_elf_info(int fd, char *filename)
 			return FALSE;
 		j++;
 	}
+	max_file_offset = 0;
+	for (i = 0; i < num_pt_loads; ++i) {
+		struct pt_load_segment *p = &pt_loads[i];
+		max_file_offset=MAX(max_file_offset,
+				    p->file_offset+p->phys_end-p->phys_start);
+	}
 	if (!has_pt_note()) {
 		ERRMSG("Can't find PT_NOTE Phdr.\n");
 		return FALSE;
@@ -869,3 +876,8 @@ set_eraseinfo(off_t offset, unsigned long size)
 	size_eraseinfo   = size;
 }
 
+off_t
+get_max_file_offset(void)
+{
+	return max_file_offset;
+}
diff --git a/elf_info.h b/elf_info.h
index eb58023..61ab3c9 100644
--- a/elf_info.h
+++ b/elf_info.h
@@ -71,6 +71,8 @@ int has_eraseinfo(void);
 void get_eraseinfo(off_t *offset, unsigned long *size);
 void set_eraseinfo(off_t offset, unsigned long size);
 
+off_t get_max_file_offset(void);
+
 #endif  /* ELF_INFO_H */
 
 
diff --git a/makedumpfile.c b/makedumpfile.c
index 3351158..7acbf72 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -235,7 +235,7 @@ static int
 update_mmap_range(off_t offset) {
 	off_t start_offset;
 	off_t map_size;
-	off_t max_offset = info->max_mapnr * info->page_size;
+	off_t max_offset = get_max_file_offset();
 
 	munmap(info->mmap_buf,
 	       info->mmap_end_offset - info->mmap_start_offset);
-- 
1.8.1.4