[RFC] makedumpfile: Improve reading speed with mmap()
HATAYAMA Daisuke
d.hatayama at jp.fujitsu.com
Sat Mar 9 01:08:05 EST 2013
From: Atsushi Kumagai <kumagai-atsushi at mxc.nes.nec.co.jp>
Subject: Re: [RFC] makedumpfile: Improve reading speed with mmap()
Date: Fri, 8 Mar 2013 11:33:32 +0900
> Hello HATAYAMA-san,
>
> On Fri, 08 Mar 2013 10:45:18 +0900 (JST)
> HATAYAMA Daisuke <d.hatayama at jp.fujitsu.com> wrote:
>
>> From: HATAYAMA Daisuke <d.hatayama at jp.fujitsu.com>
>> Subject: Re: [RFC] makedumpfile: Improve reading speed with mmap()
>> Date: Wed, 6 Mar 2013 18:13:50 +0900
>>
>> > From: Atsushi Kumagai <kumagai-atsushi at mxc.nes.nec.co.jp>
>> > Subject: [RFC] makedumpfile: Improve reading speed with mmap()
>> > Date: Wed, 6 Mar 2013 15:48:04 +0900
>> >
>> >> Hello,
>> >>
>> >> I made the prototype patch to use mmap() on /proc/vmcore for
>> >> benchmarking.
>> >>
>> >> This patch simply replaces read(2) with mmap(2), I think we can see
>> >> the pure performance improvement by reducing the number of map/unmap.
>> >>
>> >> - When /proc/vmcore supports mmap(), readmem() calls read_with_mmap()
>> >> to read /proc/vmcore with mmap() instead of read().
>> >>
>> >> - Introduce --map-size <Kbyte> option to specify the map size.
>> >> This option is necessary to use mmap() in this patch, but just for
>> >> benchmarking. I'll remove this option in release version and change
>> >> the map size into suitable constant size to get enough performance
>> >> improvement.
>> >>
>> >> - This patch is based on devel branch:
>> >> http://makedumpfile.git.sourceforge.net/git/gitweb.cgi?p=makedumpfile/makedumpfile;a=shortlog;h=refs/heads/devel
>> >>
>> >> Unfortunately, I haven't done test and benchmarking in 2nd kernel yet
>> >> because I can't start up newer kernel as 2nd kernel on my machine.
>> >> (It seems just my environment issue.)
>> >>
>> >> At least, this patch works for vmcores saved on local disk,
>> >> so it will work in 2nd kernel too.
>> >>
>> >> If anyone helps to do benchmarking, it's very helpful for me.
>> >> And any comments for this patch are welcome.
>> >
>> > Kumagai-san,
>> >
>> > I think it necessary to compare this generic one with the idea
>> > considering virtual memory mapping, which should affect filtering
>> > performance to some degree.
>> >
>> > http://lists.infradead.org/pipermail/kexec/2013-February/007982.html
>> >
>> > I guess the implementation can be relatively modular. I'll post a
>> > prototype patch for benchmark later.
>>
>> Sorry, I investigated this again, and now I think this generic
>> one is enough if the size of the mmap() range is larger than 2MB,
>> which is the page size used for the virtual memory mapping.
>>
>> So, let's benchmark this version.
>>
>> BTW, I think it useful to prepare a temporary branch for this
>> benchmark for people who help benchmark. It's awkward to manage
>> patches manually.
>
> Yes, I thought that I should prepare such a branch when the patch for
> benchmark is fixed, and now is the time.
>
>>
>> Also, I posted the following patch yesterday. The v2 patch for mmap()
>> on /proc/vmcore needs this since new note type is added in
>> "VMCOREINFO" name.
>>
>> [PATCH 0/3] makedumpfile, elf: distinguish ELF note types by ELF note names
>> http://lists.infradead.org/pipermail/kexec/2013-March/008136.html
>
> Now, I can't get the chance to review the patch set above.
> But, anyway, I created the branch "mmap":
>
> http://makedumpfile.git.sourceforge.net/git/gitweb.cgi?p=makedumpfile/makedumpfile;a=shortlog;h=refs/heads/mmap
>
> Please use it for benchmark.
Thanks! It's very helpful.

Also, I tested the mmap branch code a little and found a small bug: the
max file offset used in calculating mmap()'s position is wrong. Please
see the patch below.

# But sorry, I made this quickly, so I haven't considered the design enough.
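
To illustrate why the max file offset matters for the mmap() position,
here is a minimal sketch of the window calculation. It is not the code
in the mmap branch: struct map_window, calc_map_window() and its
parameters are hypothetical names, and I assume the window starts at the
page-aligned offset and is cut off at the end of the file contents.

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical type: the file range that would be passed to mmap(). */
struct map_window {
	off_t start;	/* page-aligned file offset */
	off_t size;	/* number of bytes to map */
};

/*
 * Hypothetical helper: choose the window that covers `offset`.
 * `max_offset` is the end of the data in the file, not the end of
 * physical memory.
 */
static struct map_window
calc_map_window(off_t offset, off_t page_size, off_t region_size,
		off_t max_offset)
{
	struct map_window w;

	assert(page_size > 0 && region_size > 0);
	assert(offset >= 0 && offset < max_offset);

	/* The offset given to mmap() must be page aligned: round down. */
	w.start = offset - (offset % page_size);

	/* Never map past the end of the file contents. */
	w.size = max_offset - w.start;
	if (w.size > region_size)
		w.size = region_size;

	return w;
}
```

If max_offset is derived from the physical address range instead of the
file layout, the cut-off above happens at the wrong place whenever the
two differ.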
From 77ef0e836bba4713bfb578949d2785962179d630 Mon Sep 17 00:00:00 2001
From: HATAYAMA Daisuke <d.hatayama at jp.fujitsu.com>
Date: Sat, 9 Mar 2013 13:49:43 +0900
Subject: [PATCH] makedumpfile: fix max offset relative to file
To find the file offset of each memory chunk, it's correct to read
p_offset in the corresponding PT_LOAD entry.

On /proc/vmcore, PT_LOAD entries are sorted by p_load values in
increasing order, so it would be sufficient to refer to the last PT_LOAD
entry only. But the code here doesn't assume that and calculates the
maximum from all the PT_LOAD entries.
Signed-off-by: HATAYAMA Daisuke <d.hatayama at jp.fujitsu.com>
---
elf_info.c | 12 ++++++++++++
elf_info.h | 2 ++
makedumpfile.c | 2 +-
3 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/elf_info.c b/elf_info.c
index 9bd8cd0..aa8cacb 100644
--- a/elf_info.c
+++ b/elf_info.c
@@ -45,6 +45,7 @@ struct pt_load_segment {
};
static int nr_cpus; /* number of cpu */
+static off_t max_file_offset;
/*
* File information about /proc/vmcore:
@@ -637,6 +638,12 @@ get_elf_info(int fd, char *filename)
return FALSE;
j++;
}
+ max_file_offset = 0;
+ for (i = 0; i < num_pt_loads; ++i) {
+ struct pt_load_segment *p = &pt_loads[i];
+ max_file_offset=MAX(max_file_offset,
+ p->file_offset+p->phys_end-p->phys_start);
+ }
if (!has_pt_note()) {
ERRMSG("Can't find PT_NOTE Phdr.\n");
return FALSE;
@@ -869,3 +876,8 @@ set_eraseinfo(off_t offset, unsigned long size)
size_eraseinfo = size;
}
+off_t
+get_max_file_offset(void)
+{
+ return max_file_offset;
+}
diff --git a/elf_info.h b/elf_info.h
index eb58023..61ab3c9 100644
--- a/elf_info.h
+++ b/elf_info.h
@@ -71,6 +71,8 @@ int has_eraseinfo(void);
void get_eraseinfo(off_t *offset, unsigned long *size);
void set_eraseinfo(off_t offset, unsigned long size);
+off_t get_max_file_offset(void);
+
#endif /* ELF_INFO_H */
diff --git a/makedumpfile.c b/makedumpfile.c
index 3351158..7acbf72 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -235,7 +235,7 @@ static int
update_mmap_range(off_t offset) {
off_t start_offset;
off_t map_size;
- off_t max_offset = info->max_mapnr * info->page_size;
+ off_t max_offset = get_max_file_offset();
munmap(info->mmap_buf,
info->mmap_end_offset - info->mmap_start_offset);
--
1.8.1.4
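
For reference, the two ways of getting the max file offset mentioned in
the patch description can be sketched standalone as below. This is only
an illustration: struct load_seg is a hypothetical stand-in that mirrors
the file_offset, phys_start and phys_end fields used in the patch, and
the function names are made up.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for the PT_LOAD bookkeeping in elf_info.c. */
struct load_seg {
	off_t file_offset;		/* p_offset of the segment */
	unsigned long long phys_start;	/* first physical address */
	unsigned long long phys_end;	/* end physical address (exclusive) */
};

/* File offset just past the last byte of one segment. */
static off_t
seg_file_end(const struct load_seg *s)
{
	return s->file_offset + (off_t)(s->phys_end - s->phys_start);
}

/* What the patch does: take the maximum over all entries. */
static off_t
max_file_offset_scan(const struct load_seg *segs, size_t n)
{
	off_t max = 0;
	size_t i;

	for (i = 0; i < n; ++i) {
		off_t end = seg_file_end(&segs[i]);
		if (end > max)
			max = end;
	}
	return max;
}

/* Shortcut that is valid only if the entries are sorted in the file. */
static off_t
max_file_offset_last(const struct load_seg *segs, size_t n)
{
	return n ? seg_file_end(&segs[n - 1]) : 0;
}
```

With two segments at file offsets 0x1000 and 0xa0000 that cover physical
ranges [0x0, 0x9f000) and [0x100000, 0x200000), both functions return
0x1a0000, while the highest physical address is 0x200000. That gap
between file offset and physical address is the reason the mmap() range
should not be computed from max_mapnr.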