[PATCH] makedumpfile: keep dumpfile pages in a cache

Atsushi Kumagai kumagai-atsushi at mxc.nes.nec.co.jp
Wed Feb 6 02:01:08 EST 2013


Hello Petr,

On Thu, 10 Jan 2013 09:48:51 +0900
Atsushi Kumagai <kumagai-atsushi at mxc.nes.nec.co.jp> wrote:

> Hello Petr,
> 
> On Wed, 19 Dec 2012 16:01:25 +0100
> Petr Tesarik <ptesarik at suse.cz> wrote:
> 
> > On Mon, 19 Nov 2012 17:40:44 +0900
> > Atsushi Kumagai <kumagai-atsushi at mxc.nes.nec.co.jp> wrote:
> > 
> > > Hello Petr,
> > > 
> > > On Wed, 14 Nov 2012 15:42:12 +0100
> > > Petr Tesarik <ptesarik at suse.cz> wrote:
> > >  
> > > > > Sorry for the late reply.
> > > > > > According to your measurements, the performance looks good.
> > > > > > 
> > > > > > However, I found the issue below in v1.5.1-beta and confirmed
> > > > > > with git bisect that this patch causes it (but I haven't found
> > > > > > the root cause yet).
> > > > > 
> > > > >   result on kernel 3.4:
> > > > >     $ makedumpfile --non-cyclic vmcore dumpfile
> > > > >     Copying data                       : [ 62 %]
> > > > >     readpage_elf: Can't convert a physical address(a0000) to \
> > > > > offset. readmem: type_addr: 1, addr:1000a0000, size:4096
> > > > >     read_pfn: Can't get the page data.
> > > > > 
> > > > >     makedumpfile Failed.
> > > > >     $
> > > > > 
> > > > > > It seems to be a critical issue for all users, so I will postpone
> > > > > > merging this patch until this issue is solved.

I found the cause of this issue.

In the log above, readmem() tries to read 0x1000a0000 (which is correct),
but readpage_elf() tries to read 0xa0000.
This is because your code applies the PAGEBASE macro to the address before
calling readpage_elf().

  #define PAGEBASE(X)             (((unsigned long)(X)) & ~(PAGESIZE() - 1))
 
On 32-bit systems, unsigned long is 32 bits wide, so 0x1000a0000 is
truncated to 0xa0000 before readpage_elf() receives it.

The problem is in the PAGEBASE macro, not in your code.
So I'll merge your patch as it is, together with the patch below.
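
To illustrate the truncation, here is a minimal sketch that can be built on any
host. It is not makedumpfile code: DEMO_PAGESIZE, pagebase_ulong32(),
pagebase_ull() and pagebase_demo() are hypothetical names, uint32_t stands in
for a 32-bit "unsigned long", and the 4096-byte page size is taken from
"size:4096" in the log above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as suggested by "size:4096" in the log. */
#define DEMO_PAGESIZE UINT64_C(4096)

/* Models the old cast on i386: only the low 32 bits of the address survive. */
static uint64_t pagebase_ulong32(uint64_t paddr)
{
	return (uint32_t)paddr & ~(DEMO_PAGESIZE - 1);
}

/* Models the fixed cast: the full 64-bit value is kept before masking. */
static uint64_t pagebase_ull(uint64_t paddr)
{
	return (unsigned long long)paddr & ~(DEMO_PAGESIZE - 1);
}

/* Checks both variants against the address from the failing run. */
static void pagebase_demo(void)
{
	uint64_t paddr = UINT64_C(0x1000a0000);

	/* Bit 32 is dropped, so the result matches the bad address in the log. */
	assert(pagebase_ulong32(paddr) == UINT64_C(0xa0000));

	/* With the wider cast, the page base above 4 GiB is preserved. */
	assert(pagebase_ull(paddr) == paddr);
}
```

Calling pagebase_demo() from a small main() should pass both assertions,
which matches the addresses reported by readmem() and readpage_elf() above.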


Thanks
Atsushi Kumagai

------------------------------------------------------------------
[PATCH] Fix PAGEOFFSET and PAGEBASE macros for i386 PAE.

An i386 PAE system has 36-bit physical addresses, but PAGEOFFSET and
PAGEBASE cast their argument to "unsigned long", which is 32 bits on i386.
As a result, they return truncated addresses on i386 PAE systems.

Signed-off-by: Atsushi Kumagai <kumagai-atsushi at mxc.nes.nec.co.jp>

---
 makedumpfile.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/makedumpfile.h b/makedumpfile.h
index 98cd528..6026fa2 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -151,8 +151,8 @@ isAnon(unsigned long mapping)
 
 #define PAGESIZE()		(info->page_size)
 #define PAGESHIFT()		(info->page_shift)
-#define PAGEOFFSET(X)		(((unsigned long)(X)) & (PAGESIZE() - 1))
-#define PAGEBASE(X)		(((unsigned long)(X)) & ~(PAGESIZE() - 1))
+#define PAGEOFFSET(X)		(((unsigned long long)(X)) & (PAGESIZE() - 1))
+#define PAGEBASE(X)		(((unsigned long long)(X)) & ~(PAGESIZE() - 1))
 
 /*
  * for SPARSEMEM
-- 
1.8.0.2



> > > > 
> > > > Understood. However, I haven't run into this situation, but I'd
> > > > like to help.
> > > 
> > > Thanks in advance.
> > > 
> > > > Which architecture is this?
> > > > Could you possibly share the vmcore file with me?
> > > 
> > > I tested on i386 with kernel-3.4.8.
> > > 
> > > I have no way to send the vmcore to you, so I attach the .config file
> > > instead of it.
> > 
> > I have finally compiled and installed the kernel. I was able to save an
> > ELF dump file. However, makedumpfile fails like this:
> > 
> > ptesarik at nathan:~/makedumpfile> ./makedumpfile --non-cyclic vmcore
> > dumpfile
> > __read_disk_dump_header: Can't seek a file(vmcore). Invalid argument
> > read_device: Can't seek a file(vmcore). Invalid argument
> > check_elf_format: Can't seek vmcore. Invalid argument
> > 
> > makedumpfile Failed.
> > 
> > Any ideas?
> 
> Hmmm... the errors come from "lseek(fd, 0x0, SEEK_SET)", so the vmcore
> might be broken for some reason, but I'm not sure...
> I assume that the vmcore can't be opened with crash either, right?
> 
> Anyway, it seems difficult to reproduce this issue in your environment.
> So I will take on the investigation of this issue; please give me more time.
> 
> 
> Thanks
> Atsushi Kumagai


