[PATCH] makedumpfile: keep dumpfile pages in a cache
Atsushi Kumagai
kumagai-atsushi at mxc.nes.nec.co.jp
Tue Nov 13 22:47:24 EST 2012
Hello Petr,
On Thu, 6 Sep 2012 17:50:52 +0200
Petr Tesarik <ptesarik at suse.cz> wrote:
> Dne Po 3. září 2012 09:04:03 Petr Tesarik napsal(a):
> > Dne Po 3. září 2012 05:42:33 Atsushi Kumagai napsal(a):
> > > Hello Petr,
> > >
> > > On Tue, 28 Aug 2012 19:49:49 +0200
> > >
> > > Petr Tesarik <ptesarik at suse.cz> wrote:
> > > > Add a simple cache for pages read from the dumpfile.
> > > >
> > > > This is a big win if we read consecutive data from one page, e.g.
> > > > page descriptors, or even page table entries.
> > > >
> > > > Note that makedumpfile now always reads a complete page. This was
> > > > already the case with kdump-compressed and sadump formats, but
> > > > makedumpfile was throwing most of the data away. For the
> > > > kdump-compressed case, we may actually save a lot of decompression,
> > > > too.
> > > >
> > > > I tried to keep the cache small to minimize the memory footprint, but it
> > > > should be big enough to hold all the pages needed for a 4-level page
> > > > table walk plus some data. This is needed e.g. for vmalloc areas or Xen
> > > > page frame table data, which are not contiguous in physical memory.
> > > >
> > > > Signed-off-by: Petr Tesarik <ptesarik at suse.cz>
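For illustration only, here is a minimal sketch of the kind of cache the patch
describes: a small, fixed pool of page-sized buffers with least-recently-used
eviction. All names, the 4 KiB page size and the 8-entry pool are assumptions
made for the sketch, not code taken from the patch itself.

#include <stddef.h>

#define CACHE_PAGE_SIZE 4096    /* assumed dump page size */
#define CACHE_ENTRIES   8       /* assumed pool size */

struct cache_entry {
        int valid;
        unsigned long long paddr;       /* page-aligned physical address */
        unsigned long last_used;        /* LRU stamp */
        unsigned char data[CACHE_PAGE_SIZE];
};

static struct cache_entry cache_pool[CACHE_ENTRIES];
static unsigned long cache_clock;

/*
 * Return a pointer to the cached copy of the byte at physical address paddr.
 * On a miss, the least recently used entry is evicted and the whole page is
 * reread through read_page(), a stand-in for the format-specific reader
 * (ELF, kdump-compressed, sadump, ...).  Consecutive reads from the same
 * page -- page descriptors, page table entries -- then cost no further file
 * access, and for kdump-compressed input the page is decompressed only once.
 */
static void *cache_lookup(unsigned long long paddr,
                          int (*read_page)(unsigned long long page, void *buf))
{
        unsigned long long page = paddr & ~(unsigned long long)(CACHE_PAGE_SIZE - 1);
        struct cache_entry *victim = &cache_pool[0];
        int i;

        for (i = 0; i < CACHE_ENTRIES; i++) {
                struct cache_entry *e = &cache_pool[i];

                if (e->valid && e->paddr == page) {     /* hit */
                        e->last_used = ++cache_clock;
                        return e->data + (paddr - page);
                }
                if (!e->valid || e->last_used < victim->last_used)
                        victim = e;                     /* remember free/LRU slot */
        }

        if (!read_page(page, victim->data))             /* miss: refill LRU slot */
                return NULL;
        victim->valid = 1;
        victim->paddr = page;
        victim->last_used = ++cache_clock;
        return victim->data + (paddr - page);
}

A readmem()-style wrapper would then copy the requested bytes out of the cached
page instead of issuing one file read per request, which is where the saved
syscalls and decompressions in the measurements below would come from.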
Sorry for the late reply.
According to your measurements, the patch looks good from a performance standpoint.
However, I found the issue below in v1.5.1-beta, and git bisect confirmed that
this patch introduces it (though I haven't found the root cause yet).
Result on kernel 3.4:
$ makedumpfile --non-cyclic vmcore dumpfile
Copying data : [ 62 %]
readpage_elf: Can't convert a physical address(a0000) to offset.
readmem: type_addr: 1, addr:1000a0000, size:4096
read_pfn: Can't get the page data.
makedumpfile Failed.
$
It seems to be a critical issue for all users, so I will postpone merging this
patch until it is solved.
Thanks
Atsushi Kumagai
> > >
> > > It's interesting to me. I'd like to know how much performance improves
> > > with this patch, so do you have any speed measurements?
> >
> > Not really. I only measured the hit/miss ratio, and with Xen domU filtering
> > at dump level 0, I got the following on a small system (2G RAM):
> >
> > cache hit: 1818880 cache miss: 1873
> >
> > The improvement isn't much for the non-Xen case, because the hits are mostly
> > due to virtual-to-physical translations, and most Linux data is stored at
> > virtual addresses that can be resolved by adding or subtracting a fixed
> > offset.
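To make that distinction concrete: with the classic (pre-KASLR) x86_64 layout,
an address in the kernel's direct mapping is resolved by a single subtraction
and needs no dump-file reads at all, whereas a vmalloc (or Xen) address needs a
4-level page table walk, i.e. one page-sized read per level, and those are
exactly the reads the cache keeps absorbing. The sketch below is a rough
illustration under those assumptions: the base constant and masks are
simplified, huge pages and flag checks are ignored, and it reuses the
hypothetical cache_lookup() helper from the earlier sketch.

#define DIRECTMAP_BASE 0xffff880000000000ULL    /* assumed pre-KASLR x86_64 direct-map start */

/* Direct-mapped kernel addresses: one subtraction, zero dump-file reads. */
static unsigned long long directmap_vtop(unsigned long long vaddr)
{
        return vaddr - DIRECTMAP_BASE;
}

/*
 * Other addresses go through a 4-level walk: one page-sized read each for the
 * PGD, PUD, PMD and PTE level.  Neighbouring virtual addresses index the same
 * tables, so these reads are the ones that keep hitting the page cache.
 */
static unsigned long long pagetable_vtop(unsigned long long pgd_paddr,
                                         unsigned long long vaddr,
                                         int (*read_page)(unsigned long long, void *))
{
        unsigned long long entry = pgd_paddr;
        int shift;

        for (shift = 39; shift >= 12; shift -= 9) {     /* PGD, PUD, PMD, PTE */
                unsigned long long *table = cache_lookup(entry, read_page);

                if (!table)
                        return ~0ULL;
                entry = table[(vaddr >> shift) & 0x1ff] & 0x000ffffffffff000ULL;
        }
        return entry | (vaddr & 0xfff);
}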
> >
> > Of course, you will also only save the syscall overhead, because Linux keeps
> > the data in the kernel page cache anyway. I'll measure the times for you on
> > a reasonably large system (~256G) and send the results here.
>
> I couldn't get a medium-sized system for testing, so I performed some
> measurements on a 64G system. I ran makedumpfile repeatedly from the kdump
> environment. The first run was used to warm the cache with target filesystem
> metadata, and the cache was not dropped between runs, to minimize effects of
> the target filesystem. I ran it against /proc/vmcore, i.e. the input file was
> always resident, so there was nothing to skew the results.
>
> I tried a kdump-format file with no compression (to take gzip/LZO out of the
> picture) and an ELF file. For the Xen case I only did the ELF file, because
> the kdump format is not available.
>
> First I ran it on bare metal. There was a slight improvement for -d31:
>
> kdump no cache:
> 6.32user 55.20system 1:15.60elapsed 81%CPU (0avgtext+0avgdata
> 4800maxresident)k
> 2080inputs+5714296outputs (2major+342minor)pagefaults 0swaps
>
> kdump with cache:
> 6.02user 24.58system 0:46.51elapsed 65%CPU (0avgtext+0avgdata
> 4912maxresident)k
> 1864inputs+5714288outputs (2major+350minor)pagefaults 0swaps
>
> ELF no cache:
> 7.58user 74.25system 1:59.52elapsed 68%CPU (0avgtext+0avgdata
> 4800maxresident)k
> 728inputs+9288824outputs (1major+342minor)pagefaults 0swaps
>
> ELF with cache:
> 7.43user 44.21system 1:17.41elapsed 66%CPU (0avgtext+0avgdata
> 4896maxresident)k
> 728inputs+9288792outputs (1major+349minor)pagefaults 0swaps
>
> To sum it up, I can see an improvement of approx. 50% in system time. The
> increase in memory consumption is a bit more than I would expect (why do I see
> ~100k for a cache of 12k?), but acceptable nevertheless. I can see a slight
> increase in user time (approx. 25%) for the kdump case, which could be
> attributed to the cache overhead. I don't have any explanation for the
> decreased user time for the ELF case, but it's consistent.
>
> I also tried running makedumpfile with -d1. This results in long sequential
> reads, so it's the worst case for a simple LRU-policy cache. The results are
> too unstable to make a reliable measurement, but there seems to be a slight
> performance hit. It is certainly less than 5% of the total time.
>
> I think there are two reasons for that:
>
> 1. We're copying file data twice for each page (once from the kernel page
> cache to the process space, and once from the internal cache to the
> destination).
> 2. Instead of reusing the same data location, we're rotating through 8
> different pages (or even up to twice as many if the allocated space is neither
> contiguous nor page-aligned). This stresses both the CPU's L1 d-cache and the
> TLB a tiny bit more. Note that in the /proc/vmcore case, the kernel
> sequentially maps all physical memory of the crashed system, so every cache
> page may be evicted before we get to using it again. This could explain why I
> observe an increase in system time despite making fewer system calls.
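As a toy illustration of this worst case (not makedumpfile code): a single,
strictly sequential pass over far more pages than an 8-entry LRU cache can hold
never produces a hit, so the cache only adds the extra copying and bookkeeping
described above.

#include <stdio.h>

#define ENTRIES 8

int main(void)
{
        long cached[ENTRIES] = { -1, -1, -1, -1, -1, -1, -1, -1 };
        long stamp[ENTRIES] = { 0 };
        long clock = 0, hits = 0, misses = 0, page;

        /* One pass over a million distinct pages, strictly in order. */
        for (page = 0; page < 1000000; page++) {
                int i, victim = 0, hit = 0;

                for (i = 0; i < ENTRIES; i++) {
                        if (cached[i] == page) {
                                hit = 1;
                                stamp[i] = ++clock;
                                break;
                        }
                        if (stamp[i] < stamp[victim])
                                victim = i;
                }
                if (hit) {
                        hits++;
                } else {        /* every access evicts the LRU page */
                        misses++;
                        cached[victim] = page;
                        stamp[victim] = ++clock;
                }
        }
        printf("hits=%ld misses=%ld\n", hits, misses);  /* hits=0 misses=1000000 */
        return 0;
}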
>
> There are a number of things I could do to regain the old performance, if
> anybody is concerned about the slight regression in this worst case. Just let
> me know.
>
> Second, I ran with the Xen hypervisor. Since dump levels greater than 1 don't
> work, I ran with '-E -X -d1'. Even though this includes the inefficient page
> walk described above, the improvement was immense.
>
> no cache:
> 95.33user 657.18system 13:08.40elapsed 95%CPU (0avgtext+0avgdata
> 5440maxresident)k
> 704inputs+6563856outputs (1major+388minor)pagefaults 0swaps
>
> with cache:
> 61.14user 110.15system 3:24.24elapsed 83%CPU (0avgtext+0avgdata
> 5584maxresident)k
> 2360inputs+6563872outputs (2major+396minor)pagefaults 0swaps
>
> In short, almost 80% shorter total time.
>
> Petr Tesarik
> SUSE Linux
>
> _______________________________________________
> kexec mailing list
> kexec at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec