[PATCH v4 0/8] kdump, vmcore: support mmap() on /proc/vmcore

Cliff Wickman cpw at sgi.com
Thu Apr 25 09:38:25 EDT 2013


On Fri, Apr 05, 2013 at 12:04:02AM +0000, HATAYAMA Daisuke wrote:
> Currently, reading /proc/vmcore is done by read_oldmem(), which uses
> ioremap/iounmap for every single page. For example, if memory is 1GB,
> ioremap/iounmap is called (1GB / 4KB) times, that is, 262144
> times. This causes a large performance degradation.
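
To make the arithmetic concrete, here is a small C sketch that counts one ioremap()/iounmap() pair per page, assuming 4KB pages. `remap_calls()` is a hypothetical helper for illustration, not kernel code:

```c
#include <stdint.h>

/* Hypothetical helper: number of ioremap()/iounmap() pairs needed when
 * exactly one page is mapped per call, as in the per-page scheme above. */
static uint64_t remap_calls(uint64_t mem_bytes, uint64_t page_size)
{
    /* One mapping per page; round up so a partial last page still counts. */
    return (mem_bytes + page_size - 1) / page_size;
}
```

With the same formula, a 2TB system needs 2^41 / 2^12 = 2^29 = 536,870,912 mappings.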
> 
> In particular, the current main user of this mmap() is makedumpfile,
> which not only reads memory from /proc/vmcore but also does other
> processing like filtering, compression and IO work.
> 
> To address the issue, this patch set implements mmap() on /proc/vmcore to
> improve read performance.
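
A minimal userspace sketch of what mmap() support allows: map the first page of the dump and read the ELF header in place instead of copying it out with read(). It assumes a 64-bit ELF core. `vmcore_phnum()` is a hypothetical helper, not makedumpfile code. Because /proc/vmcore exists only in the capture (2nd) kernel, the function takes any path:

```c
#include <elf.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map the first page of an ELF core file
 * (e.g. /proc/vmcore in the capture kernel) and return its
 * program-header count, or -1 on error. Assumes ELFCLASS64. */
static int vmcore_phnum(const char *path)
{
    long page = sysconf(_SC_PAGESIZE);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    void *p = mmap(NULL, (size_t)page, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;

    const Elf64_Ehdr *eh = p; /* header is read in place, no copy */
    int ret = -1;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 &&
        eh->e_ident[EI_CLASS] == ELFCLASS64)
        ret = eh->e_phnum;

    munmap(p, (size_t)page);
    return ret;
}
```

A tool would presumably fall back to read() when mmap() fails, for instance on a kernel without this series.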
> 
> Benchmark
> =========
> 
> You can see two benchmarks on terabyte memory systems. Both show about
> 40 seconds on a 2TB system. This is almost equal to the performance of
> experimental kernel-side memory filtering.
> 
> - makedumpfile mmap() benchmark, by Jingbai Ma
>   https://lkml.org/lkml/2013/3/27/19
> 
> - makedumpfile: benchmark on mmap() with /proc/vmcore on 2TB memory system
>   https://lkml.org/lkml/2013/3/26/914
> 
> ChangeLog
> =========
> 
> v3 => v4)
> 
> - Rebase onto 3.9-rc7.
> - Drop clean-up patches orthogonal to the main topic of this patch set.
> - Copy ELF note segments in the 2nd kernel just as in v1. Allocate
>   vmcore objects per page. => See [PATCH 5/8]
> - Map memory referenced by a PT_LOAD entry directly even if the start or
>   end of the region is not page-aligned; such regions are no longer
>   copied as in v3. As a result, holes (the parts of those pages outside
>   OS memory) are visible from /proc/vmcore. => See [PATCH 7/8]
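
A sketch of the page-boundary widening described in this item, assuming 4KB pages: the start of a PT_LOAD region is rounded down and its end rounded up, and the extra bytes at either end are the holes that become visible through /proc/vmcore. `widen_pt_load()` and `struct load_span` are hypothetical names, not the patch's code:

```c
#include <stdint.h>

#define PAGE_SZ 4096ULL  /* assumption: 4KB pages */

/* Hypothetical description of a PT_LOAD region widened to page boundaries. */
struct load_span {
    uint64_t start;     /* region start rounded down to a page boundary */
    uint64_t size;      /* size of the widened, page-aligned span */
    uint64_t head_hole; /* bytes exposed before the real region */
    uint64_t tail_hole; /* bytes exposed after the real region */
};

static struct load_span widen_pt_load(uint64_t paddr, uint64_t memsz)
{
    struct load_span s;
    uint64_t end = paddr + memsz;
    uint64_t aligned_end = (end + PAGE_SZ - 1) & ~(PAGE_SZ - 1);

    s.start = paddr & ~(PAGE_SZ - 1);
    s.size = aligned_end - s.start;
    s.head_hole = paddr - s.start;
    s.tail_hole = aligned_end - end;
    return s;
}
```

For a region at 0x1000100 of size 0x2000, this gives a 0x3000-byte span starting at 0x1000000, with a 0x100-byte hole in front and a 0xf00-byte hole behind. Summed over all PT_LOAD entries, such holes are presumably the extra size accounted for by the "count holes generated by round-up operation" patch in this series.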
> 
> v2 => v3)
> 
> - Rebase onto 3.9-rc3.
> - Copy program headers separately from e_phoff in the ELF note segment
>   buffer. Now there's no risk of allocating huge memory if the program
>   header table is positioned after a memory segment.
> - Add a cleanup patch that removes an unnecessary variable.
> - Fix the wrong use of the variable holding the buffer size, which is
>   configurable at runtime. Instead, use the variable that holds the
>   original buffer size.
> 
> v1 => v2)
> 
> - Clean up the existing code: use e_phoff, and remove the assumption
>   on PT_NOTE entries.
> - Fix a potential bug where the ELF header size is not included in the
>   exported vmcoreinfo size.
> - Divide the patch modifying read_vmcore() into two: clean-up and primary
>   code change.
> - Place ELF note segments on page-size boundaries in the 1st kernel
>   instead of copying them into a buffer in the 2nd kernel.
> 
> Test
> ====
> 
> This patch set is based on v3.9-rc7.
> 
> Tested on x86-64 and x86-32, with both 1GB and over-4GB memory environments.
> 
> ---
> 
> HATAYAMA Daisuke (8):
>       vmcore: support mmap() on /proc/vmcore
>       vmcore: treat memory chunks referenced by PT_LOAD program header entries in page-size boundary in vmcore_list
>       vmcore: count holes generated by round-up operation for page boundary for size of /proc/vmcore
>       vmcore: copy ELF note segments in the 2nd kernel per page vmcore objects
>       vmcore: Add helper function vmcore_add()
>       vmcore, procfs: introduce MEM_TYPE_CURRENT_KERNEL flag to distinguish objects copied in 2nd kernel
>       vmcore: clean up read_vmcore()
>       vmcore: allocate buffer for ELF headers on page-size alignment
> 
> 
>  fs/proc/vmcore.c        |  349 ++++++++++++++++++++++++++++++++---------------
>  include/linux/proc_fs.h |    8 +
>  2 files changed, 245 insertions(+), 112 deletions(-)
> 
> -- 
> 
> Thanks.
> HATAYAMA, Daisuke

This is a very important patch set for speeding up the kdump process
(patches 1 - 8).

We have found the mmap interface to /proc/vmcore to be about 80x faster
than the read interface. That is, transferring all of /proc/vmcore with
mmap()s, copying the data out in pieces the size of page structures, is
about 80 times faster than reading it.
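
A sketch of the access pattern described above, under stated assumptions: each request maps a page-aligned window of the file with mmap(), copies the wanted bytes (for example, one chunk of page structures) and unmaps it. `copy_via_mmap()` is a hypothetical helper, not SGI's or makedumpfile's code:

```c
#include <fcntl.h>      /* open()/O_RDONLY, used by callers to obtain fd */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy len bytes starting at file offset off into
 * dst by mapping a page-aligned window instead of calling read().
 * Returns 0 on success, -1 if the mapping fails. */
static int copy_via_mmap(int fd, off_t off, void *dst, size_t len)
{
    if (len == 0)
        return 0;

    long page = sysconf(_SC_PAGESIZE);
    off_t base = off - (off % page);   /* mmap offset must be page-aligned */
    size_t delta = (size_t)(off - base);
    size_t maplen = delta + len;

    void *p = mmap(NULL, maplen, PROT_READ, MAP_PRIVATE, fd, base);
    if (p == MAP_FAILED)
        return -1;
    memcpy(dst, (const char *)p + delta, len);
    munmap(p, maplen);
    return 0;
}
```

Mapping and unmapping per request keeps the sketch short; a real tool would likely map larger windows and reuse them across many small copies.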

This greatly speeds up the capture of a kdump, as the scan of page
structures takes the bulk of the time in dumping the OS on a machine
with terabytes of memory.

We would very much like to see this set make it into the 3.10 release.

Acked-by: Cliff Wickman <cpw at sgi.com>

-Cliff
-- 
Cliff Wickman
SGI
cpw at sgi.com
(651) 683-3824


