[PATCH v4 0/8] kdump, vmcore: support mmap() on /proc/vmcore
cpw at sgi.com
Thu Apr 25 09:38:25 EDT 2013
On Fri, Apr 05, 2013 at 12:04:02AM +0000, HATAYAMA Daisuke wrote:
> Currently, reading /proc/vmcore is done by read_oldmem(), which calls
> ioremap/iounmap for every single page. For example, if memory is 1GB,
> ioremap/iounmap is called (1GB / 4KB) times, that is, 262144
> times. This causes a big performance degradation.
> To address the issue, this patch set implements mmap() on /proc/vmcore
> to improve read performance. In particular, its main user,
> makedumpfile, not only reads memory from /proc/vmcore but also does
> other processing such as filtering, compression and I/O work.
> You can see two benchmarks on terabyte memory systems. Both show about
> 40 seconds on a 2TB system, which is almost equal to the performance of
> an experimental kernel-side memory filtering approach.
> - makedumpfile mmap() benchmark, by Jingbai Ma
> - makedumpfile: benchmark on mmap() with /proc/vmcore on 2TB memory system
> v3 => v4)
> - Rebase onto 3.9-rc7.
> - Drop clean-up patches orthogonal to the main topic of this patch set.
> - Copy ELF note segments in the 1st kernel just as in v1. Allocate
> vmcore objects per page. => See [PATCH 5/8]
> - Map memory referenced by a PT_LOAD entry directly even if the start or
> end of the region doesn't fall on a page boundary, instead of copying
> it as in v3. As a result, holes outside OS memory are visible
> from /proc/vmcore. => See [PATCH 7/8]
> v2 => v3)
> - Rebase onto 3.9-rc3.
> - Copy program headers separately from e_phoff into the ELF note segment
> buffer. Now there's no risk of allocating huge memory if the program
> header table is positioned after the memory segments.
> - Add cleanup patch that removes unnecessary variable.
> - Fix wrong use of the variable holding the buffer size configurable at
> runtime. Instead, use the variable that holds the original buffer size.
> v1 => v2)
> - Clean up the existing code: use e_phoff, and remove the assumption
> on PT_NOTE entries.
> - Fix a potential bug where the ELF header size was not included in the
> exported vmcoreinfo size.
> - Divide patch modifying read_vmcore() into two: clean-up and primary
> code change.
> - Put ELF note segments in page-size boundary on the 1st kernel
> instead of copying them into the buffer on the 2nd kernel.
> This patch set is based on v3.9-rc7.
> Tested on x86-64 and x86-32, both with 1GB and over-4GB memory environments.
> HATAYAMA Daisuke (8):
> vmcore: support mmap() on /proc/vmcore
> vmcore: treat memory chunks referenced by PT_LOAD program header entries in \
> page-size boundary in vmcore_list
> vmcore: count holes generated by round-up operation for page boundary for size \
> of /proc/vmcore
> vmcore: copy ELF note segments in the 2nd kernel per page vmcore objects
> vmcore: Add helper function vmcore_add()
> vmcore, procfs: introduce MEM_TYPE_CURRENT_KERNEL flag to distinguish objects \
> copied in 2nd kernel
> vmcore: clean up read_vmcore()
> vmcore: allocate buffer for ELF headers on page-size alignment
> fs/proc/vmcore.c | 349 ++++++++++++++++++++++++++++++++---------------
> include/linux/proc_fs.h | 8 +
> 2 files changed, 245 insertions(+), 112 deletions(-)
> HATAYAMA, Daisuke
This is a very important patch set for speeding the kdump process.
(patches 1 - 8)
We have found the mmap interface to /proc/vmcore to be about 80x faster than the
read interface. That is, doing mmap's and copying data (in pieces the size of page
structures) transfers all of /proc/vmcore about 80 times faster than
doing the equivalent read's.
This greatly speeds up the capture of a kdump, as the scan of page
structures takes the bulk of the time in dumping the OS on a machine
with terabytes of memory.
We would very much like to see this set make it into the 3.10 release.
Acked-by: Cliff Wickman <cpw at sgi.com>