crash by normal: crashdump without reserving memory during system boot
ying.huang at intel.com
Tue Oct 9 09:28:18 EDT 2007
On Mon, 2007-10-01 at 14:10 +0530, Vivek Goyal wrote:
> On Wed, Sep 26, 2007 at 03:34:10PM +0800, Huang, Ying wrote:
> > Hi,
> > I have a proposal to do crashdump without reserving memory during
> > boot. The method is as follows:
> > 1. Do not reserve memory during system boot, that is,
> > crashkernel=<XX>@<YY> is not used on the kernel command line.
> > 2. A new kexec flag named KEXEC_CRASH_BY_NORMAL is defined for the
> > sys_kexec_load system call. When this flag is specified,
> > sys_kexec_load works as a normal kexec (not a crash kexec), except that the
> > destination image is kexec_crash_image instead of kexec_image.
> > 3. In kexec-tools (/sbin/kexec), --mem-min=<addr1> and --mem-max=<addr2>
> > are used to specify the memory area used by the crashdump kernel. That is,
> > the image, ELF core header, and available memory of the crashdump kernel
> > are all within <addr1> ~ <addr2>.
> Probably this can be optional. Anyway, if destination pages are
> going to be backed up in source pages, a user does not have to specify
> --mem-min and --mem-max.
The --mem-min and --mem-max options are used to specify the destination memory
range. I think they are necessary. One source page corresponds to one
destination page (except for source pages that happen to be allocated at the
same position as their corresponding destination pages). --mem-min and
--mem-max have a similar function to crashkernel=YM@XM in the kernel parameters.
> > 4. In kexec-tools, in addition to the kernel image, ELF core header, etc.,
> > the available memory of the crashdump kernel is loaded too. For
> > example, with
> > --mem-min=0x100000
> > --mem-max=0xffffff
> > the segments passed to sys_kexec_load for the crashdump kernel can be as follows:
> > No.  buf         bufsz    mem       memsz
> > 0    NULL        0        0x1000    0x9e000
> > 1    0x881fe88   0x289b   0x100000  0x3000
> > 2    NULL        0        0x103000  0xfd000
> > 3    0xb7bfa808  0xb7c00  0x200000  0xb8000
> > 4    NULL        0        0x2b8000  0xd39000
> > 5    0x8818d38   0x7120   0xff1000  0x9000
> > 6    NULL        0        0xffa000  0x1000
> > 7    0x8818268   0x400    0xffb000  0x4000
> > 8    NULL        0        0xfff000  0x1000
> Maybe the user also needs to specify how much memory to allocate for
> kernel execution.
The memory for the second kernel's execution is specified through --mem-min
and --mem-max.
> > 5. In relocate_kernel of the Linux kernel, instead of copying each source page
> > to its destination page, the contents of the source page and the destination
> > page are swapped. (The destination page -> source page map is in
> > kexec_crash_image->head.) This way, the memory area used by the crashdump
> > kernel is backed up to the source pages.
> Interesting. Just that it introduces more code in the crash path.
The source/destination page swap code is very simple and is executed after
paging is turned off, so I think the added code is not a big problem.
> > In the original crashdump implementation, the crashdump kernel runs in a
> > reserved memory area. The reserved memory pages are reserved (unused)
> > pages in the primary (original) kernel.
> > In this proposed implementation, the crashdump kernel runs in an ordinary
> > memory area; the contents of the destination memory area are backed up
> > before the crashdump kernel runs. The backup pages are allocated memory
> > pages in the primary (original) kernel.
> How would you prepare ELF headers for the backed-up memory? ELF headers are
> created in user space before sys_kexec_load is executed, and they
> need to know the physical address where the actual data is.
> But in this scheme, source pages will be allocated only after sys_kexec_load
> has been called.
> These source page addresses will have to be exported to user space so
> that kexec-tools can fill in the ELF headers accordingly.
Currently, the memory region used by the second kernel is excluded from the
ELF headers. The destination page -> source page map can be passed to
the second kernel, so the contents of each destination page can be restored
from its source page by a user space tool (such as a modified version of
makedumpfile). It is much harder to embed the destination page ->
source page map into the ELF headers.
> > The pros and cons of the proposed implementation:
> > Pros:
> > - The memory used by the crashdump kernel need not be reserved at
> > boot time.
> > - The memory used by the crashdump kernel can be specified at
> > sys_kexec_load time.
> > - The memory used by the crashdump kernel can be freed after unloading.
> > Cons:
> > - The memory used by the crashdump kernel can be a DMA destination, so its
> > contents may be ruined by devices during the boot of the crashdump
> > kernel. (Is it possible to turn off DMA for some memory area other than
> > by reserving it?)
> Potential corruption because of DMA was a big issue and that's why the
> exclusive reserved area and relocatable kernel came into the picture.
> Eric in the past had tried disabling DMA at PCI level, but I think it
> did not work for him.
> - There is no guarantee that one will get sufficient memory allocated
> when needed, so loading the kdump kernel might fail.
> - More code in the crash path potentially reduces the reliability of
> the mechanism.
A possible solution for the DMA issue is as follows:
- Specify the memory region used by the second kernel in the kernel boot
parameters.
- Create a zone for this memory region. This zone cannot be used for DMA.
- Use this memory region for the second kernel.
> > In fact, almost all of the mechanism for this proposal has been implemented
> > in my previous patch: "kexec jump" in "kexec based hibernation".
> > Any comments are welcome.
> The idea is interesting, but at the same time it reduces the reliability of
> kdump. I am especially concerned about the DMA issue and more code in the
> crash path.
It is less reliable than the original method, but I think if the DMA
issue can be solved, it may be acceptable.
> I would rather try to find out whether I can create some mechanism to do
> contiguous memory area allocation from user space at run time instead
> of doing it at boot time.