[RFC PATCH v5 1/9] fadump: Add documentation for firmware-assisted dump.

Paul Mackerras paulus at samba.org
Thu Nov 24 17:34:10 EST 2011


On Tue, Nov 15, 2011 at 08:43:34PM +0530, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
> 
> Documentation for firmware-assisted dump. This document is based on the
> original documentation written for phyp assisted dump by Linas Vepstas
> and Manish Ahuja, with a few changes to reflect the current implementation.
> 
> Change in v3:
> - Modified the documentation to reflect introduction of fadump_registered
>   sysfs file and a few minor changes.
> 
> Change in v2:
> - Modified the documentation to reflect the change of fadump_region
>   file under debugfs filesystem.

In general we don't want the changes between successive versions in
the patch description; this information should go below the "---"
line.  The patch description should describe how the patch is now and
give any information that will be useful to someone looking at the
resulting git commit later on, but it doesn't need to tell us about
previous versions of the patch that will never appear in the git
history.

> +-- Once the dump is copied out, the memory that held the dump
> +   is immediately available to the running kernel. A further
> +   reboot isn't required.

I have a general worry about the system making allocations that are
intended to be node-local while it is running with restricted memory
(i.e. after the crash and reboot and before the dump has been written
out and the dump memory freed).  Those allocations will probably all
come from one node and thus won't necessarily be on the desired node.
So, for very large systems with significant NUMA characteristics, it
may be desirable (though not required) to reboot after taking the
dump.

What happens to the NUMA information in the kernel -- all the
memory sections, etc.?  Do they get set up as normal even though the
second kernel is booting with only a small amount of memory initially?

> + /sys/kernel/debug/powerpc/fadump_region
> +
> +    This file shows the reserved memory regions if fadump is
> +    enabled; otherwise this file is empty. The output format
> +    is:
> +    <region>: [<start>-<end>] <reserved-size> bytes, Dumped: <dump-size>
> +
> +    e.g.
> +    Contents when fadump is registered during first kernel
> +
> +    # cat /sys/kernel/debug/powerpc/fadump_region
> +    CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x0
> +    HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x0
> +    DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x0

How come the HPTE region is only 0x1000 (4k) bytes?  The hashed page
table (HPT) will be much bigger than this.  Is this our way of telling
the hypervisor that we don't care about the HPT?  If so, is it
possible to make this region 0 bytes instead of 0x1000?
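
For reference, the sizes in the quoted sample output are consistent with
inclusive address ranges: for each region, end - start + 1 equals the
reported size, which is how the HPTE region comes out at exactly 0x1000
bytes.  A minimal Python sketch that checks this is below.  The regular
expression, the function name parse_fadump_region and the SAMPLE string
are illustrative assumptions, not part of the patch; the sample lines are
copied from the quoted output.

```python
import re

# Sample lines copied from the quoted fadump_region output.
SAMPLE = """\
CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x0
HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x0
DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x0
"""

# Assumed pattern for the documented format:
# <region>: [<start>-<end>] <reserved-size> bytes, Dumped: <dump-size>
LINE_RE = re.compile(
    r"^\s*(?P<region>\w+)\s*:\s*"
    r"\[(?P<start>0x[0-9a-fA-F]+)-(?P<end>0x[0-9a-fA-F]+)\]\s+"
    r"(?P<size>0x[0-9a-fA-F]+) bytes, Dumped: (?P<dumped>0x[0-9a-fA-F]+)\s*$"
)


def parse_fadump_region(text):
    """Parse fadump_region-style lines into a list of dicts.

    Lines that do not match the assumed format are skipped.
    """
    regions = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue
        start = int(m.group("start"), 16)
        end = int(m.group("end"), 16)
        size = int(m.group("size"), 16)
        regions.append({
            "region": m.group("region"),
            "start": start,
            "end": end,
            "size": size,
            "dumped": int(m.group("dumped"), 16),
            # The end address is inclusive, so the span is end - start + 1.
            "consistent": end - start + 1 == size,
        })
    return regions


if __name__ == "__main__":
    for r in parse_fadump_region(SAMPLE):
        print(r["region"], hex(r["size"]), "consistent:", r["consistent"])
```

On a live system the same function could be fed the contents of
/sys/kernel/debug/powerpc/fadump_region instead of SAMPLE, assuming the
file keeps the format shown in the patch.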

Paul.


