kdump failed because of hotplug memory adding in kdump kernel

Toshi Kani toshi.kani at hp.com
Thu Jan 9 11:15:18 EST 2014


On Thu, 2014-01-09 at 09:53 -0500, Vivek Goyal wrote:
> On Thu, Jan 09, 2014 at 02:10:26PM +0100, Rafael J. Wysocki wrote:
> > On Wednesday, January 08, 2014 05:11:48 PM Toshi Kani wrote:
> > > On Thu, 2014-01-09 at 00:07 +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, January 08, 2014 10:58:29 AM Vivek Goyal wrote:
> > > > > On Wed, Jan 08, 2014 at 11:26:43PM +0800, Baoquan wrote:
> > > > > 
> > > > > [..]
> > > > > > [    1.592222] acpi PNP0A03:03: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
> > > > > > [    1.605045] PCI host bridge to bus 0000:ff
> > > > > > [    1.609615] pci_bus 0000:ff: root bus resource [bus ff]
> > > > > > [    1.632117] System RAM resource [mem 0x01000000-0x7bffffff] cannot be added
> > > > > > [    1.639892] init_memory_mapping: [mem 0x100000000-0x87fffffff]
> > > > > > [    1.717793] swapper/0: page allocation failure: order:9, mode:0x84d0
> > > > > > [    1.724884] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-59.el7.x86_64 #1
> > > > > > [    1.732842] Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.S001.032520101647 03/25/2010
> > > > > > [    1.743224]  0000000000000000 ffff8800339878c8 ffffffff815b64ad ffff880033987950
> > > > > > [    1.751513]  ffffffff8113a980 ffff88003673ab28 00000000000001fe 0000000000000001
> > > > > > [    1.759804]  ffff880000000040 ffffffff810bc28a 0000000000000000 0000000000000200
> > > > > > [    1.768096] Call Trace:
> > > > > > [    1.770834]  [<ffffffff815b64ad>] dump_stack+0x19/0x1b
> > > > > > [    1.776561]  [<ffffffff8113a980>] warn_alloc_failed+0xf0/0x160
> > > > > > [    1.783076]  [<ffffffff810bc28a>] ? on_each_cpu_mask+0x2a/0x60
> > > > > > [    1.789581]  [<ffffffff8113e92f>] __alloc_pages_nodemask+0x7ff/0xa00
> > > > > > [    1.796672]  [<ffffffff815ada2c>] vmemmap_alloc_block+0x62/0xba
> > > > > > [    1.803274]  [<ffffffff815ada99>] vmemmap_alloc_block_buf+0x15/0x3b
> > > > > > [    1.810263]  [<ffffffff815ab8a6>] vmemmap_populate+0xb4/0x21b
> > > > > > [    1.816673]  [<ffffffff815adecd>] sparse_mem_map_populate+0x27/0x35
> > > > > > [    1.823665]  [<ffffffff815ad8bf>] sparse_add_one_section+0x7a/0x185
> > > > > > [    1.830659]  [<ffffffff8159b74f>] __add_pages+0xaf/0x240
> > > > > > [    1.836588]  [<ffffffff81047359>] arch_add_memory+0x59/0xd0
> > > > > > [    1.842804]  [<ffffffff8159ba89>] add_memory+0xb9/0x1b0
> > > > > > [    1.848638]  [<ffffffff8132dd2c>] acpi_memory_device_add+0x18d/0x26d
> > > > > > [    1.855728]  [<ffffffff81303b91>] acpi_bus_device_attach+0x7d/0xcd
> > > > > > [    1.862625]  [<ffffffff8131d92d>] acpi_ns_walk_namespace+0xc8/0x17f
> > > > > > [    1.869616]  [<ffffffff81303b14>] ? acpi_bus_type_and_status+0x90/0x90
> > > > > > [    1.876896]  [<ffffffff81303b14>] ? acpi_bus_type_and_status+0x90/0x90
> > > > > > [    1.884177]  [<ffffffff8131de1c>] acpi_walk_namespace+0x95/0xc5
> > > > > > [    1.890780]  [<ffffffff81304866>] acpi_bus_scan+0x8b/0x9d
> > > > > > [    1.896805]  [<ffffffff81a14a15>] acpi_scan_init+0x63/0x160
> > > > > > [    1.903021]  [<ffffffff81a14830>] acpi_init+0x25d/0x2a6
> > > > > 
> > > > > So basically ACPI thinks that some memory block is hotplug memory
> > > > > and tries to add it. That consumes a lot of memory, which we don't
> > > > > have in the second kernel.
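The "lots of memory" can be put in rough numbers. Sparse memory hotplug has to allocate a struct page array (the memmap, here through vmemmap) for every section it adds. The sketch below assumes typical x86_64 values for a kernel of this era: 4 KiB pages, a 64-byte struct page and 128 MiB sparsemem sections. These are assumptions, not values read from the machine in the log.

```python
# Rough estimate of the memmap (struct page array) that sparse memory
# hotplug must allocate before a hot-added range can be used.
# ASSUMPTIONS: typical x86_64 values; the real ones depend on the config.
PAGE_SIZE = 4096           # 4 KiB base pages
STRUCT_PAGE_SIZE = 64      # bytes per struct page
SECTION_SIZE = 1 << 27     # 128 MiB section (SECTION_SIZE_BITS = 27)


def memmap_bytes(start: int, end_inclusive: int) -> int:
    """Bytes of struct page needed to cover [start, end_inclusive]."""
    pages = (end_inclusive + 1 - start) // PAGE_SIZE
    return pages * STRUCT_PAGE_SIZE


def section_memmap_order() -> int:
    """Allocation order (log2 of base pages) of one section's memmap."""
    memmap_per_section = (SECTION_SIZE // PAGE_SIZE) * STRUCT_PAGE_SIZE
    block_pages = memmap_per_section // PAGE_SIZE
    return block_pages.bit_length() - 1  # block_pages is a power of two here


# Range taken from the init_memory_mapping line in the log.
start, end = 0x100000000, 0x87FFFFFFF
print(f"range size: {(end + 1 - start) >> 30} GiB")
print(f"memmap needed: {memmap_bytes(start, end) >> 20} MiB")
print(f"per-section allocation order: {section_memmap_order()}")
```

Under these assumptions the 30 GiB range needs about 480 MiB of memmap, and each 128 MiB section needs one 2 MiB block, an order-9 allocation, which is consistent with the `order:9` failure in the log. A capture kernel confined to a crashkernel reservation of a few hundred MiB or less is unlikely to satisfy that.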
> > > > 
> > > > That's not exactly the case.  What seems to happen is that there is an ACPI
> > > > memory object in the ACPI namespace and the ACPI memory hotplug driver
> > > > attempts to bind to it.  That driver attempts to find removable memory blocks
> > > > associated with that object and to add them to the memory map.
> > > > 
> > > > Why don't you simply append acpi=off to the kexec command line?  That should
> > > > make the problem go away.
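The suggested workaround amounts to adding `acpi=off` to the command line that kexec hands to the capture kernel, which kdump tooling usually derives from the running kernel's command line. A minimal sketch of that step follows; `kdump_cmdline()` is a hypothetical helper, and it assumes plain whitespace-separated parameters with no quoted values.

```python
def kdump_cmdline(base: str, extra: list[str], drop: set[str]) -> str:
    """Build a capture-kernel command line from the running kernel's one.

    Hypothetical helper: removes every parameter whose name is in `drop`,
    then appends each entry of `extra` that is not already present.
    Assumes whitespace-separated parameters without quoted spaces.
    """
    params = [p for p in base.split() if p.split("=", 1)[0] not in drop]
    for p in extra:
        if p not in params:
            params.append(p)
    return " ".join(params)


running = "ro root=/dev/sda2 crashkernel=256M quiet"
print(kdump_cmdline(running, ["acpi=off", "irqpoll"], {"crashkernel", "quiet"}))
```

On RHEL-family systems the kdump service exposes the same step through `KDUMP_COMMANDLINE_APPEND` in `/etc/sysconfig/kdump`. Note that `acpi=off` disables ACPI as a whole in the capture kernel, not just memory hotplug.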
> > > 
> > > Yes, that should work, but Baoquan's approach makes sense to me.  When
> > > memmap=exactmap is specified, the kernel should ignore any memory
> > > information from the firmware.
> > 
> > OK
> > 
> > Baoquan, please modify your patch to get rid of the #ifdef CONFIG_X86 in
> > acpi_memory_hotplug_init().  For example, you can add a function returning true
> > if use_exactmap is set and false otherwise and make acpi_memory_hotplug_init()
> > call that function.  Alternatively, you can define an arch-independent
> > no_memory_hotplug flag (instead of use_exactmap) and set it for memmap=exactmap.
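The second option separates concerns: command-line parsing sets a generic flag, and the ACPI memory hotplug init path consults only that flag, so it needs no `#ifdef CONFIG_X86`. The Python model below illustrates that shape only; it is not kernel code, `BootFlags`, `parse_cmdline()` and `acpi_memory_hotplug_should_register()` are hypothetical names, and `no_memory_hotplug` stands for the proposed option rather than an existing one.

```python
from dataclasses import dataclass


@dataclass
class BootFlags:
    # Hypothetical arch-independent flag modeled on the proposed
    # no_memory_hotplug option.
    no_memory_hotplug: bool = False


def parse_cmdline(cmdline: str) -> BootFlags:
    """Set no_memory_hotplug for the explicit option or memmap=exactmap."""
    flags = BootFlags()
    for param in cmdline.split():
        if param in ("no_memory_hotplug", "memmap=exactmap"):
            flags.no_memory_hotplug = True
    return flags


def acpi_memory_hotplug_should_register(flags: BootFlags) -> bool:
    # The init path looks only at the generic flag; it does not need to
    # know which arch-specific option caused it to be set.
    return not flags.no_memory_hotplug


print(acpi_memory_hotplug_should_register(parse_cmdline("memmap=exactmap memmap=640K@0K")))
```

The first option (a function that reports `use_exactmap`) has the same effect, but keeps the decision tied to one x86 option instead of a flag any architecture can set.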
> > 
> 
> Prarit sent a patch to introduce a no_memory_hotplug command-line option.
> I still think that memmap=exactmap does not necessarily mean that memory
> hotplug is disabled.
> 
> What about the mem= parameter? If somebody specifies mem=1G, should that
> mean there cannot be any hotplugged memory?

Good point.  Yes, I think we need to ignore ACPI memory objects in this
case as well.  I suppose the use of this option is limited to specific
test purposes, and disabling memory hotplug is not a big issue here.
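Treating `mem=` the same way means any user-imposed memory layout causes ACPI memory objects to be ignored. A minimal Python sketch of that policy, assuming only K/M/G size suffixes (loosely modeled on the kernel's `memparse()`); `parse_size()`, `mem_limit()` and `ignore_acpi_memory_objects()` are hypothetical names, and `no_memory_hotplug` is the proposed option.

```python
from typing import Optional

# Suffix shifts; this sketch handles only K, M and G.
_SHIFTS = {"k": 10, "m": 20, "g": 30}


def parse_size(text: str) -> int:
    """Parse sizes such as '1G', '512M' or '0x1000' into bytes.

    Raises ValueError for anything that is not a decimal or
    0x-prefixed number with an optional K/M/G suffix.
    """
    suffix = text[-1:].lower()
    if suffix in _SHIFTS:
        return int(text[:-1], 0) << _SHIFTS[suffix]
    return int(text, 0)


def mem_limit(cmdline: str) -> Optional[int]:
    """Return the mem= cap in bytes (last valid occurrence), or None."""
    limit = None
    for param in cmdline.split():
        name, sep, value = param.partition("=")
        if name != "mem" or not sep:
            continue
        try:
            limit = parse_size(value)
        except ValueError:
            continue  # non-size values are skipped in this sketch
    return limit


def ignore_acpi_memory_objects(cmdline: str) -> bool:
    """Policy suggested here: any user-imposed memory layout wins."""
    params = cmdline.split()
    return (
        mem_limit(cmdline) is not None
        or "memmap=exactmap" in params
        or "no_memory_hotplug" in params
    )


cmdline = "ro mem=1G"
print(ignore_acpi_memory_objects(cmdline))
# The hot-added range in the log starts at 4 GiB, above a 1 GiB cap.
print(0x100000000 >= mem_limit(cmdline))
```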

Thanks,
-Toshi




More information about the kexec mailing list