Memory needed for a kdump kernel has been bloated

Neil Horman nhorman at redhat.com
Thu Aug 21 19:31:27 EDT 2008


On Thu, Aug 21, 2008 at 01:35:03PM -0700, Jay Lan wrote:
> I have an IA64 system with 250G memory. I reserved 1024M memory for the
> kdump kernel. It worked fine... up to 2.6.23.
> 
> Starting with 2.6.24-rc1, booting a kdump kernel on the machine has
> failed with OOM. I tried 1280M, but it still failed. I threw in 2048M and
> then it worked. When the OOM happened, it failed while allocating memory
> for adding a disk.
> 
> I saw two problems here:
> 1) the memory needed has bloated since 2.6.23, and
> 2) the kdump kernel tried to add disk /dev/sdb when it is not even
>    in /etc/fstab. I think only the system disk and the disk where
>    we want to save the vmcore to should be needed.
> 
> Sorry that I am still chasing a few other problems in recent kernels
> and thus cannot provide a patch.
> 
> Below is part of the console messages on OOM.
> 
> - jay
> 
> 
This is likely an issue with the underlying SCSI driver, and you should
probably contact the appropriate maintainer.  There should have been buddy
allocator output along with this that gives you some idea of how much memory
remains.  If you built all the SCSI drivers as modules, you should also be able
to load the modules in the background and spin tightly watching /proc/slabinfo
to get some idea of where the memory is being consumed.
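
As a rough illustration of the /proc/slabinfo polling described above, here is
a minimal Python sketch.  It assumes the slabinfo 2.x column layout (name,
active_objs, num_objs, objsize, ...), and the helper names parse_slabinfo,
top_growth and watch are made up for this example.  Per-cache memory is
estimated as num_objs * objsize, which ignores per-slab overhead, so treat the
numbers as a guide to which cache is growing rather than an exact accounting.

```python
import time


def parse_slabinfo(text):
    """Return {cache_name: approximate_bytes} from slabinfo 2.x text."""
    usage = {}
    for line in text.splitlines():
        # Skip blank lines, the version banner and the column header.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        # Assumed columns: name active_objs num_objs objsize ...
        name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
        usage[name] = num_objs * objsize
    return usage


def top_growth(before, after, limit=5):
    """Return up to `limit` (name, bytes_grown) pairs, largest growth first."""
    growth = {name: size - before.get(name, 0) for name, size in after.items()}
    ranked = sorted(growth.items(), key=lambda item: item[1], reverse=True)
    return [(name, delta) for name, delta in ranked if delta > 0][:limit]


def read_usage(path):
    """Read and parse one snapshot of the slabinfo file at `path`."""
    with open(path) as f:
        return parse_slabinfo(f.read())


def watch(path="/proc/slabinfo", interval=0.2, rounds=50):
    """Poll `path` and print the caches that grew most since the first read."""
    baseline = read_usage(path)
    for _ in range(rounds):
        time.sleep(interval)
        for name, delta in top_growth(baseline, read_usage(path)):
            print(f"{name:30s} +{delta // 1024} KiB")
        print("--")
```

Starting watch() as root just before the modprobe of the driver is kicked off
in the background should show which cache grows as the disks are probed.  This
presumes a Python interpreter is available in the kdump environment, which may
not be the case in a minimal initrd; the same polling can be done with whatever
tools are at hand.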

Then you and the maintainer can start looking at ways to reduce usage of the
offending slab.
Neil

> ...
> Loading mptscsih
> Loading mptsas
> Fusion MPT SAS Host driver 3.04.06
> ACPI: PCI Interrupt 0001:00:01.0[A] -> GSI 60 (level, low) -> IRQ 60
> mptbase: ioc0: Initiating bringup
> ioc0: LSISAS1068 B0: Capabilities={Initiator}
> scsi0 : ioc0: LSISAS1068 B0, FwRev=01100000h, Ports=1, MaxQ=511, IRQ=60
> scsi 0:0:0:0: Direct-Access     SGI      ST3146854SS      X421 PQ: 0 ANSI: 3
> sd 0:0:0:0: [sda] 286749488 512-byte hardware sectors (146816 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, supports
> DPO and FUA
> sd 0:0:0:0: [sda] 286749488 512-byte hardware sectors (146816 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, supports
> DPO and FUA
>  sda: sda1 sda2 sda3 sda4 sda5 sda6 sda7 sda8 sda9 sda10 sda11
> sd 0:0:0:0: [sda] Attached SCSI disk
> ACPI: PCI Interrupt 0011:01:00.0[A] -> GSI 66 (level, low) -> IRQ 66
> mptbase: ioc1: Initiating bringup
> sd 0:0:0:0: Attached scsi generic sg0 type 0
> mptbase: ioc1: ERROR - Diagnostic reset FAILED! (142h)
> mptbase: ioc1: WARNING - NOT READY!
> mptbase: ioc1: ERROR - didn't initialize properly! (-1)
> mptsas: probe of 0011:01:00.0 failed with error -1
> ACPI: PCI Interrupt 0031:00:01.0[A] -> GSI 70 (level, low) -> IRQ 70
> mptbase: ioc2: Initiating bringup
> ioc2: LSISAS1064 A3: Capabilities={Initiator}
> scsi1 : ioc2: LSISAS1064 A3, FwRev=01070000h, Ports=1, MaxQ=511, IRQ=70
> scsi 1:0:0:0: Direct-Access     SGI      ST3146854SS      X421 PQ: 0 ANSI: 3
> sd 1:0:0:0: [sdb] 286749488 512-byte hardware sectors (146816 MB)
> sd 1:0:0:0: [sdb] Write Protect is off
> sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, supports
> DPO and FUA
> sd 1:0:0:0: [sdb] 286749488 512-byte hardware sectors (146816 MB)
> sd 1:0:0:0: [sdb] Write Protect is off
> sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, supports
> DPO and FUA
>  sdb: sdb1 sdb2 sdb3 sdb4 sdb5 sdb6 sdb7 sdb8 sdb9 sdb10 sdb11
> modprobe invoked oom-killer: gfp_mask=0xd0, order=0, oomkilladj=0
> 
> Call Trace:
>  [<a000000100014e40>] show_stack+0x40/0xa0
>                                 sp=e00000607108f670 bsp=e000006071081a70
>  [<a000000100014ed0>] dump_stack+0x30/0x60
>                                 sp=e00000607108f840 bsp=e000006071081a58
>  [<a00000010010e980>] oom_kill_process+0x80/0x3a0
>                                 sp=e00000607108f840 bsp=e000006071081a00
>  [<a00000010010f660>] out_of_memory+0x520/0x6a0
>                                 sp=e00000607108f850 bsp=e0000060710819b0
>  [<a000000100116c80>] __alloc_pages_internal+0x580/0x700
>                                 sp=e00000607108f8e0 bsp=e000006071081928
>  [<a000000100116e90>] __alloc_pages+0x30/0x60
>                                 sp=e00000607108f8f0 bsp=e0000060710818f8
>  [<a0000001001658a0>] new_slab+0x2a0/0x6c0
>                                 sp=e00000607108f8f0 bsp=e0000060710818a8
>  [<a0000001001661c0>] __slab_alloc+0x500/0xae0
>                                 sp=e00000607108f8f0 bsp=e000006071081848
>  [<a000000100169ca0>] __kmalloc_node+0x120/0x200
>                                 sp=e00000607108f900 bsp=e000006071081808
>  [<a00000010016fd20>] percpu_populate+0x100/0x160
>                                 sp=e00000607108f900 bsp=e0000060710817c0
>  [<a00000010016fdf0>] __percpu_populate_mask+0x70/0x160
>                                 sp=e00000607108f900 bsp=e000006071081780
>  [<a00000010016ff80>] __percpu_alloc_mask+0xa0/0xe0
>                                 sp=e00000607108f940 bsp=e000006071081748
>  [<a000000100212f10>] add_partition+0x50/0x380
>                                 sp=e00000607108f940 bsp=e0000060710816f0
>  [<a000000100213e20>] rescan_partitions+0x4a0/0x520
>                                 sp=e00000607108f940 bsp=e000006071081690
>  [<a0000001001d5220>] do_open+0x520/0x6e0
>                                 sp=e00000607108f940 bsp=e000006071081628
>  [<a0000001001d5490>] __blkdev_get+0xb0/0xe0
>                                 sp=e00000607108f950 bsp=e0000060710815e0
>  [<a0000001001d54f0>] blkdev_get+0x30/0x60
>                                 sp=e00000607108fad0 bsp=e0000060710815b0
>  [<a000000100213850>] register_disk+0x230/0x360
>                                 sp=e00000607108fad0 bsp=e000006071081578
>  [<a0000001004146a0>] add_disk+0xa0/0x140
>                                 sp=e00000607108fad0 bsp=e000006071081550
>  [<a0000001005fbce0>] sd_probe+0x740/0x8a0
>                                 sp=e00000607108fad0 bsp=e0000060710814f8
>  [<a00000010051a5c0>] driver_probe_device+0x220/0x360
>                                 sp=e00000607108fae0 bsp=e0000060710814c0
>  [<a00000010051a810>] __device_attach+0x30/0x60
>                                 sp=e00000607108fae0 bsp=e000006071081498
>  [<a000000100518a00>] bus_for_each_drv+0xa0/0x140
>                                 sp=e00000607108fae0 bsp=e000006071081460
>  [<a00000010051a940>] device_attach+0xa0/0xe0
>                                 sp=e00000607108fb00 bsp=e000006071081430
>  [<a0000001005185d0>] bus_attach_device+0x70/0x100
>                                 sp=e00000607108fb00 bsp=e000006071081400
>  [<a000000100515990>] device_add+0x810/0xb40
>                                 sp=e00000607108fb00 bsp=e000006071081390
>  [<a0000001005cd5e0>] scsi_sysfs_add_sdev+0x160/0x480
>                                 sp=e00000607108fb00 bsp=e000006071081350
>  [<a0000001005c8d00>] scsi_probe_and_add_lun+0x10a0/0x1340
>                                 sp=e00000607108fb00 bsp=e0000060710812e0
>  [<a0000001005c9590>] __scsi_scan_target+0x150/0xb00
>                                 sp=e00000607108fb30 bsp=e000006071081290
>  [<a0000001005caa00>] scsi_scan_target+0x120/0x160
>                                 sp=e00000607108fb90 bsp=e000006071081240
>  [<a0000002024926a0>] sas_rphy_add+0x300/0x340 [scsi_transport_sas]
>                                 sp=e00000607108fb90 bsp=e000006071081200
>  [<a000000202602240>] mptsas_probe_one_phy+0x900/0x9c0 [mptsas]
>                                 sp=e00000607108fb90 bsp=e0000060710811a8
>  [<a000000202603be0>] mptsas_probe_hba_phys+0xe20/0xf00 [mptsas]
>                                 sp=e00000607108fbb0 bsp=e000006071081150
>  [<a000000202607080>] mptsas_probe+0x7c0/0x960 [mptsas]
>                                 sp=e00000607108fcf0 bsp=e0000060710810e8
>  [<a00000010044e730>] pci_device_probe+0x170/0x240
>                                 sp=e00000607108fd00 bsp=e000006071081090
>  [<a00000010051a5c0>] driver_probe_device+0x220/0x360
>                                 sp=e00000607108fd80 bsp=e000006071081058
>  [<a00000010051a780>] __driver_attach+0x80/0xe0
>                                 sp=e00000607108fd80 bsp=e000006071081020
>  [<a000000100519070>] bus_for_each_dev+0x90/0x100
>                                 sp=e00000607108fd80 bsp=e000006071080fe0
>  [<a00000010051a160>] driver_attach+0x40/0x60
>                                 sp=e00000607108fda0 bsp=e000006071080fc0
>  [<a000000100519b00>] bus_add_driver+0x160/0x4a0
>                                 sp=e00000607108fda0 bsp=e000006071080f78
>  [<a00000010051ae30>] driver_register+0x1b0/0x300
>                                 sp=e00000607108fda0 bsp=e000006071080f30
>  [<a00000010044ecd0>] __pci_register_driver+0xb0/0x140
>                                 sp=e00000607108fda0 bsp=e000006071080ef8
>  [<a0000002026301e0>] mptsas_init+0x1e0/0x320 [mptsas]
>                                 sp=e00000607108fdb0 bsp=e000006071080ec8
>  [<a0000001000e6990>] sys_init_module+0x3610/0x3940
>                                 sp=e00000607108fdb0 bsp=e000006071080d48
>  [<a00000010000af80>] ia64_ret_from_syscall+0x0/0x20
>                                 sp=e00000607108fe30 bsp=e000006071080d48
>  [<a000000000010720>] __kernel_syscall_via_break+0x0/0x20
>                                 sp=e000006071090000 bsp=e000006071080d48
> 
> 

-- 
/***************************************************
 *Neil Horman
 *Senior Software Engineer
 *Red Hat, Inc.
 *nhorman at redhat.com
 *gpg keyid: 1024D / 0x92A74FA1
 *http://pgp.mit.edu
 ***************************************************/


