kexec: purgatory hang

Yinghai Lu yinghai at kernel.org
Tue Jun 11 21:45:25 EDT 2013


On Tue, Jun 11, 2013 at 3:54 PM, Cliff Wickman <cpw at sgi.com> wrote:
>
> I'm getting a hang when trying to enter a high-memory crash kernel,
> and I'm at a loss as to how to debug this.
>
> This is a 3.10.0-rc3 kernel, and set up as the crash kernel by kexec 2.0.4.
> The machine is an SGI UV1000.

What is your memory size?

I just tried this on a 3T system, and it works well...

In the first kernel:
sca05-0a81fd78:~ # cat /proc/iomem
00000000-00000fff : reserved
00001000-0009afff : System RAM
0009b000-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000c7fff : Video ROM
000c8000-000ce7ff : Adapter ROM
000ce800-000cf7ff : Adapter ROM
000cf800-000d07ff : Adapter ROM
000e0000-000fffff : reserved
  000f0000-000fffff : System ROM
00100000-68ad0fff : System RAM
  01000000-020b7d40 : Kernel code
  020b7d41-02bd47ff : Kernel data
  02f80000-03c20fff : Kernel bss
68ad1000-69265fff : reserved
69266000-69355fff : ACPI Tables
69356000-6a0e4fff : ACPI Non-volatile Storage
6a0e5000-6bd68fff : reserved
6bd69000-6bd98fff : System RAM
6bd99000-6bd99fff : reserved
6bd9a000-7bffffff : System RAM
  74000000-7bffffff : Crash kernel
...
100000000-3007fffffff : System RAM
  30040000000-3007fffffff : Crash kernel

boot command line:
console=uart8250,io,0x3f8,115200n8 initrd=kernel.org/x.xz rw
root=/dev/ram0 debug ignore_loglevel unknown_nmi_panic
crashkernel=1024M,high crashkernel=128M,low pci=routeirq ip=dhcp
load_ramdisk=1 BOOT_IMAGE=kernel.org/bzImage_3.10_k8.2
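
As a quick cross-check, the two "Crash kernel" ranges in /proc/iomem above should match the sizes requested with crashkernel=1024M,high and crashkernel=128M,low. The following is only a minimal sketch; the helper name range_size is made up for illustration and is not part of kexec-tools or the kernel:

```python
def range_size(line):
    """Size in bytes of one '<start>-<end> : <name>' line from /proc/iomem.

    Hypothetical helper for illustration; both ends are inclusive hex addresses.
    """
    span = line.split(":")[0].strip()
    start, end = (int(x, 16) for x in span.split("-"))
    return end - start + 1


# The two reservations copied from the /proc/iomem listing in this mail.
crash_ranges = [
    "  74000000-7bffffff : Crash kernel",
    "  30040000000-3007fffffff : Crash kernel",
]

for line in crash_ranges:
    print(line.strip(), "->", range_size(line) // (1 << 20), "MiB")
```

The low range comes out as 128 MiB and the high range as 1024 MiB, so both reservations were honoured as requested.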

kexec second kernel:

# ./kexec -p $VMLINUZ --command-line="initcall_debug nr_cpus=1 pci=routeirq ignore_loglevel unknown_nmi_panic apic=debug ramdisk_size=$RDSZ root=/dev/ram0 rw ip=dhcp $CONSOLE" --ramdisk=$INITRD


add_buffer: base:3007ff65000 bufsz:9a000 memsz:9a000
add_buffer: base:3007ff60000 bufsz:3800 memsz:4000
add_buffer: base:3007ff55000 bufsz:80e0 memsz:a000
add_buffer: base:3007ff4f000 bufsz:437a memsz:437a
add_buffer: base:3007d000000 bufsz:8fd240 memsz:2c1f000
add_buffer: base:30079562000 bufsz:3a9ca12 memsz:3a9ca12

...
# echo c > /proc/sysrq-trigger
[  707.078371] SysRq : Trigger a crash
[  707.082358] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  707.091232] IP: [<ffffffff815e4b06>] sysrq_handle_crash+0x16/0x20
[  707.098170] PGD 0
[  707.100533] Oops: 0002 [#1] SMP
[  707.104262] Modules linked in:
[  707.107753] CPU: 11 PID: 20796 Comm: bash Tainted: G          I 3.10.0-rc5-yh-00891-g188560d-dirty #1736
[  707.128620] task: ffff89de66e1a5a0 ti: ffff89de68bec000 task.ti: ffff89de68bec000
[  707.137014] RIP: 0010:[<ffffffff815e4b06>]  [<ffffffff815e4b06>] sysrq_handle_crash+0x16/0x20
[  707.146651] RSP: 0018:ffff89de68bede48  EFLAGS: 00010096
[  707.152634] RAX: 000000000000000f RBX: ffffffff82af27e0 RCX: ffff885efd9cf130
[  707.160656] RDX: 0000000000000001 RSI: ffffffff8108edb0 RDI: 0000000000000063
[  707.168687] RBP: ffff89de68bede48 R08: 0000000000000001 R09: 0000000000000001
[  707.176716] R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000063
[  707.184745] R13: 0000000000000286 R14: 0000000000000000 R15: 0000000000000001
[  707.192774] FS:  00007f89bd578700(0000) GS:ffff885efd800000(0000) knlGS:0000000000000000
[  707.201863] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  707.208342] CR2: 0000000000000000 CR3: 0000023e66deb000 CR4: 00000000001407e0
[  707.216364] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  707.224390] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  707.232418] Stack:
[  707.234722]  ffff89de68bede88 ffffffff815e52a2 ffff89de68bede88 0000000000000002
[  707.243252]  0000000000000002 00007f89bd57d000 ffff89de68bedf50 0000000000000000
[  707.251751]  ffff89de68bedeb8 ffffffff815e53d0 00007f89bd86e290 00007f89bd57d000
[  707.260235] Call Trace:
[  707.262996]  [<ffffffff815e52a2>] __handle_sysrq+0xc2/0x1b0
[  707.269278]  [<ffffffff815e53d0>] write_sysrq_trigger+0x40/0x50
[  707.275948]  [<ffffffff81220f42>] proc_reg_write+0x42/0x80
[  707.282133]  [<ffffffff811c03eb>] vfs_write+0xeb/0x1c0
[  707.287911]  [<ffffffff811c0865>] SyS_write+0x55/0xb0
[  707.293610]  [<ffffffff820b23da>] tracesys+0xd4/0xd9
[  707.299166] Code: f0 4c 8b 65 f8 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 c7 05 cc ff a1 01 01 00 00 00 48 89 e5 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 0f 1f 44 00 00 55 48 89 e5 53 48
[  707.321648] RIP  [<ffffffff815e4b06>] sysrq_handle_crash+0x16/0x20
[  707.328623]  RSP <ffff89de68bede48>
[  707.332573] CR2: 0000000000000000
early console in decompress_kernel
decompress_kernel:
  input: [0x3007ea682c2-0x3007f35d8f5], output: 0x3007d000000, heap: [0x3007f365240-0x3007f36d23f]

Decompressing Linux... xz... Parsing ELF... done.
Booting the kernel.
[    0.000000] bootconsole [uart0] enabled
[    0.000000]    real_mode_data :      phys 000003007ff4f000
[    0.000000]    real_mode_data :      virt ffff8b007ff4f000
[    0.000000]       boot_params : init virt ffffffff82f509e0
[    0.000000]       boot_params :      phys 000003007ef509e0
[    0.000000]       boot_params :      virt ffff8b007ef509e0
[    0.000000] boot_command_line : init virt ffffffff82e24020
[    0.000000] boot_command_line :      phys 000003007ee24020
[    0.000000] boot_command_line :      virt ffff8b007ee24020
[    0.000000] Kernel Layout:
[    0.000000]   .text: [0x3007d000000-0x3007e0bfde0]
[    0.000000] .rodata: [0x3007e200000-0x3007e9c1fff]
[    0.000000]   .data: [0x3007ea00000-0x3007ebb9abf]
[    0.000000]   .init: [0x3007ebbb000-0x3007ef3bfff]
[    0.000000]    .bss: [0x3007ef4a000-0x3007fbf9fff]
[    0.000000]    .brk: [0x3007fbfa000-0x3007fc1efff]
[    0.000000] memblock_reserve: [0x0009ac00-0x000fffff] * BIOS reserved
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.9.0-yh-02267-g2413a4c-dirty (yhlu at linux-siqj.site) (gcc version 4.7.2 20130108 [gcc-4_7-branch revision 195012] (SUSE Linux) ) #1507 SMP Mon Apr 29 10:52:45 PDT 2013
[    0.000000] memblock_reserve: [0x3007d000000-0x3007fbf9fff] TEXT DATA BSS
[    0.000000] memblock_reserve: [0x30079562000-0x3007cffefff] RAMDISK
[    0.000000] Command line: initcall_debug nr_cpus=1 pci=routeirq ignore_loglevel unknown_nmi_panic apic=debug ramdisk_size=262144 root=/dev/ram0 rw ip=dhcp console=uart8250,io,0x3f8,115200n8 memmap=exactmap memmap=616K@4K memmap=131072K@1900544K memmap=1047936K@3222274048K elfcorehdr=3223321984K memmap=960K#1722776K memmap=13884K#1723736K
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] Physical RAM map:
[    0.000000] raw: [mem 0x0000000000000100-0x000000000009afff] usable
[    0.000000] raw: [mem 0x000000000009b000-0x000000000009ffff] reserved
[    0.000000] raw: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] raw: [mem 0x0000000000100000-0x0000000068ad0fff] usable
[    0.000000] raw: [mem 0x0000000068ad1000-0x0000000069265fff] reserved
[    0.000000] raw: [mem 0x0000000069266000-0x0000000069355fff] ACPI data
[    0.000000] raw: [mem 0x0000000069356000-0x000000006a0e4fff] ACPI NVS
[    0.000000] raw: [mem 0x000000006a0e5000-0x000000006bd68fff] reserved
[    0.000000] raw: [mem 0x000000006bd69000-0x000000006bd98fff] usable
[    0.000000] raw: [mem 0x000000006bd99000-0x000000006bd99fff] reserved
[    0.000000] raw: [mem 0x000000006bd9a000-0x000000007bffffff] usable
[    0.000000] raw: [mem 0x0000000080000000-0x000000008fffffff] reserved
[    0.000000] raw: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
[    0.000000] raw: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
[    0.000000] raw: [mem 0x0000000100000000-0x000003007fffffff] usable
[    0.000000] e820: BIOS-provided physical RAM map (sanitized by setup):
[    0.000000] BIOS-e820: [mem 0x0000000000000100-0x000000000009afff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009b000-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000068ad0fff] usable
[    0.000000] BIOS-e820: [mem 0x0000000068ad1000-0x0000000069265fff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000069266000-0x0000000069355fff] ACPI data
[    0.000000] BIOS-e820: [mem 0x0000000069356000-0x000000006a0e4fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x000000006a0e5000-0x000000006bd68fff] reserved
[    0.000000] BIOS-e820: [mem 0x000000006bd69000-0x000000006bd98fff] usable
[    0.000000] BIOS-e820: [mem 0x000000006bd99000-0x000000006bd99fff] reserved
[    0.000000] BIOS-e820: [mem 0x000000006bd9a000-0x000000007bffffff] usable
[    0.000000] BIOS-e820: [mem 0x0000000080000000-0x000000008fffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000003007fffffff] usable
[    0.000000] debug: ignoring loglevel setting.
[    0.000000] e820: last_pfn = 0x30080000 max_arch_pfn = 0x400000000
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] e820: user-defined physical RAM map:
[    0.000000] user: [mem 0x0000000000001000-0x000000000009afff] usable
[    0.000000] user: [mem 0x0000000069266000-0x000000006a0e4fff] ACPI data
[    0.000000] user: [mem 0x0000000074000000-0x000000007bffffff] usable
[    0.000000] user: [mem 0x0000030040000000-0x000003007ff5ffff] usable
...
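
To see how the memmap= parameters that kexec appended to the second kernel's command line turn into the "user-defined physical RAM map" above, the sizes and offsets (given in KiB) can be converted back to byte addresses. This is only a rough sketch: parse_memmap is a made-up helper, it handles just the K suffix used here, and it relies on the kernel's convention that size@start marks usable RAM and size#start marks ACPI data. The regular expression also accepts " at " in place of "@", since list archives tend to rewrite that character.

```python
import re


def parse_memmap(cmdline):
    """Return (start, inclusive_end, kind) for every memmap=<size>K<sep><start>K.

    Hypothetical helper for illustration; only the K suffix is handled.
    """
    kinds = {"@": "usable", "#": "ACPI data"}
    ranges = []
    pattern = r"memmap=(\d+)K(@|#| at )(\d+)K"
    for size_k, sep, start_k in re.findall(pattern, cmdline):
        if sep == " at ":  # archive-obfuscated form of "@"
            sep = "@"
        start = int(start_k) * 1024
        end = start + int(size_k) * 1024 - 1
        ranges.append((start, end, kinds[sep]))
    return ranges


# memmap-related tail of the crash kernel command line from the log.
cmdline = (
    "memmap=exactmap memmap=616K@4K memmap=131072K@1900544K "
    "memmap=1047936K@3222274048K elfcorehdr=3223321984K "
    "memmap=960K#1722776K memmap=13884K#1723736K"
)

for start, end, kind in parse_memmap(cmdline):
    print(f"[mem {start:#018x}-{end:#018x}] {kind}")
```

The three usable ranges match the "user:" lines in the log, and the two adjacent ACPI ranges together cover the single ACPI data entry 0x69266000-0x6a0e4fff. The elfcorehdr offset, 3223321984K, is 0x3007ff60000, which is the first byte after the high usable range and the base of the second add_buffer line.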


