uniquely identifying KDUMP files that originate from QEMU
Dave Anderson
anderson at redhat.com
Wed Nov 12 07:45:08 PST 2014
----- Original Message -----
> On 11/12/14 15:09, Dave Anderson wrote:
> >
> >
> > ----- Original Message -----
> >> From: HATAYAMA Daisuke <d.hatayama at jp.fujitsu.com>
> >> To: ptesarik at suse.cz
> >> Cc: lersek at redhat.com, kexec at lists.infradead.org
> >> Subject: Re: uniquely identifying KDUMP files that originate from QEMU
> >> Message-ID:
> >> <20141112.120838.303682123986142686.d.hatayama at jp.fujitsu.com>
> >> Content-Type: Text/Plain; charset=us-ascii
> >>
> >> From: Petr Tesarik <ptesarik at suse.cz>
> >> Subject: Re: uniquely identifying KDUMP files that originate from QEMU
> >> Date: Tue, 11 Nov 2014 13:09:13 +0100
> >>
> >>> On Tue, 11 Nov 2014 12:22:52 +0100
> >>> Laszlo Ersek <lersek at redhat.com> wrote:
> >>>
> >>>> (Note: I'm not subscribed to either qemu-devel or the kexec list; please
> >>>> keep me CC'd.)
> >>>>
> >>>> QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib,
> >>>> kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command.
> >>>>
> >>>> The resultant vmcore is usually analyzed with the "crash" utility.
> >>>>
> >>>> The original tool producing such files is kdump. Unlike the procedure
> >>>> performed by QEMU, kdump runs from *within* the guest (under a kexec'd
> >>>> kdump kernel), and has more information about the original guest kernel
> >>>> state (which is being dumped) than QEMU. To QEMU, the guest kernel state
> >>>> is opaque.
> >>>>
> >>>> For this reason, the kdump preparation logic in QEMU hardcodes a number
> >>>> of fields in the kdump header. The direct issue is the "phys_base"
> >>>> field. Refer to dump.c, functions create_header32(), create_header64(),
> >>>> and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text
> >>>> "0").
> >>>>
> >>>> http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD
> >>>>
> >>>> http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD
> >>>>
> >>>> This works in most cases, because the guest Linux kernel indeed tends to
> >>>> be loaded at guest-phys address 0. However, when the guest Linux kernel
> >>>> is booted on top of OVMF (which has a somewhat unusual UEFI memory map),
> >>>> then the guest Linux kernel is loaded at 16MB, thereby getting out of
> >>>> sync with the phys_base=0 setting visible in the KDUMP header.
> >>>>
> >>>> This trips up the "crash" utility.
> >>>>
> >>>> Dave worked around the issue in "crash" for ELF format dumps -- "crash"
> >>>> can identify QEMU as the originator of the vmcore by finding the QEMU
> >>>> notes in the ELF vmcore. If those are present, then "crash" employs a
> >>>> heuristic, probing for a phys_base up to 32MB, in 1MB steps.
> >>>>
> >>>> Alas, the QEMU notes are not present in the KDUMP-format vmcores that
> >>>> QEMU produces (they cannot be),
> >>>
> >>> Why? Since KDUMP format version 4, the complete ELF notes can be stored
> >>> in the file (see offset_note, size_note fields in the sub-header).
> >>>
> >>
> >> Yes, the QEMU notes are present in the kdump-compressed format. But
> >> phys_base cannot be calculated from the qemu side alone. We cannot do more
> >> than what the crash utility already does as a workaround. So, the phys_base
> >> value in the kdump sub-header is currently designed to be 0.
> >>
> >> Anyway, phys_base is kernel information. To make it available to the qemu
> >> side, we need to prepare a mechanism that gives qemu access
> >> to it.
> >>
> >> One ad-hoc but simple way is to put the phys_base value into the
> >> VMCOREINFO note information in the kernel.
> >>
> >> Although there is already a similar entry in VMCOREINFO, like
> >>
> >> arch/x86/kernel/
> >> ==
> >> void arch_crash_save_vmcoreinfo(void)
> >> {
> >> VMCOREINFO_SYMBOL(phys_base); <---- This
> >> VMCOREINFO_SYMBOL(init_level4_pgt);
> >>
> >> ...
> >> ==
> >>
> >> this is meaningless, because this value is the virtual address assigned to
> >> the phys_base symbol. To read the value of phys_base itself from that
> >> address, we need the very phys_base value we are trying to get.
> >>
> >> So, instead, if we change this to save the value, not the address of the
> >> symbol phys_base, we can get phys_base from the VMCOREINFO.
> >>
> >> The VMCOREINFO consists simply of strings. So it's easy to search the
> >> vmcore for it, e.g. using strings and grep like this:
> >>
> >> $ strings vmcore-3.10.0-121.el7.x86_64 | grep -E ".*VMCOREINFO.*" -A 100
> >> VMCOREINFO
> >> OSRELEASE=3.10.0-121.el7.x86_64
> >> PAGESIZE=4096
> >> ...
> >> SYMBOL(phys_base)=ffffffff818e5010 <-- though this is the address of phys_base now...
> >> SYMBOL(init_level4_pgt)=ffffffff818de000
> >> SYMBOL(node_data)=ffffffff819f1cc0
> >> LENGTH(node_data)=1024
> >> CRASHTIME=1399460394
> >> ...
> >>
> >> This should also be useful for getting the phys_base of the 2nd kernel,
> >> which is inherently a relocated kernel, from a vmcore generated using qemu dump.
> >>
> >> This is far from well-designed from qemu's point of view, but it would
> >> make getting phys_base manually easier than it is now.
> >>
> >> Obviously, the VMCOREINFO is available only if CONFIG_KEXEC is
> >> enabled. Other users cannot use this.
> >>
> >> --
> >> Thanks.
> >> HATAYAMA, Daisuke
> >
> > I agree that the actual value of phys_base should be included in the
> > vmcoreinfo.
> >
> > However, it won't help in this case because the vmcoreinfo data is not
> > copied into the compressed dumpfile header. The offset_vmcoreinfo and
> > size_vmcoreinfo fields are zero.
> >
> > Here's an example header dump of a QEMU-generated dumpfile:
> >
> > crash> help -n
> > makedumpfile header:
> > signature: "makedumpfile"
> > type: 1
> > version: 1
> > all_flat_data:
> > num_array: 18695
> > array: 7f484b760010
> > file_size: 0
> >
> > diskdump_data:
> > filename: vmcore.ovmf.rhel7.kdump-snappy
> > flags: c6
> > (KDUMP_CMPRS_LOCAL|ERROR_EXCLUDED|LZO_SUPPORTED|SNAPPY_SUPPORTED)
> > [FLAT]
> > dfd: 3
> > ofp: 3e441b1260
> > machine_type: 62 (EM_X86_64)
> >
> > header: 1a68fe0
> > signature: "KDUMP "
> > header_version: 6
> > utsname:
> > sysname:
> > nodename:
> > release:
> > version:
> > machine: x86_64
> > domainname:
> > timestamp:
> > tv_sec: 0
> > tv_usec: 0
> > status: 4 (DUMP_DH_COMPRESSED_SNAPPY)
> > block_size: 4096
> > sub_hdr_size: 1
> > bitmap_blocks: 76
> > max_mapnr: 1245184
> > total_ram_blocks: 0
> > device_blocks: 0
> > written_blocks: 0
> > current_cpu: 0
> > nr_cpus: 4
> > tasks[nr_cpus]: 0
> > 0
> > 0
> > 0
> >
> > sub_header: 0 (n/a)
> >
> > sub_header_kdump: 1a69ff0
> > phys_base: 0
> > dump_level: 1 (0x1) (DUMP_EXCLUDE_ZERO)
> > split: 0
> > start_pfn: (unused)
> > end_pfn: (unused)
> > offset_vmcoreinfo: 0 (0x0)
> > size_vmcoreinfo: 0 (0x0)
> > offset_note: 4200 (0x1068)
> > size_note: 3232 (0xca0)
> > num_prstatus_notes: 4
> > notes_buf: 1a6b000
> > notes[0]: 1a6b000
> > notes[1]: 1a6b164
> > notes[2]: 1a6b2c8
> > notes[3]: 1a6b42c
> > NT_PRSTATUS_offset: 1068
> > 11cc
> > 1330
> > 1494
> > offset_eraseinfo: 0 (0x0)
> > size_eraseinfo: 0 (0x0)
> > start_pfn_64: (unused)
> > end_pfn_64: (unused)
> > max_mapnr_64: 1245184 (0x130000)
> >
> > data_offset: 4e000
> > block_size: 4096
> > block_shift: 12
> > bitmap: 7f484b713010
> > bitmap_len: 311296
> > max_mapnr: 1245184 (0x130000)
> > dumpable_bitmap: 7f484b6c6010
> > byte: 0
> > bit: 0
> > compressed_page: 1a8c660
> > curbufptr: 1a7f650
> > ...
> >
> > Note that QEMU does add self-generated register dumps above, but the special
> > "QEMU" note that is added to ELF kdumps is not included.
> >
> > Also note that the kernel version information is also left zero-filled.
> >
> > In any case, if either a QEMU note or a diskdump.data flag were added, I would
> > be more than happy.
>
> Looks like a new flag needs to be negotiated with many stake-holders,
> but a QEMU note could be included even in the kdump format (not only the
> ELF format) freely, and tools that don't recognize it would simply
> ignore it. (And other tools that generate custom notes probably won't
> clash with it.)
>
> Is that correct? Because if it is, then (a) I didn't know it, (b) we
> only need an agreement between "crash" and qemu.
Agreed.
> Is the kdump format specified somewhere (as in, a PDF or text file)? I'd
> like to look into this option if possible.
I don't know. Bernhard Walle, formerly of SUSE, used to keep a text file
stored, but all the links are dead now:
[Crash-utility] Re: Kdump compressed format
https://www.redhat.com/archives/crash-utility/2008-August/msg00014.html
But his doc certainly wouldn't have anything with respect to QEMU notes.
Maybe Petr or the Fujitsu guys have a pointer?
But anyway, as it turns out, QEMU ELF kdumps contain one of the special
"QEMU" notes for each cpu:
$ readelf --notes vmcore
Notes at offset 0x000001c8 with length 0x00000ca0:
Owner Data size Description
CORE 0x00000150 NT_PRSTATUS (prstatus structure)
CORE 0x00000150 NT_PRSTATUS (prstatus structure)
CORE 0x00000150 NT_PRSTATUS (prstatus structure)
CORE 0x00000150 NT_PRSTATUS (prstatus structure)
QEMU 0x000001b0 Unknown note type: (0x00000000)
QEMU 0x000001b0 Unknown note type: (0x00000000)
QEMU 0x000001b0 Unknown note type: (0x00000000)
QEMU 0x000001b0 Unknown note type: (0x00000000)
$
Here are the contents of each QEMU note:
crash> help -n
... [ cut ] ...
Elf64_Nhdr:
n_namesz: 5 ("QEMU")
n_descsz: 432
n_type: 0 (?)
000001b000000001 0000000000000000
0000000000000000 0000000000000000
0000000000000000 0000000000000001
ffffffff81dd5228 ffffffff81a01ec8
ffffffff81a01ec8 0000000000000000
0000000000000000 00000013911d5f29
0000000000000000 ffffffff81c00480
0000000000000000 ffffffffffffffff
000000000309f000 ffffffff810375ab
0000000000000246 ffffffff00000010
0000000000a09b00 0000000000000000
ffffffff00000018 0000000000c09300
0000000000000000 ffffffff00000018
0000000000c09300 0000000000000000
ffffffff00000000 0000000000000000
0000000000000000 ffffffff00000000
0000000000000000 ffff880003200000
ffffffff00000018 0000000000c09300
0000000000000000 ffffffff00000000
0000000000000000 0000000000000000
0000208700000040 0000000000008b00
ffff880003213b40 0000007f00000000
0000000000000000 ffff880003204000
00000fff00000000 0000000000000000
ffffffff81dd2000 000000008005003b
0000000000000000 0000000001b2e000
0000000007b18000 00000000000006f0
Elf64_Nhdr:
n_namesz: 5 ("QEMU")
n_descsz: 432
n_type: 0 (?)
000001b000000001 ffffffff81a93760
000000000000000c 0000000080802001
0000000000000000 00000000000000ff
00000000000000f0 ffff880002287e88
ffff880002287e88 ffff880002287e50
ffff880002287e54 000009149661fc2b
ffff88001e6abe78 ffff880002287ef8
00000000fffffffe 0000000000000000
ffffffff81bfed40 ffffffff810375ba
0000000000000002 ffffffff00000010
0000000000a09b00 0000000000000000
ffffffff00000018 0000000000c09300
0000000000000000 ffffffff00000018
0000000000c09300 0000000000000000
ffffffff00000000 0000000000000000
0000000000000000 ffffffff00000000
0000000000000000 ffff880002280000
ffffffff00000018 0000000000c09300
0000000000000000 ffffffff00000000
0000000000000000 0000000000000000
0000208700000040 0000000000008b00
ffff880002293b40 0000007f00000000
0000000000000000 ffff880002284000
00000fff00000000 0000000000000000
ffffffff81dd2000 000000008005003b
0000000000000000 0000000002162570
000000001aab8000 00000000000006e0
Elf64_Nhdr:
n_namesz: 5 ("QEMU")
n_descsz: 432
n_type: 0 (?)
000001b000000001 ffffffff81a93760
000000000000000c 0000000080802001
0000000000000000 00000000000000ff
00000000000000f0 ffff880002307e88
ffff880002307e88 ffff880002307e50
ffff880002307e54 000009143aed494c
ffff88001e6dfe78 ffff880002307ef8
00000000fffffffe 0000000000000000
ffffffff81bfed40 ffffffff810375ba
0000000000000002 ffffffff00000010
0000000000a09b00 0000000000000000
ffffffff00000018 0000000000c09300
0000000000000000 ffffffff00000018
0000000000c09300 0000000000000000
ffffffff00000000 0000000000000000
0000000000000000 ffffffff00000000
0000000000000000 ffff880002300000
ffffffff00000018 0000000000c09300
0000000000000000 ffffffff00000000
0000000000000000 0000000000000000
0000208700000040 0000000000008b00
ffff880002313b40 0000007f00000000
0000000000000000 ffff880002304000
00000fff00000000 0000000000000000
ffffffff81dd2000 000000008005003b
0000000000000000 00007fd1a029c000
000000001d5c7000 00000000000006e0
Elf64_Nhdr:
n_namesz: 5 ("QEMU")
n_descsz: 432
n_type: 0 (?)
000001b000000001 ffffffff81a93760
000000000000000c 0000000080802001
0000000000000000 00000000000000ff
00000000000000f0 ffff880002387e88
ffff880002387e88 ffff880002387e50
ffff880002387e54 0000091497285969
0000000000000000 ffff880002387ef8
00000000fffffffe 0000000000000000
ffffffff81bfed40 ffffffff810375ba
0000000000000046 ffffffff00000010
0000000000a09b00 0000000000000000
ffffffff00000018 0000000000c09300
0000000000000000 ffffffff00000018
0000000000c09300 0000000000000000
ffffffff00000000 0000000000000000
0000000000000000 ffffffff00000000
0000000000000000 ffff880002380000
ffffffff00000018 0000000000c09300
0000000000000000 ffffffff00000000
0000000000000000 0000000000000000
0000208700000040 0000000000008b00
ffff880002393b40 0000007f00000000
0000000000000000 ffff880002384000
00000fff00000000 0000000000000000
ffffffff81dd2000 000000008005003b
0000000000000000 00007f981fe51000
000000001f214000 00000000000006e0
crash>
I'm not sure what the data consists of. The crash utility simply checks
for the existence of a note with a "QEMU" name string.
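For illustration only, here is a rough Python sketch of that kind of check. It
is not crash's actual code; the helper names (make_note, iter_notes,
has_qemu_note) are made up for this example, and it assumes little-endian
notes with 4-byte padding as in the standard ELF note layout. The sample
buffer is synthetic: it is built with zero-filled descriptors using only the
owner names and data sizes from the readelf listing above, and its total
length comes out to the same 0xca0 that readelf reports.

```python
import struct

NT_PRSTATUS = 1  # standard ELF note type for the CORE register notes


def pad4(n: int) -> int:
    """Round n up to the next multiple of 4 (ELF note alignment)."""
    return (n + 3) & ~3


def make_note(name: bytes, desc: bytes, n_type: int) -> bytes:
    """Build one ELF note: 12-byte header, padded name, padded descriptor."""
    name_z = name + b"\0"
    hdr = struct.pack("<III", len(name_z), len(desc), n_type)
    return (hdr
            + name_z.ljust(pad4(len(name_z)), b"\0")
            + desc.ljust(pad4(len(desc)), b"\0"))


def iter_notes(buf: bytes):
    """Yield (name, n_type, desc) for each note in a raw notes buffer."""
    off = 0
    while off + 12 <= len(buf):
        namesz, descsz, n_type = struct.unpack_from("<III", buf, off)
        off += 12
        name = buf[off:off + namesz].rstrip(b"\0")
        off += pad4(namesz)
        desc = buf[off:off + descsz]
        off += pad4(descsz)
        yield name, n_type, desc


def has_qemu_note(buf: bytes) -> bool:
    """True if any note in the buffer carries the owner name "QEMU"."""
    return any(name == b"QEMU" for name, _, _ in iter_notes(buf))


# Synthetic buffer: 4 CORE notes (0x150 bytes) and 4 QEMU notes (0x1b0 bytes),
# matching the sizes in the readelf output; the descriptor contents are zeros.
notes = b"".join(
    [make_note(b"CORE", bytes(0x150), NT_PRSTATUS)] * 4
    + [make_note(b"QEMU", bytes(0x1b0), 0)] * 4
)

print(hex(len(notes)), has_qemu_note(notes))  # prints: 0xca0 True
```

Each CORE note takes 12 + 8 + 0x150 = 0x164 bytes and each QEMU note takes
12 + 8 + 0x1b0 = 0x1c4 bytes, which is where the 0xca0 total comes from.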
> Also, is there a command line tool that dumps metadata from a kdump
> file? (Quite like your "crash" invocation above, but I believe crash
> won't even start without a matching symbol file.)
I don't know of any, although Petr mentioned something about a "kdumpid" tool?
It's on sourceforge, but I've never heard of it until today, and don't know
if it dumps the full contents of headers.
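Short of a real tool, a few lines of Python can at least tell which container
a file starts with. This is only a sketch under assumptions: the function
names are made up, the two signature strings are taken as they appear in the
"help -n" output earlier in this mail ("makedumpfile" for the flat format,
"KDUMP " for the compressed kdump header), and the ELF magic is the standard
one. It does not unpack any header fields.

```python
MDF_SIGNATURE = b"makedumpfile"   # flat-format header signature
KDUMP_SIGNATURE = b"KDUMP "       # compressed kdump signature, as printed by crash
ELF_MAGIC = b"\x7fELF"            # standard ELF magic


def classify_dump_signature(head: bytes) -> str:
    """Classify a dumpfile by the first bytes of the file."""
    if head.startswith(MDF_SIGNATURE):
        return "flat"
    if head.startswith(KDUMP_SIGNATURE):
        return "kdump"
    if head.startswith(ELF_MAGIC):
        return "elf"
    return "unknown"


def classify_dump_file(path: str) -> str:
    """Read the first 16 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return classify_dump_signature(f.read(16))
```

For a flat-format file this only identifies the outer wrapper; the KDUMP
header comes later in the stream and would still need to be located and
decoded separately.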
However, you can get the header dump I showed before without a vmlinux file
by using the -d debug flag on the vmcore:
$ crash -d1 vmcore.ovmf.rhel7.kdump-zlib
crash 7.0.8
Copyright (C) 2002-2014 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
vmcore.ovmf.rhel7.kdump-zlib: FLAT
compressed kdump: header->utsname.machine: x86_64
makedumpfile header:
signature: "makedumpfile"
type: 1
version: 1
all_flat_data:
num_array: 13851
array: 7f3c0a0da010
file_size: 0
diskdump_data:
filename: vmcore.ovmf.rhel7.kdump-zlib
flags: 6 (KDUMP_CMPRS_LOCAL|ERROR_EXCLUDED) [FLAT]
dfd: 3
ofp: 0
machine_type: 62 (EM_X86_64)
header: 1c9bfe0
signature: "KDUMP "
header_version: 6
utsname:
sysname:
nodename:
release:
version:
machine: x86_64
domainname:
timestamp:
tv_sec: 0
tv_usec: 0
status: 1 (DUMP_DH_COMPRESSED_ZLIB)
block_size: 4096
sub_hdr_size: 1
bitmap_blocks: 76
max_mapnr: 1245184
total_ram_blocks: 0
device_blocks: 0
written_blocks: 0
current_cpu: 0
nr_cpus: 4
tasks[nr_cpus]: 0
0
0
0
sub_header: 0 (n/a)
sub_header_kdump: 1c9cff0
phys_base: 0
dump_level: 1 (0x1) (DUMP_EXCLUDE_ZERO)
split: 0
start_pfn: (unused)
end_pfn: (unused)
offset_vmcoreinfo: 0 (0x0)
size_vmcoreinfo: 0 (0x0)
offset_note: 4200 (0x1068)
size_note: 3232 (0xca0)
num_prstatus_notes: 4
notes_buf: 1c9e000
notes[0]: 1c9e000
notes[1]: 1c9e164
notes[2]: 1c9e2c8
notes[3]: 1c9e42c
NT_PRSTATUS_offset: 1068
11cc
1330
1494
offset_eraseinfo: 0 (0x0)
size_eraseinfo: 0 (0x0)
start_pfn_64: (unused)
end_pfn_64: (unused)
max_mapnr_64: 1245184 (0x130000)
data_offset: 4e000
block_size: 4096
block_shift: 12
bitmap: 7f3c0a08d010
bitmap_len: 311296
max_mapnr: 1245184 (0x130000)
dumpable_bitmap: 7f3c0a040010
byte: 0
bit: 0
compressed_page: 1cbf660
curbufptr: 0
page_cache_hdr[0]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: 1caf650
pg_hit_count: 0
page_cache_hdr[1]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: 1cb0650
pg_hit_count: 0
page_cache_hdr[2]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: 1cb1650
pg_hit_count: 0
page_cache_hdr[3]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: 1cb2650
pg_hit_count: 0
page_cache_hdr[4]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: 1cb3650
pg_hit_count: 0
page_cache_hdr[5]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: 1cb4650
pg_hit_count: 0
page_cache_hdr[6]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: 1cb5650
pg_hit_count: 0
page_cache_hdr[7]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: 1cb6650
pg_hit_count: 0
page_cache_hdr[8]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: 1cb7650
pg_hit_count: 0
page_cache_hdr[9]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: 1cb8650
pg_hit_count: 0
page_cache_hdr[10]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: 1cb9650
pg_hit_count: 0
page_cache_hdr[11]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: 1cba650
pg_hit_count: 0
page_cache_hdr[12]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: 1cbb650
pg_hit_count: 0
page_cache_hdr[13]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: 1cbc650
pg_hit_count: 0
page_cache_hdr[14]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: 1cbd650
pg_hit_count: 0
page_cache_hdr[15]:
pg_flags: 0 ()
pg_addr: 0
pg_bufptr: 1cbe650
pg_hit_count: 0
page_cache_buf: 1caf650
evict_index: 0
evictions: 0
accesses: 0
cached_reads: 0
valid_pages: 1caecc0
crash: namelist argument required
Usage:
crash [OPTION]... NAMELIST MEMORY-IMAGE[@ADDRESS] (dumpfile form)
crash [OPTION]... [NAMELIST] (live system form)
Enter "crash -h" for details.
$
Dave