[Qemu-devel] uniquely identifying KDUMP files that originate from QEMU

Wed Nov 12 06:36:38 PST 2014

On Wed, 12 Nov 2014 08:28:54 -0500
Christopher Covington <cov at codeaurora.org> wrote:

> On 11/12/2014 08:26 AM, Petr Tesarik wrote:
> > On Wed, 12 Nov 2014 08:18:04 -0500
> > Christopher Covington <cov at codeaurora.org> wrote:
> > 
> >> On 11/12/2014 03:05 AM, Petr Tesarik wrote:
> >>> On Tue, 11 Nov 2014 12:27:44 -0500
> >>> Christopher Covington <cov at codeaurora.org> wrote:
> >>>
> >>>> On 11/11/2014 06:22 AM, Laszlo Ersek wrote:
> >>>>> (Note: I'm not subscribed to either qemu-devel or the kexec list; please
> >>>>> keep me CC'd.)
> >>>>>
> >>>>> QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib,
> >>>>> kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command.
> >>>>>
> >>>>> The resultant vmcore is usually analyzed with the "crash" utility.
> >>>>>
> >>>>> The original tool producing such files is kdump. Unlike the procedure
> >>>>> performed by QEMU, kdump runs from *within* the guest (under a kexec'd
> >>>>> kdump kernel), and has more information about the original guest kernel
> >>>>> state (which is being dumped) than QEMU. To QEMU, the guest kernel state
> >>>>> is opaque.
> >>>>>
> >>>>> For this reason, the kdump preparation logic in QEMU hardcodes a number
> >>>>> of fields in the kdump header. The direct issue is the "phys_base"
> >>>>> field. Refer to dump.c, functions create_header32(), create_header64(),
> >>>>> and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text
> >>>>> "0").
> >>>>>
> >>>>> http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD
> >>>>>
> >>>>> http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD
> >>>>>
> >>>>> This works in most cases, because the guest Linux kernel indeed tends to
> >>>>> be loaded at guest-phys address 0. However, when the guest Linux kernel
> >>>>> is booted on top of OVMF (which has a somewhat unusual UEFI memory map),
> >>>>> then the guest Linux kernel is loaded at 16MB, thereby getting out of
> >>>>> sync with the phys_base=0 setting visible in the KDUMP header.
> >>>>>
> >>>>> This trips up the "crash" utility.
> >>>>>
> >>>>> Dave worked around the issue in "crash" for ELF format dumps -- "crash"
> >>>>> can identify QEMU as the originator of the vmcore by finding the QEMU
> >>>>> notes in the ELF vmcore. If those are present, then "crash" employs a
> >>>>> heuristic, probing for a phys_base up to 32MB, in 1MB steps.
> >>>>
> >>>> What advantages does KDUMP have over ELF?
> >>>
> >>> It's smaller (data is compressed), and it contains a header with some
> >>> useful information (e.g. the crashed kernel's version and release).
> >>
> >> What if the ELF dumper used SHF_COMPRESSED or could dump an ELF.xz?
> > 
> > Not the same thing. With KDUMP, each page is compressed separately, so
> > if a utility like crash needs a page from the middle, it can find it
> > and unpack it immediately. If we had an ELF.xz, then the whole file
> > must be unpacked before it can be used. And unpacking a few terabytes
> > takes ... a while. ;-)
> 
> Understood on the ELF.xz approach, but why couldn't each page (or maybe a
> configurable size) be a SHF_COMPRESSED section?

A machine with 64TB of RAM (already manufactured by SGI) has
17,179,869,184 pages (at 4KB per page). When the KDUMP (or, actually,
diskdump) format was invented, ELF files could have at most
2^16 = 65,536 program headers. Since then, the ELF specification has
been extended (PN_XNUM), so the number of program headers can be stored
in the sh_info field of the first ELF section header, but that only
increases the number of possible program headers to
2^32 = 4,294,967,296.
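
To make the arithmetic concrete, here is a minimal Python sketch (not
taken from QEMU or makedumpfile). It assumes 4 KiB pages, which is what
the page count above implies, and takes the 2^16 and 2^32 limits at face
value as quoted. The helper name min_pages_per_chunk is made up for
illustration.

```python
# Sketch only: how many pages must share one ELF header so that a
# 64 TiB dump fits under a given header-count limit.

RAM_BYTES = 64 * 2**40      # 64 TiB of RAM
PAGE_SIZE = 4096            # assumption: 4 KiB pages
PAGES = RAM_BYTES // PAGE_SIZE

ORIGINAL_LIMIT = 2**16      # header count limit quoted for classic ELF
EXTENDED_LIMIT = 2**32      # header count limit quoted with PN_XNUM


def min_pages_per_chunk(pages, limit):
    """Smallest chunk size (in pages) that keeps the chunk count <= limit."""
    return -(-pages // limit)   # ceiling division


print("pages:", PAGES)
print("fits one header per page:", PAGES <= EXTENDED_LIMIT)
print("pages per chunk, extended limit:", min_pages_per_chunk(PAGES, EXTENDED_LIMIT))
print("pages per chunk, original limit:", min_pages_per_chunk(PAGES, ORIGINAL_LIMIT))
```

Under these assumptions, one header per page does not fit even with the
extended limit. Each header would have to cover at least 4 pages (16 KiB),
or at least 262,144 pages (1 GiB) with the original limit.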

Yes, we could divide memory into larger chunks than pages, but:

  1. you're probably the first one to have the idea, and
  2. this is easy if you save the complete RAM content, but not quite
     that easy if some pages should be filtered out (makedumpfile).

There are a few other (minor) points, e.g.:

  * Each program header consumes 56 bytes in ELF64, while a single
    bit is sufficient in KDUMP compressed files to tell if the
    corresponding page is stored or not.

  * SHF_COMPRESSED currently supports only zlib compression, which is
    rather slow. KDUMP supports zlib, lzo and snappy.

  * Support for KDUMP files is already present in the crash utility,
    while I don't think there is any support for SHF_COMPRESSED
    sections.
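
To put the first point in numbers, here is a rough Python sketch. It
compares only the per-page bookkeeping named above: 56 bytes per ELF64
program header versus one presence bit per page. Any other metadata
either format keeps is not counted. It also assumes the same 64TB
machine with 4 KiB pages, and hypothetically pretends ELF could hold one
header per page. The constant names are made up for illustration.

```python
# Sketch only: per-page bookkeeping cost, one ELF64 program header per
# page versus one bit per page in a presence bitmap.

PAGES = (64 * 2**40) // 4096    # 64 TiB of RAM, assumed 4 KiB pages
ELF64_PHDR_SIZE = 56            # bytes per ELF64 program header
GIB = 2**30

elf_header_bytes = PAGES * ELF64_PHDR_SIZE   # hypothetical: one header per page
bitmap_bytes = PAGES // 8                    # one bit per page

print("ELF64 program headers: %d GiB" % (elf_header_bytes // GIB))
print("presence bitmap:       %d GiB" % (bitmap_bytes // GIB))
print("ratio: %dx" % (elf_header_bytes // bitmap_bytes))
```

With these figures the headers alone would take 896 GiB, against 2 GiB
for the bitmap, a factor of 448 (56 bytes * 8 bits).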

In short, SHF_COMPRESSED looks like a viable alternative, but right now
KDUMP is the better choice in terms of features and interoperability.

Just my two cents,
Petr T