uniquely identifying KDUMP files that originate from QEMU

Laszlo Ersek lersek at redhat.com
Wed Nov 12 12:30:20 PST 2014


Adding back a few CCs because this discussion is useful.

On 11/12/14 19:43, Petr Tesarik wrote:
> On Wed, 12 Nov 2014 15:50:32 +0100
> Laszlo Ersek <lersek at redhat.com> wrote:
> 
>> On 11/12/14 09:04, Petr Tesarik wrote:
>>> On Wed, 12 Nov 2014 12:08:38 +0900 (JST)
>>> HATAYAMA Daisuke <d.hatayama at jp.fujitsu.com> wrote:
>>
>>>> Anyway, phys_base is kernel information. To make it available for qemu
>>>> side, there's need to prepare a mechanism for qemu to have any access
>>>> to it.
>>>
>>> Yes. I wonder if you can have access without some sort of co-operation
>>> from the guest kernel itself. I guess not.
>>
>> Propagating any kind of additional information from the guest kernel
>> (which is unprivileged and potentially malicious) to the host-side qemu
>> process (which is by definition more privileged, although still confined
>> by various measures) is something we'd explicitly like to avoid.
>>
>> Think of it like this. I throw a physical box at you, running Linux,
>> that has frozen in time. Can "crash" work with nothing else but the
>> contents of the memory, and information about the CPUs?
> 
> If only you could save the _complete_ state of the CPU... For example
> the content of CR3 would be quite useful.

(1) CR3 is already saved, in both the ELF and the kdump compressed formats.

- ELF case:

qmp_dump_guest_memory() [dump.c]
  create_vmcore()
    dump_begin()
      write_elf64_notes()

        loop from 1 to #vcpu:
          cpu_write_elf64_note() [qom/cpu.c]
            x86_64_write_elf64_note() [target-i386/arch_dump.c]
              writes "CORE"

        loop from 1 to #vcpu:
          cpu_write_elf64_qemunote() [qom/cpu.c]
            x86_cpu_write_elf64_qemunote() [target-i386/arch_dump.c]
              cpu_write_qemu_note()
                qemu_get_cpustate()
                  s->cr[3] = env->cr[3]; <---------- here
                writes "QEMU"

Hence, the information is part of the QEMU note.

- kdump case:

qmp_dump_guest_memory() [dump.c]
  create_kdump_vmcore()
    write_dump_header()
      create_header64()
        write_elf64_notes()
          [... same as above ...]

The trick here is that the note-writer functions use a callback function
for actually outputting the data. So while in the ELF case the stuff
goes directly to a file, in the kdump case the notes are first saved in
a memory buffer, and then later saved in the file at offset
KdumpSubHeader64.offset_note. (... Which is then represented in the
flattened file format of course.)
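
To illustrate the callback idea in isolation, here is a minimal sketch.
It is not the QEMU code: write_fn, write_to_file(), write_to_buf(),
struct note_buf and emit_note() are names made up for this example. The
same emit_note() serves both cases; only the callback and its opaque
argument differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: consume `size` bytes, return 0 on success. */
typedef int (*write_fn)(const void *buf, size_t size, void *opaque);

/* "ELF case": the bytes go straight to a file. */
static int write_to_file(const void *buf, size_t size, void *opaque)
{
    FILE *f = opaque;

    return fwrite(buf, 1, size, f) == size ? 0 : -1;
}

/* "kdump case": the bytes are first collected in a memory buffer. */
struct note_buf {
    unsigned char *data;
    size_t used;
    size_t cap;
};

static int write_to_buf(const void *buf, size_t size, void *opaque)
{
    struct note_buf *nb = opaque;

    if (size > nb->cap - nb->used) {
        return -1; /* would overflow the buffer */
    }
    memcpy(nb->data + nb->used, buf, size);
    nb->used += size;
    return 0;
}

/* The note writer does not know (or care) where the bytes end up. */
static int emit_note(write_fn fn, void *opaque, const void *note, size_t size)
{
    assert(fn != NULL);
    return fn(note, size, opaque);
}
```

In the buffer case, the caller would later write nb.data (nb.used bytes)
into the file at the desired offset.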

So, the information is there in both cases.


(2) Dave -- this just made me realize that the QEMU note is *already*
there in the kdump file as well; it is pointed to by
KdumpSubHeader64.offset_note, for a length of KdumpSubHeader64.note_size.

From your other email
<http://thread.gmane.org/gmane.linux.kernel.kexec/12787/focus=12797>:

>     sub_header_kdump: 1c9cff0
>              phys_base: 0
>             dump_level: 1 (0x1) (DUMP_EXCLUDE_ZERO)
>                  split: 0
>              start_pfn: (unused)
>                end_pfn: (unused)
>      offset_vmcoreinfo: 0 (0x0)
>        size_vmcoreinfo: 0 (0x0)
>            offset_note: 4200 (0x1068)       <----------- here
>              size_note: 3232 (0xca0)        <-----------
>     num_prstatus_notes: 4
>              notes_buf: 1c9e000
>               notes[0]: 1c9e000
>               notes[1]: 1c9e164
>               notes[2]: 1c9e2c8
>               notes[3]: 1c9e42c
>     NT_PRSTATUS_offset: 1068
>                         11cc
>                         1330
>                         1494
>       offset_eraseinfo: 0 (0x0)
>         size_eraseinfo: 0 (0x0)
>           start_pfn_64: (unused)
>             end_pfn_64: (unused)
>           max_mapnr_64: 1245184 (0x130000)

Can you fetch that in "crash"? If you can, then there's nothing to do on
the qemu side (and I'll have to apologize for spamming a bunch of lists :/).

I think "crash" already iterates over all of the notes in the note
buffer, but skips everything different from NT_PRSTATUS.


(3) Regarding the structure of the notes, we have to consider the
placement of the notes and their internal structure. The placement is
different between the ELF and the KDUMP file format. The internal
structure of the notes is identical between the two file formats.

For example, for a 4 VCPU guest, you end up with note names like

  CORE
  CORE
  CORE
  CORE
  QEMU
  QEMU
  QEMU
  QEMU

All of these are Elf64_Nhdr structures. The CORE ones have type
NT_PRSTATUS, and the QEMU ones have type 0.

(3a) The placement in the ELF file is already handled by "crash". Each
note "simply" gets its own ELF note segment/section.

(3b) In the kdump file, the Elf64_Nhdr structures (8 pieces in total, in
the above example -- 4x CORE, 4x QEMU) are concatenated in that order,
and finally stored at "offset_note".
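
As a sanity check on the numbers quoted in (2): the NT_PRSTATUS offsets
are 0x164 = 356 bytes apart, and a QEMU note should take 12 bytes of
header, 8 bytes of name ("QEMU" plus NUL, padded to a multiple of 4) and
the 432-byte descriptor shown in (3c) below, that is, 452 bytes. Then
4 * 356 + 4 * 452 = 3232, which matches size_note.

Walking such a buffer could look roughly like the sketch below. This is
only an illustration under assumptions: NoteHdr mirrors Elf64_Nhdr,
find_note() and align4() are hypothetical names (nothing from "crash"
or qemu), name and descriptor are assumed to be padded to 4 bytes as in
the arithmetic above, and the host is assumed to be little-endian like
the x86_64 guest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as Elf64_Nhdr; redefined here to keep the sketch portable. */
typedef struct {
    uint32_t n_namesz; /* length of the name, including the trailing NUL */
    uint32_t n_descsz; /* length of the descriptor */
    uint32_t n_type;
} NoteHdr;

/* Round up to a multiple of 4 (assumed note padding). */
static size_t align4(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/*
 * Hypothetical helper: return the descriptor of the index-th note
 * (counting from 0) whose name equals `name`, or NULL if there is no
 * such note or the buffer is truncated. *descsz receives n_descsz.
 */
static const uint8_t *find_note(const uint8_t *buf, size_t len,
                                const char *name, unsigned index,
                                uint32_t *descsz)
{
    size_t off = 0;
    size_t want;

    assert(buf != NULL && name != NULL && descsz != NULL);
    want = strlen(name) + 1;

    while (len - off >= sizeof(NoteHdr)) {
        NoteHdr h;
        const uint8_t *nm, *desc;
        size_t name_len, desc_len;

        memcpy(&h, buf + off, sizeof h);
        off += sizeof h;

        name_len = align4(h.n_namesz);
        if (name_len < h.n_namesz || name_len > len - off) {
            return NULL; /* truncated or corrupt */
        }
        nm = buf + off;
        off += name_len;

        desc_len = align4(h.n_descsz);
        if (desc_len < h.n_descsz || desc_len > len - off) {
            return NULL; /* truncated or corrupt */
        }
        desc = buf + off;
        off += desc_len;

        if (h.n_namesz == want && memcmp(nm, name, want) == 0 &&
            index-- == 0) {
            *descsz = h.n_descsz;
            return desc;
        }
    }
    return NULL;
}
```

With the note_size bytes read from offset_note into buf, a call like
find_note(buf, note_size, "QEMU", n, &descsz) would give the descriptor
that belongs to VCPU n.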

(3c) Regarding the internal structure of the notes: the CORE ones are
already known and handled. The QEMU notes have the following structure:

> Elf64_Nhdr:
> n_namesz: 5 ("QEMU")
> n_descsz: 432
>   n_type: 0 (?)
>           000001b000000001 0000000000000000
            |------||------| |--------------|
            size    version  rax

>           0000000000000000 0000000000000000
            |--------------| |--------------|
            rbx              rcx

>           0000000000000000 0000000000000001
            |--------------| |--------------|
            rdx              rsi

>           ffffffff81dd5228 ffffffff81a01ec8
            |--------------| |--------------|
            rdi              rsp

>           ffffffff81a01ec8 0000000000000000
            |--------------| |--------------|
            rbp              r8

>           0000000000000000 00000013911d5f29
            |--------------| |--------------|
            r9               r10

>           0000000000000000 ffffffff81c00480
            |--------------| |--------------|
            r11              r12

>           0000000000000000 ffffffffffffffff
            |--------------| |--------------|
            r13              r14

>           000000000309f000 ffffffff810375ab
            |--------------| |--------------|
            r15              rip

>           0000000000000246 ffffffff00000010
            |--------------| |------||------|
            rflags           cs/lim  cs/sel

>           0000000000a09b00 0000000000000000
            |------||------| |--------------|
            cs/pad  cs/flags cs/base

>           ffffffff00000018 0000000000c09300
            |------||------| |------||------|
            ds/lim  ds/sel   ds/pad  ds/flags

>           0000000000000000 ffffffff00000018
            |--------------| |------||------|
            ds/base          es/lim  es/sel

>           0000000000c09300 0000000000000000
            |------||------| |--------------|
            es/pad  es/flags es/base

>           ffffffff00000000 0000000000000000
            |------||------| |------||------|
            fs/lim  fs/sel   fs/pad  fs/flags

>           0000000000000000 ffffffff00000000
            |--------------| |------||------|
            fs/base          gs/lim  gs/sel

>           0000000000000000 ffff880003200000
            |------||------| |--------------|
            gs/pad  gs/flags gs/base

>           ffffffff00000018 0000000000c09300
            |------||------| |------||------|
            ss/lim  ss/sel   ss/pad  ss/flags

>           0000000000000000 ffffffff00000000
            |--------------| |------||------|
            ss/base          ldt...

>           0000000000000000 0000000000000000
            |------||------| |--------------|
                                       ...ldt

>           0000208700000040 0000000000008b00
            |------||------| |------||------|
            tr...

>           ffff880003213b40 0000007f00000000
            |--------------| |------||------|
                       ...tr gdt...

>           0000000000000000 ffff880003204000
            |------||------| |--------------|
                                       ...gdt

>           00000fff00000000 0000000000000000
            |------||------| |------||------|
            idt...

>           ffffffff81dd2000 000000008005003b
            |--------------| |--------------|
                      ...idt cr0

>           0000000000000000 0000000001b2e000
            |--------------| |--------------|
            cr1              cr2

>           0000000007b18000 00000000000006f0
            |--------------| |--------------|
            cr3              cr4

From "target-i386/arch_dump.c":

> struct QEMUCPUSegment {
>     uint32_t selector;
>     uint32_t limit;
>     uint32_t flags;
>     uint32_t pad;
>     uint64_t base;
> };
>
> typedef struct QEMUCPUSegment QEMUCPUSegment;
>
> struct QEMUCPUState {
>     uint32_t version;
>     uint32_t size;
>     uint64_t rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp;
>     uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
>     uint64_t rip, rflags;
>     QEMUCPUSegment cs, ds, es, fs, gs, ss;
>     QEMUCPUSegment ldt, tr, gdt, idt;
>     uint64_t cr[5];
> };
>
> typedef struct QEMUCPUState QEMUCPUState;
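
Given a pointer to the descriptor of one QEMU note, pulling out CR3
could look like the sketch below. The struct definitions are copied
from above; qemu_note_get_cr3() is a hypothetical helper, not something
that exists in "crash" or qemu. It assumes a little-endian host, and it
insists on version 1 only because that is what the dump above shows.
The static assertions check that the layout adds up to the 432 bytes
(0x1b0) seen in n_descsz and in the "size" field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct QEMUCPUSegment {
    uint32_t selector;
    uint32_t limit;
    uint32_t flags;
    uint32_t pad;
    uint64_t base;
};

struct QEMUCPUState {
    uint32_t version;
    uint32_t size;
    uint64_t rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp;
    uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
    uint64_t rip, rflags;
    struct QEMUCPUSegment cs, ds, es, fs, gs, ss;
    struct QEMUCPUSegment ldt, tr, gdt, idt;
    uint64_t cr[5];
};

/* 8 + 18 * 8 + 10 * 24 + 5 * 8 = 432, with cr[] starting at byte 392. */
static_assert(sizeof(struct QEMUCPUState) == 432, "unexpected layout");
static_assert(offsetof(struct QEMUCPUState, cr) == 392, "unexpected layout");

/*
 * Hypothetical helper: copy the descriptor into a properly aligned
 * struct, sanity-check it, and hand back CR3.
 * Returns 0 on success, -1 if the descriptor does not look right.
 */
static int qemu_note_get_cr3(const void *desc, uint32_t descsz, uint64_t *cr3)
{
    struct QEMUCPUState s;

    if (desc == NULL || cr3 == NULL || descsz < sizeof s) {
        return -1;
    }
    memcpy(&s, desc, sizeof s);
    if (s.version != 1 || s.size != sizeof s) {
        return -1; /* assumption: only the version shown above is accepted */
    }
    *cr3 = s.cr[3];
    return 0;
}
```

For the note dumped above, this should yield 0x7b18000.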


Summary: I think the info is all there.

Thanks
Laszlo


