uniquely identifying KDUMP files that originate from QEMU

Dave Anderson anderson at redhat.com
Wed Nov 12 12:41:48 PST 2014



----- Original Message -----
> adding back a few CC's because this discussion is useful
> 
> On 11/12/14 19:43, Petr Tesarik wrote:
> > On Wed, 12 Nov 2014 15:50:32 +0100
> > Laszlo Ersek <lersek at redhat.com> wrote:
> > 
> >> On 11/12/14 09:04, Petr Tesarik wrote:
> >>> On Wed, 12 Nov 2014 12:08:38 +0900 (JST)
> >>> HATAYAMA Daisuke <d.hatayama at jp.fujitsu.com> wrote:
> >>
> >>>> Anyway, phys_base is kernel information. To make it available for qemu
> >>>> side, there's need to prepare a mechanism for qemu to have any access
> >>>> to it.
> >>>
> >>> Yes. I wonder if you can have access without some sort of co-operation
> >>> from the guest kernel itself. I guess not.
> >>
> >> Propagating any kind of additional information from the guest kernel
> >> (which is unprivileged and potentially malicious) to the host-side qemu
> >> process (which is by definition more privileged, although still confined
> >> by various measures) is something we'd explicitly like to avoid.
> >>
> >> Think of it like this. I throw a physical box at you, running Linux,
> >> that has frozen in time. Can "crash" work with nothing else but the
> >> contents of the memory, and information about the CPUs?
> > 
> > If only you could save the _complete_ state of the CPU... For example
> > the content of CR3 would be quite useful.
> 
> (1) CR3 is already saved, in both the ELF and the kdump compressed formats.
> 
> - ELF case:
> 
> qmp_dump_guest_memory() [dump.c]
>   create_vmcore()
>     dump_begin()
>       write_elf64_notes()
> 
>         loop from 1 to #vcpu:
>           cpu_write_elf64_note() [qom/cpu.c]
>             x86_64_write_elf64_note() [target-i386/arch_dump.c]
>               writes "CORE"
> 
>         loop from 1 to #vcpu:
>           cpu_write_elf64_qemunote() [qom/cpu.c]
>             x86_cpu_write_elf64_qemunote() [target-i386/arch_dump.c]
>               cpu_write_qemu_note()
>                 qemu_get_cpustate()
>                   s->cr[3] = env->cr[3]; <---------- here
>                 writes "QEMU"
> 
> Hence, the information is part of the QEMU note.
> 
> - kdump case:
> 
> qmp_dump_guest_memory() [dump.c]
>   create_kdump_vmcore()
>     write_dump_header()
>       create_header64()
>         write_elf64_notes()
>           [... same as above ...]
> 
> The trick here is that the note-writer functions use a callback function
> for actually outputting the data. So while in the ELF case the stuff
> goes directly to a file, in the kdump case the notes are first saved in
> a memory buffer, and then later saved in the file at offset
> KdumpSubHeader64.offset_note. (... Which is then represented in the
> flattened file format of course.)
> 
> So, the information is there in both cases.
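
To make the callback arrangement in (1) concrete, here is a minimal
sketch in C. It is not QEMU's code: write_fn, write_to_file(),
write_to_mem(), struct membuf and emit_note() are hypothetical names
chosen for illustration only. The point is that the same note emitter
runs unchanged whether the sink is a file (the ELF case) or a memory
buffer that gets written out later (the kdump case).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sink signature: returns 0 on success, -1 on failure. */
typedef int (*write_fn)(const void *buf, size_t size, void *opaque);

/* Sink 1: write straight to a file (the ELF case). */
static int write_to_file(const void *buf, size_t size, void *opaque)
{
    FILE *f = opaque;
    return fwrite(buf, 1, size, f) == size ? 0 : -1;
}

/* Sink 2: append to a fixed-size memory buffer (the kdump case). */
struct membuf {
    unsigned char *data;
    size_t used;
    size_t cap;
};

static int write_to_mem(const void *buf, size_t size, void *opaque)
{
    struct membuf *m = opaque;

    if (size > m->cap - m->used)
        return -1;              /* buffer would overflow */
    memcpy(m->data + m->used, buf, size);
    m->used += size;
    return 0;
}

/* Stand-in for a note writer: emits the name, including its NUL. */
static int emit_note(write_fn w, void *opaque, const char *name)
{
    return w(name, strlen(name) + 1, opaque);
}
```

The emitter never learns where the bytes go; only the caller that
picks the sink does.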
> 
> 
> (2) Dave -- this just made me realize that the QEMU note is *already*
> there in the kdump file as well; pointed-to by
> KdumpSubHeader64.offset_note, for a length of KdumpSubHeader64.note_size.
> 
> From your other email
> <http://thread.gmane.org/gmane.linux.kernel.kexec/12787/focus=12797>:
> 
> >     sub_header_kdump: 1c9cff0
> >              phys_base: 0
> >             dump_level: 1 (0x1) (DUMP_EXCLUDE_ZERO)
> >                  split: 0
> >              start_pfn: (unused)
> >                end_pfn: (unused)
> >      offset_vmcoreinfo: 0 (0x0)
> >        size_vmcoreinfo: 0 (0x0)
> >            offset_note: 4200 (0x1068)       <----------- here
> >              size_note: 3232 (0xca0)        <-----------
> >     num_prstatus_notes: 4
> >              notes_buf: 1c9e000
> >               notes[0]: 1c9e000
> >               notes[1]: 1c9e164
> >               notes[2]: 1c9e2c8
> >               notes[3]: 1c9e42c
> >     NT_PRSTATUS_offset: 1068
> >                         11cc
> >                         1330
> >                         1494
> >       offset_eraseinfo: 0 (0x0)
> >         size_eraseinfo: 0 (0x0)
> >           start_pfn_64: (unused)
> >             end_pfn_64: (unused)
> >           max_mapnr_64: 1245184 (0x130000)
> 
> Can you fetch that in "crash"? If you can, then there's nothing to do on
> the qemu side (and I'll have to apologize for spamming a bunch of lists :/).

Sure enough...

I was just playing with process_elf64_notes() to check/read the note name strings,
and noticed that I can certainly see them.  But as you noted, only the NT_PRSTATUS
notes are stored in the "notes[]" array, so I was under the impression that the
QEMU notes were completely missing.

That being the case -- we're pretty much done!

I'll put a patch in the next upstream release of crash.

Thanks,
  Dave




> 
> I think "crash" already iterates over all of the notes in the note
> buffer, but skips everything different from NT_PRSTATUS.
> 
> 
> (3) Regarding the structure of the notes, we have to consider the
> placement of the notes and their internal structure. The placement is
> different between the ELF and the KDUMP file format. The internal
> structure of the notes is identical between the two file formats.
> 
> For example, for a 4 VCPU guest, you end up with note names like
> 
>   CORE
>   CORE
>   CORE
>   CORE
>   QEMU
>   QEMU
>   QEMU
>   QEMU
> 
> All of these are Elf64_Nhdr structures. The CORE ones have type
> NT_PRSTATUS, and the QEMU ones have type 0.
> 
> (3a) The placement in the ELF file is already handled by "crash". Each
> note "simply" gets its own ELF note segment/section.
> 
> (3b) In the kdump file, the Elf64_Nhdr structures (8 pieces in total, in
> the above example -- 4x CORE, 4x QEMU) are concatenated in that order,
> and finally stored at "offset_note".
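
The layout in (3b) can be cross-checked against the header dump quoted
earlier. The NT_PRSTATUS offsets are 0x164 = 356 bytes apart, and a
QEMU note takes 12 (header) + 8 (padded name) + 432 (the n_descsz
quoted in (3c)) = 452 bytes, so 4 x 356 + 4 x 452 = 3232, which is
exactly size_note. Below is a minimal sketch in C of walking such a
concatenated buffer and picking out the QEMU notes by name. It is not
the code in "crash": struct note_hdr, NOTE_ALIGN and find_qemu_note()
are hypothetical names, and the sketch assumes the usual ELF note
convention of padding name and descriptor to 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as Elf64_Nhdr; redefined so the sketch needs no <elf.h>. */
struct note_hdr {
    uint32_t n_namesz;   /* name length, including the trailing NUL */
    uint32_t n_descsz;   /* descriptor length */
    uint32_t n_type;
};

/* Assumption: name and descriptor are each padded to 4 bytes. */
#define NOTE_ALIGN(x) (((x) + 3u) & ~(size_t)3u)

/*
 * Hypothetical helper: return the descriptor of the idx-th note named
 * "QEMU" in buf[0..len), or NULL if there is none. On success *descsz
 * receives the descriptor size.
 */
static const void *find_qemu_note(const uint8_t *buf, size_t len,
                                  unsigned idx, uint32_t *descsz)
{
    size_t off = 0;

    while (len - off >= sizeof(struct note_hdr)) {
        struct note_hdr nh;
        size_t name_off, desc_off, next;

        memcpy(&nh, buf + off, sizeof(nh));   /* avoid unaligned reads */
        name_off = off + sizeof(nh);
        desc_off = name_off + NOTE_ALIGN((size_t)nh.n_namesz);
        next = desc_off + NOTE_ALIGN((size_t)nh.n_descsz);
        if (desc_off > len || next > len || next <= off)
            break;                            /* truncated or corrupt */

        if (nh.n_namesz == 5 &&
            memcmp(buf + name_off, "QEMU", 5) == 0) {
            if (idx-- == 0) {
                *descsz = nh.n_descsz;
                return buf + desc_off;
            }
        }
        off = next;
    }
    return NULL;
}
```

Matching on the name rather than on n_type is a deliberate choice
here, since the QEMU notes carry type 0.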
> 
> (3c) Regarding the internal structure of the notes. The CORE ones are
> already known and handled. The QEMU notes have the following structure:
> 
> > Elf64_Nhdr:
> > n_namesz: 5 ("QEMU")
> > n_descsz: 432
> >   n_type: 0 (?)
> >           000001b000000001 0000000000000000
>             |------||------| |--------------|
>             size    version  rax
> 
> >           0000000000000000 0000000000000000
>             |--------------| |--------------|
>             rbx              rcx
> 
> >           0000000000000000 0000000000000001
>             |--------------| |--------------|
>             rdx              rsi
> 
> >           ffffffff81dd5228 ffffffff81a01ec8
>             |--------------| |--------------|
>             rdi              rsp
> 
> >           ffffffff81a01ec8 0000000000000000
>             |--------------| |--------------|
>             rbp              r8
> 
> >           0000000000000000 00000013911d5f29
>             |--------------| |--------------|
>             r9               r10
> 
> >           0000000000000000 ffffffff81c00480
>             |--------------| |--------------|
>             r11              r12
> 
> >           0000000000000000 ffffffffffffffff
>             |--------------| |--------------|
>             r13              r14
> 
> >           000000000309f000 ffffffff810375ab
>             |--------------| |--------------|
>             r15              rip
> 
> >           0000000000000246 ffffffff00000010
>             |--------------| |------||------|
>             rflags           cs/lim  cs/sel
> 
> >           0000000000a09b00 0000000000000000
>             |------||------| |--------------|
>             cs/pad  cs/flags cs/base
> 
> >           ffffffff00000018 0000000000c09300
>             |------||------| |------||------|
>             ds/lim  ds/sel   ds/pad  ds/flags
> 
> >           0000000000000000 ffffffff00000018
>             |--------------| |------||------|
>             ds/base          es/lim  es/sel
> 
> >           0000000000c09300 0000000000000000
>             |------||------| |--------------|
>             es/pad  es/flags es/base
> 
> >           ffffffff00000000 0000000000000000
>             |------||------| |------||------|
>             fs/lim  fs/sel   fs/pad  fs/flags
> 
> >           0000000000000000 ffffffff00000000
>             |--------------| |------||------|
>             fs/base          gs/lim  gs/sel
> 
> >           0000000000000000 ffff880003200000
>             |------||------| |--------------|
>             gs/pad  gs/flags gs/base
> 
> >           ffffffff00000018 0000000000c09300
>             |------||------| |------||------|
>             ss/lim  ss/sel   ss/pad  ss/flags
> 
> >           0000000000000000 ffffffff00000000
>             |--------------| |------||------|
>             ss/base          ldt...
> 
> >           0000000000000000 0000000000000000
>             |------||------| |--------------|
>                                        ...ldt
> 
> >           0000208700000040 0000000000008b00
>             |------||------| |------||------|
>             tr...
> 
> >           ffff880003213b40 0000007f00000000
>             |--------------| |------||------|
>                        ...tr gdt...
> 
> >           0000000000000000 ffff880003204000
>             |------||------| |--------------|
>                                        ...gdt
> 
> >           00000fff00000000 0000000000000000
>             |------||------| |------||------|
>             idt...
> 
> >           ffffffff81dd2000 000000008005003b
>             |--------------| |--------------|
>                       ...idt cr0
> 
> >           0000000000000000 0000000001b2e000
>             |--------------| |--------------|
>             cr1              cr2
> 
> >           0000000007b18000 00000000000006f0
>             |--------------| |--------------|
>             cr3              cr4
> 
> From "target-i386/arch_dump.c":
> 
> > struct QEMUCPUSegment {
> >     uint32_t selector;
> >     uint32_t limit;
> >     uint32_t flags;
> >     uint32_t pad;
> >     uint64_t base;
> > };
> >
> > typedef struct QEMUCPUSegment QEMUCPUSegment;
> >
> > struct QEMUCPUState {
> >     uint32_t version;
> >     uint32_t size;
> >     uint64_t rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp;
> >     uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
> >     uint64_t rip, rflags;
> >     QEMUCPUSegment cs, ds, es, fs, gs, ss;
> >     QEMUCPUSegment ldt, tr, gdt, idt;
> >     uint64_t cr[5];
> > };
> >
> > typedef struct QEMUCPUState QEMUCPUState;
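
As a sanity check on the structures above, a minimal sketch in C. The
two structs are copied from the quoted source; the static assertions
confirm that the state structure is 432 bytes, matching n_descsz and
the 0x1b0 size field in the dump. qemu_note_cr3() is a hypothetical
helper, not an existing function in either qemu or "crash". It assumes
the reader runs with the same byte order as the dump (little-endian
for x86_64), and accepting only version 1 is an assumption based on
the sample note shown here.

```c
#include <stdint.h>
#include <string.h>

struct QEMUCPUSegment {
    uint32_t selector;
    uint32_t limit;
    uint32_t flags;
    uint32_t pad;
    uint64_t base;
};

struct QEMUCPUState {
    uint32_t version;
    uint32_t size;
    uint64_t rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp;
    uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
    uint64_t rip, rflags;
    struct QEMUCPUSegment cs, ds, es, fs, gs, ss;
    struct QEMUCPUSegment ldt, tr, gdt, idt;
    uint64_t cr[5];
};

/* 8 + 18*8 + 10*24 + 5*8 = 432, the n_descsz of the QEMU note. */
_Static_assert(sizeof(struct QEMUCPUSegment) == 24, "segment size");
_Static_assert(sizeof(struct QEMUCPUState) == 432, "state size");

/*
 * Hypothetical helper: extract CR3 from a QEMU note descriptor.
 * Returns 0 on success, -1 if the descriptor does not look valid.
 */
static int qemu_note_cr3(const void *desc, uint32_t descsz, uint64_t *cr3)
{
    struct QEMUCPUState s;

    if (descsz < sizeof(s))
        return -1;
    memcpy(&s, desc, sizeof(s));   /* descriptor may be unaligned */
    if (s.version != 1 || s.size != sizeof(s))
        return -1;
    *cr3 = s.cr[3];
    return 0;
}
```

For the note dumped above this would yield 0x7b18000.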
> 
> 
> Summary: I think the info is all there.
> 
> Thanks
> Laszlo
> 


