[PATCH v5 0/3] Export offsets of VMCS fields as note information for kdump

Zhang Yanfei zhangyanfei at cn.fujitsu.com
Sun Jul 29 22:53:43 EDT 2012


Hello Avi,

Do you have any comments about this version of the patch set?

On July 12, 2012 at 17:54, Zhang Yanfei wrote:
> This patch set exports offsets of VMCS fields as note information for
> kdump. We call it VMCSINFO. The purpose of VMCSINFO is to retrieve the
> runtime state of a guest machine image, such as registers, from the VMCS
> data in the host machine's crash dump. The problem is that Intel does not
> document the internal layout of the VMCS in its specification. So we solve
> this problem by the reverse engineering implemented in this patch set.
> The VMCSINFO is exported via sysfs (/sys/devices/system/cpu/vmcs/) to
> kexec-tools.
> 
> Here are two use cases for the two features that we want.
> 
> 1) Create guest machine's crash dumpfile from host machine's crash dumpfile
> 
> In general, we want to use this feature for failure analysis on systems
> where the processing depends on communication between host and guest
> machines, so that we can look into the system from both machines' viewpoints.
> 
> As a concrete situation, consider a heartbeat monitoring feature on
> the guest machine's side, where we need to determine on which machine
> the cause of a heartbeat stop lies. In our actual experiments, we
> encountered such a situation and found that the cause of the bug was in
> the host's process scheduler: the guest machine's vcpu stopped for a long
> time, which led to the heartbeat stop.
> 
> The module that judges a heartbeat stop is on the guest machine, so we
> need to debug the guest machine's data. But if the cause lies on the host
> machine's side, we need to look into the host machine's crash dump.
> 
> Without this feature, we first create the guest machine's dump and then
> create the host machine's, but there's a short interval between the two
> operations, and it is unlikely that the buggy situation persists across it.
> 
> So, we think the feature is useful for debugging both the guest machine's
> and the host machine's sides at the same time, and we expect it to make
> failure analysis more efficient.
> 
> Of course, we believe this feature is generally useful in situations
> where the guest machine doesn't work well because of something on the
> host machine.
> 
> 2) Get offsets of VMCS information on the CPU running on the host machine
> 
> If kdump doesn't work well, we cannot use the kvm API to get the
> register values of the guest machine, and they are still left in its VMCS
> region. In that case, we use a crash dump mechanism running outside of
> the Linux kernel, such as sadump, a firmware-based crash dump. The VMCS
> information is then necessary.
> 
> TODO:
>   1. In kexec-tools, get VMCSINFO via sysfs and dump it as note information
>      into vmcore.
>   2. Dump VMCS region of each guest vcpu and VMCSINFO into qemu-process
>      core file. To do this, we will modify kernel core dumper, gdb gcore
>      and crash gcore.
>   3. Dump guest image from the qemu-process core file into a vmcore.
> 
> Changelog from v4 to v5:
> 1. The VMCSINFO is stored in a two-dimensional array filled with each
>    field's encoding and corresponding offset. So the size of VMCSINFO
>    is much smaller.
> 2. vmcs sysfs file /sys/devices/system/cpu/vmcs_id is moved to
>    /sys/devices/system/cpu/vmcs/id.
> 3. Rewrite the ABI entry for vmcs interface and remove the KernelVersion
>    line.
> 
> Changelog from v3 to v4:
> 1. All the variables and functions are moved to vmcsinfo-intel module.
> 2. Add a new sysfs interface /sys/devices/system/cpu/vmcs_id to export
>    the vmcs revision identifier. And the original sysfs interface is changed
>    from /sys/devices/cpu/vmcs to /sys/devices/system/cpu/vmcs. Thanks to
>    Greg KH for his helpful comments about sysfs.
> 
> Changelog from v2 to v3:
> 1. New VMCSINFO format.
>    Now the VMCSINFO is mainly made up of an array that contains all vmcs
>    fields' offsets. The offsets aren't encoded because we decode them in
>    the module itself. If some field doesn't exist or its offset cannot be
>    decoded correctly, the offset in the array is just set to zero.
> 2. New sysfs interface and Documentation/ABI entry. 
>    We expose the actual fields in /sys/devices/cpu/vmcs instead of just
>    exporting the address of VMCSINFO in /sys/kernel/vmcsinfo.
>    For example, /sys/devices/cpu/vmcs/0800 contains the offset of
>    GUEST_DS_SELECTOR. 0800 is the encoding of GUEST_DS_SELECTOR.
>    Accordingly, ABI entry in Documentation is changed from sysfs-kernel-vmcsinfo
>    to sysfs-devices-cpu-vmcs.
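
For illustration only, here is a minimal userspace sketch in Python of how a tool might locate one of these per-field files. The default directory is the v5 location given in the summary above. The names `vmcs_field_path` and `read_vmcs_field` are hypothetical, the lowercase 4-hex-digit file name is inferred from the 0800 example, and the text format of the file contents is not spelled out in this cover letter, so the sketch returns the raw string without interpreting it as a number.

```python
import os

# v5 location of the per-field files, as given in the cover letter.
VMCS_SYSFS_DIR = "/sys/devices/system/cpu/vmcs"

# Field encoding used as the example in the changelog.
GUEST_DS_SELECTOR = 0x0800


def vmcs_field_path(encoding, base_dir=VMCS_SYSFS_DIR):
    """Build the sysfs path for a field, named by its 4-hex-digit encoding.

    Assumption: file names are lowercase hex, as in the 0800 example.
    """
    return os.path.join(base_dir, "%04x" % encoding)


def read_vmcs_field(encoding, base_dir=VMCS_SYSFS_DIR):
    """Return the file contents as stripped text, or None if unreadable."""
    try:
        with open(vmcs_field_path(encoding, base_dir)) as f:
            return f.read().strip()
    except OSError:
        return None
```

On a machine without the vmcsinfo-intel module loaded, `read_vmcs_field(GUEST_DS_SELECTOR)` would simply return None.
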
> 
> Changelog from v1 to v2:
> 1. The VMCSINFO now has a simple binary <field><encoded offset> format,
>    as below:
>      +-------------+--------------------------+
>      | Byte offset | Contents                 |
>      +-------------+--------------------------+
>      | 0           | VMCS revision identifier |
>      +-------------+--------------------------+
>      | 4           | <field><encoded offset>  |
>      +-------------+--------------------------+
>      | 16          | <field><encoded offset>  |
>      +-------------+--------------------------+
>      ......
>   
>    The first 32 bits of VMCSINFO contain the VMCS revision identifier.
>    The remainder of VMCSINFO is used for <field><encoded offset> sets.
>    Each set takes 12 bytes: the field occupies 4 bytes and its corresponding
>    encoded offset occupies 8 bytes.
> 
>    Encoded offsets are raw values read by vmcs_read{16, 64, 32, l}, and
>    they are all zero-extended to 8 bytes so that each <field><encoded offset>
>    set has the same size.
>    We do not decode offsets here. The decoding work is deferred to userspace
>    tools for more flexible handling.
>    
>    And here are two examples of the new VMCSINFO:
>    Processor: Intel(R) Core(TM)2 Duo CPU     E7500  @ 2.93GHz
>    VMCSINFO contains:
>      <0000000d>                   --> VMCS revision id = 0xd
>      <00004000><0000000001840180> --> OFFSET(PIN_BASED_VM_EXEC_CONTROL) = 0x01840180
>      <00004002><0000000001940190> --> OFFSET(CPU_BASED_VM_EXEC_CONTROL) = 0x01940190
>      <0000401e><000000000fe40fe0> --> OFFSET(SECONDARY_VM_EXEC_CONTROL) = 0x0fe40fe0
>      <0000400c><0000000001e401e0> --> OFFSET(VM_EXIT_CONTROLS) = 0x01e401e0
>      ......
> 
>    Processor: Intel(R) Xeon(R) CPU           E7540  @ 2.00GHz (24 cores)
>    VMCSINFO contains:
>      <0000000e>                   --> VMCS revision id = 0xe 
>      <00004000><0000000005540550> --> OFFSET(PIN_BASED_VM_EXEC_CONTROL) = 0x05540550
>      <00004002><0000000005440540> --> OFFSET(CPU_BASED_VM_EXEC_CONTROL) = 0x05440540
>      <0000401e><00000000054c0548> --> OFFSET(SECONDARY_VM_EXEC_CONTROL) = 0x054c0548
>      <0000400c><00000000057c0578> --> OFFSET(VM_EXIT_CONTROLS) = 0x057c0578
>      ......
> 
> 2. Add a new kernel module *vmcsinfo-intel* for filling VMCSINFO instead
>    of putting it in module kvm-intel. The new module is auto-loaded
>    when the vmx cpufeature is detected and it depends on module kvm-intel.
>    *Loading and unloading this module will have no side effect on the
>    running guests.*
> 3. The sysfs file vmcsinfo is split into 2 files:
>    /sys/kernel/vmcsinfo: shows physical address of VMCSINFO note information.
>    /sys/kernel/vmcsinfo_maxsize: shows max size of VMCSINFO.
> 4. A new Documentation/ABI entry is added for vmcsinfo and vmcsinfo_maxsize.
> 5. Do not update the VMCSINFO note after the kernel has panicked.
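
As a rough illustration of the v2 binary layout described in item 1 of this changelog (a 4-byte revision identifier followed by 12-byte <field><encoded offset> sets), a userspace reader could be sketched in Python as follows. The function name `parse_vmcsinfo_v2` is hypothetical, and little-endian byte order is an assumption based on the x86 host, not something stated in the cover letter. The sketch only splits the records; it does not attempt to decode the encoded offsets.

```python
import struct

# Assumption: little-endian, unpadded layout ("<" disables alignment padding).
REVISION_FMT = "<I"   # 4-byte VMCS revision identifier
RECORD_FMT = "<IQ"    # 4-byte field encoding + 8-byte encoded offset = 12 bytes


def parse_vmcsinfo_v2(blob):
    """Return (revision_id, [(field, encoded_offset), ...]) from a v2 blob."""
    (revision_id,) = struct.unpack_from(REVISION_FMT, blob, 0)
    records = []
    pos = struct.calcsize(REVISION_FMT)
    rec_size = struct.calcsize(RECORD_FMT)
    # Stop at the last complete 12-byte record; trailing bytes are ignored.
    while pos + rec_size <= len(blob):
        records.append(struct.unpack_from(RECORD_FMT, blob, pos))
        pos += rec_size
    return revision_id, records


# Sample built from the Core 2 Duo E7500 values quoted in the changelog.
sample = struct.pack(REVISION_FMT, 0xD)
sample += struct.pack(RECORD_FMT, 0x4000, 0x01840180)
sample += struct.pack(RECORD_FMT, 0x4002, 0x01940190)

revision, fields = parse_vmcsinfo_v2(sample)
for field, encoded in fields:
    print("<%08x><%016x>" % (field, encoded))
```

The first record starts at byte 4 and the second at byte 16, which matches the byte offsets in the table in item 1.
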
> 
> zhangyanfei (3):
>   KVM: Export symbols for module vmcsinfo-intel
>   KVM-INTEL: Add new module vmcsinfo-intel to fill VMCSINFO
>   Documentation: Add ABI entry for vmcs sysfs interface.
> 
>  Documentation/ABI/testing/sysfs-devices-system-cpu |   20 +
>  arch/x86/include/asm/vmx.h                         |   73 ++
>  arch/x86/kvm/Kconfig                               |   11 +
>  arch/x86/kvm/Makefile                              |    3 +
>  arch/x86/kvm/vmcsinfo.c                            |  714 ++++++++++++++++++++
>  arch/x86/kvm/vmx.c                                 |   81 +--
>  include/linux/kvm_host.h                           |    3 +
>  virt/kvm/kvm_main.c                                |    8 +-
>  8 files changed, 841 insertions(+), 72 deletions(-)
>  create mode 100644 arch/x86/kvm/vmcsinfo.c
