[PATCH v2 0/5] Export offsets of VMCS fields as note information for kdump
Yanfei Zhang
zhangyanfei at cn.fujitsu.com
Sun May 20 22:32:13 EDT 2012
On 05/21/2012 01:43, Avi Kivity wrote:
> On 05/16/2012 10:50 AM, zhangyanfei wrote:
>> This patch set exports offsets of VMCS fields as note information for
>> kdump. We call it VMCSINFO. The purpose of VMCSINFO is to retrieve the
>> runtime state of a guest machine image, such as registers, from the host
>> machine's crash dump, where it is stored in VMCS format. The problem is
>> that the internal layout of the VMCS is hidden by Intel in its
>> specification. So we solve this problem by reverse engineering, which is
>> implemented in this patch set. VMCSINFO is exported via sysfs to
>> kexec-tools, just like VMCOREINFO.
>>
>> Here are two use cases for two features that we want.
>>
>> 1) Create guest machine's crash dumpfile from host machine's crash dumpfile
>>
>> In general, we want to use this feature for failure analysis of systems
>> where processing depends on communication between host and guest
>> machines, so that we can look into the system from both machines'
>> viewpoints.
>>
>> As a concrete situation, consider a heartbeat monitoring feature on the
>> guest machine's side, where we need to determine on which machine's
>> side the cause of a heartbeat stop lies. In our actual experiments, we
>> encountered such a situation and found that the cause of the bug was in
>> the host's process scheduler: the guest machine's vcpu stopped for a
>> long time, which led to the heartbeat stop.
>>
>> The module that judges heartbeat stop is on guest machine, so we need
>> to debug guest machine's data. But if the cause lies in host machine
>> side, we need to look into host machine's crash dump.
>
> Do you mean that a heartbeat failure in the guest led to a host panic?
>
> My expectation is that a problem in the guest will cause the guest to
> panic and perhaps produce a dump; the host will remain up.
>
The point is that before our investigation, we didn't know which side
led to this buggy situation. A bug in either the host machine or the
guest machine itself could cause a heartbeat failure.
So we want to get both the host machine's crash dump and the guest
machine's crash dump *at the same time*. Then we can use userspace tools
to extract the guest machine's crash dump from the host machine's and
analyse them separately to find which side causes the problem.
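As a rough illustration of what such a userspace tool might do with the
exported offsets, here is a minimal sketch. It is not code from the patch
set: the struct vmcs_field_offset table, the vmcs_read_field() helper and
the idea that a tool has already parsed VMCSINFO into name/offset/width
entries are all assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical table entry: one per VMCS field, as a userspace tool
 * might build it after parsing VMCSINFO. Not the format used by the patches.
 */
struct vmcs_field_offset {
	const char *name;	/* field name, e.g. "GUEST_RIP" */
	size_t offset;		/* byte offset inside the VMCS region */
	size_t width;		/* field width in bytes (at most 8) */
};

/*
 * Copy one field out of a raw VMCS image taken from the host dump.
 * Returns 0 on success, -1 if the field is unknown or out of bounds.
 */
static int vmcs_read_field(const uint8_t *vmcs, size_t vmcs_size,
			   const struct vmcs_field_offset *tbl, size_t n,
			   const char *name, uint64_t *out)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(tbl[i].name, name) != 0)
			continue;
		/* Reject entries that would read past the VMCS image. */
		if (tbl[i].width > sizeof(*out) ||
		    tbl[i].offset > vmcs_size ||
		    tbl[i].width > vmcs_size - tbl[i].offset)
			return -1;
		*out = 0;
		/* Assumes the tool runs on a little-endian machine, like the dumped host. */
		memcpy(out, vmcs + tbl[i].offset, tbl[i].width);
		return 0;
	}
	return -1;
}
```

With a table like this, the tool would look up each guest register by
name in the VMCS region found in the host dump and write the values into
the guest's dump file.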
>> Without this feature, we first create the guest machine's dump and then
>> create the host machine's, but there is only a short time between the two
>> operations, during which the buggy situation is unlikely to remain.
>>
>> So we think the feature is useful for debugging both the guest
>> machine's and the host machine's sides at the same time, and we expect
>> it to make failure analysis more efficient.
>>
>> Of course, we believe this feature is generally useful in situations
>> where the guest machine doesn't work well because of something on the
>> host machine's side.
>>
>> 2) Get offsets of VMCS information on the CPU running on the host machine
>>
>> If kdump doesn't work well, we cannot use the kvm API to get the
>> register values of the guest machine, and they are still left in its vmcs
>> region. In that case, we use a crash dump mechanism running outside of
>> the linux kernel, such as sadump, a firmware-based crash dump. VMCS
>> information is then necessary.
>
> Shouldn't sadump then expose the VMCS offsets? Perhaps bundling them
> into its dump file?
>
A firmware-based crash dump is not concerned with the OS running on the
machine, so it does no OS-specific handling when the machine crashes.
Thanks
Zhang Yanfei