kexec/kdump of a kvm guest?
vgoyal at redhat.com
Thu Jul 24 21:12:06 EDT 2008
On Thu, Jul 24, 2008 at 03:03:33PM -0400, Mike Snitzer wrote:
> On Thu, Jul 24, 2008 at 9:15 AM, Vivek Goyal <vgoyal at redhat.com> wrote:
> > On Thu, Jul 24, 2008 at 07:49:59AM -0400, Mike Snitzer wrote:
> >> On Thu, Jul 24, 2008 at 4:39 AM, Alexander Graf <agraf at suse.de> wrote:
> >> > As you're stating that the host kernel breaks with kvm modules loaded, maybe
> >> > someone there could give a hint.
> >> OK, I can try using a newer kernel on the host too (e.g. 2.6.25.x) to
> >> see how kexec/kdump of the host fares when kvm modules are loaded.
> >> On the guest side of things, as I mentioned in my original post,
> >> kexec/kdump wouldn't work within a 126.96.36.199 guest with the host
> >> running 188.8.131.52 (with kvm-70).
> > Hi Mike,
> > I have never tried kexec/kdump inside a kvm guest. So I don't know if
> > historically they have been working or not.
> Avi indicated he seems to remember that at least kexec worked last he
> tried (didn't provide when/what he tried though).
> > Having said that, Why do we need kdump to work inside the guest? In this
> > case qemu should be knowing about the memory of guest kernel and should
> > be able to capture a kernel crash dump? I am not sure if qemu already does
> > that. If not, then probably we should think about it?
> > To me, kdump is a good solution for baremetal but not for virtualized
> > environment where we already have another piece of software running which
> > can do the job for us. We will end up wasting memory in every instance
> > of guest (memory reserved for kdump kernel in every guest).
> I haven't looked into what mechanics qemu provides for collecting the
> entire guest memory image; I'll dig deeper at some point. It seems
> the libvirt mid-layer ("virsh dump" - dump the core of a domain to a
> file for analysis) doesn't support saving a kvm guest core:
> # virsh dump guest10 guest10.dump
> libvir: error : this function is not supported by the hypervisor:
> error: Failed to core dump domain guest10 to guest10.dump
> Seems that libvirt functionality isn't available yet with kvm (I'm
> using libvirt 0.4.2, I'll give libvirt 0.4.4 a try). cc'ing the
> libvirt-list to get their insight.
> That aside, having the crash dump collection be multi-phased really
> isn't workable (that is if it requires a crashed guest to be manually
> saved after the fact). The host system _could_ be rebooted; whereby
> losing the guest's core image. So automating qemu and/or libvirtd to
> trigger a dump would seem worthwhile (maybe it's already done?).
That's a good point. Ideally, one would like the dump to be captured
automatically if the kernel crashes, and then reboot back to the production
kernel. I am not sure what we can do to let qemu know after a crash
so that it can automatically save the dump.
What happens in the case of xen guests? Is the dump captured automatically,
or does one have to force the dump capture externally?
> So while I agree with you it's ideal to not have to waste memory in
> each guest for the purposes of kdump; if users want to model a guest
> image as closely as possible to what will be deployed on bare metal it
> really would be ideal to support a 1:1 functional equivalent with kvm.
Agreed. Making kdump work inside a kvm guest does no harm.
> I work with people who refuse to use kvm because of the lack of
> kexec/kdump support.
> I can do further research but welcome others' insight: do others have
> advice on how best to collect a crashed kvm guest's core?
> > It will be interesting to look at your results with 2.6.25.x kernels with
> > kvm module inserted. Currently I can't think what can possibly be wrong.
> If the host's 184.108.40.206 kernel has both the kvm and kvm-intel modules
> loaded kexec/kdump does _not_ work (simply hangs the system). If I
> only have the kvm module loaded kexec/kdump works as expected
> (likewise if no kvm modules are loaded at all). So it would appear
> that kvm-intel and kexec are definitely mutually exclusive at the
> moment (at least on both 2.6.22.x and 2.6.25.x).
Ok. So the first task is to fix host kexec/kdump with the kvm-intel module
loaded. Can you do a little debugging to find out where the system hangs?
I generally try a few things when debugging kexec-related issues.
1. Specify the earlyprintk= parameter for the second kernel and see if
control is reaching the second kernel.
2. Otherwise specify the --console-serial parameter on the "kexec -l" command
line; it should display the message "I am in purgatory" on the serial console.
This just means that control has reached at least as far as purgatory.
3. If that also does not work, then most likely the first kernel itself got
stuck somewhere and we need to put some printks in the first kernel to find
out what's wrong.
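For steps 1 and 2, the command lines would look roughly like the sketch
below. It only prints the commands (a dry run) so they can be reviewed first.
The kernel path, initrd path, root device and serial settings (ttyS0 at
115200) are placeholders I am assuming here, not values from your setup, and
build_kexec_cmd is just a helper name made up for this example.

```shell
#!/bin/sh
# Dry run: print the kexec command lines for steps 1 and 2 instead of running them.
# KERNEL, INITRD and CMDLINE are hypothetical placeholders; adjust for your system.
KERNEL="${KERNEL:-/boot/vmlinuz}"
INITRD="${INITRD:-/boot/initrd.img}"
CMDLINE="${CMDLINE:-root=/dev/sda1 console=ttyS0,115200}"

# build_kexec_cmd is an illustrative helper, not part of kexec-tools.
build_kexec_cmd() {
    case "$1" in
        earlyprintk)
            # Step 1: ask the second kernel to print as early as possible
            # (serial port and baud rate are assumed values).
            printf 'kexec -l %s --initrd=%s --append="%s earlyprintk=serial,ttyS0,115200"\n' \
                "$KERNEL" "$INITRD" "$CMDLINE"
            ;;
        purgatory)
            # Step 2: ask purgatory to announce itself on the serial console.
            printf 'kexec -l %s --initrd=%s --console-serial --append="%s"\n' \
                "$KERNEL" "$INITRD" "$CMDLINE"
            ;;
        *)
            echo "usage: build_kexec_cmd earlyprintk|purgatory" >&2
            return 1
            ;;
    esac
}

build_kexec_cmd earlyprintk
build_kexec_cmd purgatory
```

After loading with one of the printed commands, "kexec -e" jumps to the
loaded kernel. For the kdump case, the same options should apply with
"kexec -p" in place of "kexec -l", followed by triggering a crash.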