kdump and SMP system kernel

dmitry.krivenok at emc.com
Mon Aug 15 10:56:17 EDT 2011


I tried nr_cpus=1, but it didn't help.

I haven't tried the test on bare metal, but I'm going to try it 
on my laptop later today. I'll let you know if it works.

Thanks,
Dmitry

-----Original Message-----
From: Vivek Goyal [mailto:vgoyal at redhat.com] 
Sent: Monday, August 15, 2011 6:41 PM
To: Krivenok, Dmitry
Cc: Kexec Mailing List
Subject: Re: kdump and SMP system kernel

On Mon, Aug 15, 2011 at 10:27:30AM -0400, dmitry.krivenok at emc.com wrote:
> Hello Vivek and Maneesh,
> I've read your document Documentation/kdump/kdump.txt and built system and dump-capture
> kernels with the options mentioned there.
> 
> Then I booted the new system kernel and registered a "panic handler" using the following command
> kexec -p /boot/linux-3.0.0-capture --initrd=/boot/initrd-3.0.0-capture --append="root=/dev/mapper/myvg-root 3 irqpoll maxcpus=1 reset_devices"
> 
> Finally, I simulated a panic using
> echo c > /proc/sysrq-trigger
> 
> Unfortunately, the dump-capture kernel wasn't functional (it was booting very slowly, I saw lots of messages
> like "ata2: lost interrupt", my keyboard didn't work at all and I couldn't access the system via the network).
> 
> I investigated this problem and tried lots of combinations of boot parameters for dump-capture kernel, but
> nothing helped. Then I tried to tune boot parameters of system kernel and found that if I specify "maxcpus=1"
> for system kernel, then dump-capture kernel always boots successfully and I have access to correct /proc/vmcore.
> 
> The problem is that I'm debugging a problem which only occurs on SMP kernel and I never see it on the kernel
> booted with "maxcpus=1".
> 
> So I just want to clarify - is it possible to use kexec/kdump with SMP system kernel?
> Is it intended to work at all?
> 
> Thanks in advance!
> 
> P.S.
> I'm using Arch Linux with vanilla kernel 3.0.0 and kexec-tools 2.0.2-3 running in a VM on a VMware ESX server.

Yes, it is supposed to work on SMP machines. maxcpus=1 in the second kernel
makes sure that it brings up only the CPU we crashed on. You can
also try using nr_cpus=1 on recent kernels.
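
For illustration only, here is a minimal sketch of how the capture-kernel
command line from the original report might be changed to use nr_cpus=1
instead of maxcpus=1. The paths and other boot parameters are copied from the
report; the variable names APPEND and NEW_APPEND are hypothetical. The script
only prints the resulting kexec command and does not load anything.

```shell
#!/bin/sh
# Boot parameters of the dump-capture kernel, as given in the original report.
APPEND="root=/dev/mapper/myvg-root 3 irqpoll maxcpus=1 reset_devices"

# Swap maxcpus=1 for nr_cpus=1 (APPEND/NEW_APPEND are illustrative names).
NEW_APPEND=$(printf '%s\n' "$APPEND" | sed 's/maxcpus=1/nr_cpus=1/')

# Print the command rather than running it; loading a crash kernel needs root.
echo "kexec -p /boot/linux-3.0.0-capture --initrd=/boot/initrd-3.0.0-capture --append=\"$NEW_APPEND\""
```

The printed command would have to be run as root on the system kernel, and it
assumes that memory was reserved with crashkernel= at boot, as described in
Documentation/kdump/kdump.txt.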

It sounds like an issue with disk driver initialization, and it could also
have something to do with the hypervisor. Not sure. Does it work on bare
metal?

P.S. Maneesh is no longer with IBM, so the above ID is not valid. I am not
sure what his new ID is. You can send issues like this to the kexec-tools
mailing list. I am CCing the list now.


Thanks
Vivek



