Questions about kexec-tools (resend to list)

Tue Mar 7 15:34:38 PST 2017

> On Mar 7, 2017, at 7:53 AM, Pratyush Anand <panand at redhat.com> wrote:
> 
> Hi Philip,
> 
> On Sunday 05 March 2017 04:56 AM, Philip Prindeville wrote:
> 
> [...]
> 
>> 
>> In the case of having a single system kernel binary, then you’d have to install this kernel and its modules, and add this kernel to the boot loader configuration files, wouldn’t you?  What do my grub arguments look like?
> 
> Not necessarily all the modules. The kdump kernel will use only a minimal set of modules. You can build your initramfs with just the modules needed to boot and copy the vmcore.
> 
>> 
>> Do I always load my system kernel with “crashkernel=64M@16M” per the “CONFIG_PHYSICAL_START” and here:
> 
> In the first kernel you need to pass "crashkernel=". Specifying only the size (64M) should also work; the kernel should then find an appropriate start address for the crash kernel region.
> 
>> 
>> 
>>> 2) Boot the system kernel with the boot parameter "crashkernel=Y@X",
>>> where Y specifies how much memory to reserve for the dump-capture kernel
>>> and X specifies the beginning of this reserved memory. For example,
>>> "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
>>> starting at physical address 0x01000000 (16MB) for the dump-capture kernel.
>> 
>> 
>> 
>> Okay, we have a 2.6MB /vmlinuz in our /boot partition, so it’s relocatable and this part applies:
>> 
>> 
>>> If you are using a compressed bzImage/vmlinuz, then use following command
>>> to load dump-capture kernel.
>>> 
>>> kexec -p <dump-capture-kernel-bzImage> \
>>> --initrd=<initrd-for-dump-capture-kernel> \
>>> --append="root=<root-dev> <arch-specific-options>"
>> 
>> 
>> 
>> Not sure I understand this part.  So if we have a relocatable kernel with crashdump built-in to our system kernel, do we need to load two kernels, just with different <arch-specific-options> and everything else being the same?
> 
> You are in the primary kernel and you need to load the crash kernel.
> 
> `kexec -p /boot/vmlinuz --initrd=/boot/kdump-initrd --reuse-cmdline --append="irqpoll maxcpus=1 reset_devices"`  should work.

Tried something like that:

root@PowercodeBMU:/# kexec -p /boot/vmlinuz --reuse-cmdline --append="irqpoll maxcpus=1 reset_devices 1"
Cannot get kernel page_offset_base symbol address
Cannot load /boot/vmlinuz
root@PowercodeBMU:/# 

Not sure why I’m seeing this.
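
One possible check, assuming (this is not confirmed anywhere in this thread) that kexec looks up page_offset_base in /proc/kallsyms and fails when the symbol is not listed there. A minimal sketch; has_symbol is a hypothetical helper name:

```sh
# Hypothetical helper: succeed if symbol $1 is listed in a kallsyms-style
# file ($2, defaulting to /proc/kallsyms). Lines look like "addr type name".
# ASSUMPTION: kexec resolves the symbol from this file; not confirmed here.
has_symbol() {
    awk -v sym="$1" '$3 == sym { found = 1 } END { exit !found }' \
        "${2:-/proc/kallsyms}" 2>/dev/null
}
```

Running "has_symbol page_offset_base || echo missing" on the target would show whether the symbol is visible at all.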

> 
> You need to prepare a kdump-initrd, or you can use the current initrd, but that will load all the modules of the 1st kernel, and 64M might not be sufficient in that case.

It’s an embedded system so it’s pretty skinny.  Everything needed to boot is “baked in”.  Everything else gets loaded as a module into the booting kernel via init.d scripts …

> 
>> 
>> Would the <arch-specific-options> be:
>> 
>> crashkernel=64M@16M 1 irqpoll maxcpus=1 reset_devices
> 
> "crashkernel=" *must* *not* be passed to crash kernel. It is only for the primary kernel.

Okay.  And --reuse-cmdline takes care of stripping that out for you, it looks like.  That option isn’t discussed in Documentation/kdump/ but it might be handy to add something about it.
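
If the command line were built by hand from /proc/cmdline instead of using --reuse-cmdline, "crashkernel=" would have to be dropped explicitly. A minimal sketch of that filtering; strip_crashkernel is a hypothetical helper, not part of kexec-tools:

```sh
# Hypothetical helper: print the command line given in $1 with every
# "crashkernel=..." argument removed. Relies on plain word splitting, so
# it assumes the arguments contain no embedded spaces or glob characters.
strip_crashkernel() {
    out=""
    for arg in $1; do
        case "$arg" in
            crashkernel=*) ;;                     # drop: primary kernel only
            *) out="${out:+$out }$arg" ;;         # keep everything else
        esac
    done
    printf '%s\n' "$out"
}
```

It could then be used as --append="$(strip_crashkernel "$(cat /proc/cmdline)") irqpoll maxcpus=1 reset_devices".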

> 
>> 
>> in that case?
>> 
>> On a normally running system, using an overlay root, our cmdline looks like:
>> 
>> BOOT_IMAGE=/boot/vmlinuz block2mtd.block2mtd=/dev/sda2,65536,rootfs,5 root=/dev/mtdblock0 rootfstype=squashfs rootwait console=tty0 console=ttyS0,115200n8r noinitrd
> 
> So, it should also have crashkernel=64M.

Well, right.  I was talking about a nominal system, before I started trying to make it crash-dump capable.

> 
>> 
>> so I guess we’d just mash on those extra arguments.  On a running system, our mount points are:
>> 
>> /dev/root on /rom type squashfs (ro,relatime)
>> proc on /proc type proc (rw,nosuid,nodev,noexec,noatime)
>> sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,noatime)
>> tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noatime)
>> tmpfs on /tmp/root type tmpfs (rw,noatime,mode=755)
>> tmpfs on /dev type tmpfs (rw,nosuid,relatime,size=512k,mode=755)
>> devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,mode=600)
>> debugfs on /sys/kernel/debug type debugfs (rw,noatime)
>> /dev/mtdblock1 on /overlay type jffs2 (rw,noatime)
>> overlayfs:/overlay on / type overlay (rw,noatime,lowerdir=/,upperdir=/overlay/upper,workdir=/overlay/work)
>> 
>> 
>> but it doesn’t sound like any of that would change (except perhaps mounting a USB thumb-drive if we wanted to copy our crashdump to that device instead).

Ah, actually, that’s not quite right.  /boot has been unmounted early on but we’ll need to keep it mounted (even if we remount it as ‘ro’).

>> 
>> So if I’ve understood, when the first loaded kernel (the system kernel) crashes, kexec will then try the next kernel it sees…  which will be something like:
>> 
>> kexec -p /boot/vmlinuz \
>> 	--append="$(cat /proc/cmdline) irqpoll maxcpus=1 reset_devices 1"
>> 
>> (we don’t use an initrd as you can see above) and that’s described here:
> 
> OK..so you can exclude --initrd argument to kexec.

Yes.

> 
>> 
>> 
>>> Kernel Panic
>>> ============
>>> 
>>> After successfully loading the dump-capture kernel as previously
>>> described, the system will reboot into the dump-capture kernel if a
>>> system crash is triggered. [snip]
>> 
>> 
>> 
>> assuming the system isn’t so badly hosed that a WDT expires causing a BIOS reset, etc.
>> 
>> Do both kernels use the same “crashkernel=” value, or do they need different base addresses?
> 
> Again, only the 1st kernel needs "crashkernel=".

Okay, got it.

> 
>> 
>> And assuming that you’re using the same kernel, etc. how does the init.d scripting on the crashdump (2nd instance of the kernel) know that it’s not the nominal kernel?  Do we use /sys/kernel/kexec_loaded for this purpose?  Or do we just look for the existence of /proc/vmcore?
> 
> Yep, you can find /proc/vmcore in 2nd kernel but not in 1st kernel.
> /sys/kernel/kexec_crash_loaded should contain 1 in the 1st kernel and 0 in the crash kernel.

So far I’m seeing the opposite:

root@PowercodeBMU:/# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz block2mtd.block2mtd=/dev/sda2,65536,rootfs,5 root=/dev/mtdblock0 rootfstype=squashfs rootwait console=tty0 console=ttyS0,115200n8r noinitrd crashkernel=64M
root@PowercodeBMU:/# cat /sys/kernel/kexec_crash_loaded
0
root@PowercodeBMU:/# 

Maybe it’s the other way around?
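
A 0 here would also be consistent with the failed "kexec -p" above, since nothing actually got loaded (that reading is an assumption). A minimal sketch of a check that keys on /proc/vmcore first and only then on kexec_crash_loaded; boot_role and its output words are hypothetical names:

```sh
# Hypothetical helper: print which situation the running kernel is in.
#   capture - /proc/vmcore exists, so this is the dump-capture kernel
#   armed   - primary kernel with a crash kernel loaded
#   unarmed - primary kernel with no crash kernel loaded (yet)
# $1 and $2 override the default paths so the logic can be tried off-target.
boot_role() {
    vmcore=${1:-/proc/vmcore}
    loaded=${2:-/sys/kernel/kexec_crash_loaded}
    if [ -e "$vmcore" ]; then
        echo capture
    elif [ "$(cat "$loaded" 2>/dev/null)" = 1 ]; then
        echo armed
    else
        echo unarmed
    fi
}
```

An init.d script could then branch on "$(boot_role)" with a case statement instead of comparing a single sysfs value.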

> 
>> 
>> And then have something in my init.d scripts like:
>> 
>> kexec_loaded=$(< /sys/kernel/kexec_loaded)
> 
> /sys/kernel/kexec_crash_loaded

Right.

> 
>> 
>> if [ "$kexec_loaded" = 0 ]; then
>>  kexec -p /boot/vmlinuz \
>> 	--append="$(cat /proc/cmdline) irqpoll maxcpus=1 reset_devices 1"
>> else
>>  echo "*** HANDLING CRASH DUMP COLLECTION"
>>  mkdir -p /mnt/crashdrive
>>  mount LABEL=crashdrive /mnt/crashdrive
>>  # might do something clever here with "df --output=avail -m /mnt/crashdrive" to make
>>  # sure I have enough space for the copy, perhaps deleting older dumps until I do…
>>  cp /proc/vmcore /mnt/crashdrive
>>  sync
>>  umount /mnt/crashdrive
>>  echo "*** NOW REBOOTING"
>>  reboot -f
>> fi
>> 
> 
> Above should work.
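
The "something clever" with df mentioned in the script comment could look roughly like the sketch below. It swaps in POSIX "df -Pk" for "df --output=avail -m", and it assumes dumps are stored as vmcore-* files, which is not how the script above names them; avail_kb and make_room are hypothetical helpers:

```sh
# Free space on the filesystem holding $1, in KiB (POSIX df output format).
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Delete the oldest vmcore-* files in directory $1 until at least $2 KiB
# are free. Returns non-zero if space is still short once nothing is left
# to delete. ASSUMPTION: dumps are named vmcore-<something>.
make_room() {
    dir=$1
    need_kb=$2
    while [ "$(avail_kb "$dir")" -lt "$need_kb" ]; do
        oldest=$(ls -1tr "$dir"/vmcore-* 2>/dev/null | head -n 1)
        [ -n "$oldest" ] || return 1
        rm -f -- "$oldest"
    done
}
```

Something like "make_room /mnt/crashdrive <needed-KiB> || echo 'not enough space'" would then go just before the cp.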

Question… will crashkernel being 64M mean that /sys/kernel/kexec_crash_size is also 64M (67108864) and that would also be the size of /proc/vmcore?
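
For reference, the arithmetic behind the 67108864 figure is 64 * 1024 * 1024. A minimal sketch of a converter for comparing a "crashkernel=" size against the byte count in /sys/kernel/kexec_crash_size; size_to_bytes is a hypothetical helper and handles only a plain number with an optional K, M or G suffix:

```sh
# Hypothetical helper: convert a size such as 64M into bytes.
# Handles only "<number>[K|M|G]"; offsets like 64M@16M are not parsed.
size_to_bytes() {
    num=${1%[KkMmGg]}                 # strip one trailing unit letter, if any
    case "$1" in
        *[Kk]) echo $((num * 1024)) ;;
        *[Mm]) echo $((num * 1024 * 1024)) ;;
        *[Gg]) echo $((num * 1024 * 1024 * 1024)) ;;
        *)     echo "$num" ;;
    esac
}
```

On the target, "[ "$(size_to_bytes 64M)" = "$(cat /sys/kernel/kexec_crash_size)" ]" would show whether the two values match; this says nothing about the size of /proc/vmcore.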

> 
> There can be many ways to do this. You can have a look at the Fedora kexec-tools code.
> http://pkgs.fedoraproject.org/cgit/rpms/kexec-tools.git/
> 
> 
>> Do I need to reboot in a particular way to avoid looping?  The “Kernel Panic” section seems to state that normal reboots won’t be affected.
> 
> When you execute reboot, it will reboot to the 1st kernel through grub (boot loader).

Okay.

Thanks,

-Philip

> 
>> 
>> I appreciate the documentation you’ve written, but it’s a little unclear (to me at least) how to handle the degenerate case of using the same kernel as the system kernel and the crashdump kernel…
>> 
>> I want to make sure that I don’t inadvertently set it up to loop through infinitely nested kernels, etc.
>> 
>> I’m probably overthinking this, but… we’re having crashes in the field and the customers are a little riled up right now so I don’t want to spend a lot of time saying “here try this image”.  They want their smoking gun and they want it soon.
>> 
> 
> 
> ~Pratyush