kexec-starting-kernel-problem-on-vm

Jingxian He hejingxian at huawei.com
Wed Apr 14 08:04:26 BST 2021


We use ‘kexec -l’ and ‘kexec -e’ on our virtual machine to upgrade the
Linux kernel. With low probability, the new kernel fails to start because
the sha256 check of the initrd segment fails in purgatory.

The related code is as follows:
/* arch/x86/purgatory/purgatory.c */
static int verify_sha256_digest(void)
{
	struct kexec_sha_region *ptr, *end;
	u8 digest[SHA256_DIGEST_SIZE];
	struct sha256_state sctx;

	sha256_init(&sctx);
	end = purgatory_sha_regions + ARRAY_SIZE(purgatory_sha_regions);

	for (ptr = purgatory_sha_regions; ptr < end; ptr++)
		sha256_update(&sctx, (uint8_t *)(ptr->start), ptr->len);

	sha256_final(&sctx, digest);

	if (memcmp(digest, purgatory_sha256_digest, sizeof(digest)))
		return 1;

	return 0;
}

void purgatory(void)
{
	int ret;

	ret = verify_sha256_digest();
	if (ret) { //<------verify_sha256 fail, entering loop forever
		/* loop forever */
		for (;;)
			;
	}
	copy_backup_region();
}


Our opinion of this problem:
We think that relocating the new kernel depends on the boot CPU running
without interruption. However, the vCPUs may be interrupted by the qemu
process with an async page fault (async_page_fault) interruption.
So there is a risk of memory being overwritten while the boot vCPU relocates
the new kernel.
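
To illustrate the failure mode we suspect, here is a minimal user-space sketch
in C. It is not the kernel code: the struct region type, the helper names and
the buffer contents are hypothetical, and FNV-1a is used only as a small
stand-in for SHA-256 so that the example stays self-contained. It shows that
if any byte of a loaded segment changes after the digest was recorded, a check
in the style of verify_sha256_digest() reports a mismatch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct kexec_sha_region: one loaded segment. */
struct region {
	const uint8_t *start;
	size_t len;
};

/*
 * FNV-1a over all regions. This is only a small stand-in for SHA-256;
 * it is NOT the hash that purgatory uses.
 */
static uint64_t regions_digest(const struct region *r, size_t n)
{
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV offset basis */
	size_t i, j;

	assert(r != NULL);
	for (i = 0; i < n; i++) {
		for (j = 0; j < r[i].len; j++) {
			h ^= r[i].start[j];
			h *= 0x100000001b3ULL;	/* FNV prime */
		}
	}
	return h;
}

/* Same return convention as verify_sha256_digest(): 0 match, 1 mismatch. */
static int verify_digest(const struct region *r, size_t n, uint64_t expected)
{
	return regions_digest(r, n) != expected;
}

/*
 * Simulate the suspected sequence: the digest is recorded at load time,
 * a stray write lands in the initrd segment, then verification runs.
 * Returns the verification result after the stray write, or -1 if the
 * untouched segments unexpectedly fail to verify.
 */
static int demo_overwrite_detected(void)
{
	static uint8_t kernel[64], initrd[64];
	struct region regions[2] = {
		{ kernel, sizeof(kernel) },
		{ initrd, sizeof(initrd) },
	};
	uint64_t recorded;

	memset(kernel, 0xAA, sizeof(kernel));
	memset(initrd, 0x55, sizeof(initrd));

	recorded = regions_digest(regions, 2);	/* digest taken at load time */
	if (verify_digest(regions, 2, recorded))
		return -1;

	initrd[10] ^= 0x01;	/* hypothetical late write into the segment */
	return verify_digest(regions, 2, recorded);
}
```

The sketch only shows that a single changed byte is enough to make the check
fail; it does not show where such a write would come from in the guest.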

When we enable the KVM_GUEST feature and give the virtual machine less than
500M of memory, the new kernel fails to start with ‘kexec -l/-e’ every time.

My question is:
Are the ‘kexec -l’ and ‘kexec -e’ commands not applicable on virtual machines?



