kexec-starting-kernel-problem-on-vm
David Hildenbrand
david at redhat.com
Mon Apr 19 09:37:46 BST 2021
On 19.04.21 10:26, Baoquan He wrote:
> Hi Jingxian,
>
> On 04/14/21 at 03:04pm, Jingxian He wrote:
>> We use 'kexec -l' and 'kexec -e' on our virtual machine to upgrade the
>> Linux kernel. We find that, with low probability, the new kernel fails to
>> start because the SHA-256 check of the initrd segment fails.
>>
>> The related code is as follows:
>> /* arch/x86/purgatory/purgatory.c */
>> static int verify_sha256_digest(void)
>> {
>>     struct kexec_sha_region *ptr, *end;
>>     u8 digest[SHA256_DIGEST_SIZE];
>>     struct sha256_state sctx;
>>
>>     sha256_init(&sctx);
>>     end = purgatory_sha_regions + ARRAY_SIZE(purgatory_sha_regions);
>>
>>     for (ptr = purgatory_sha_regions; ptr < end; ptr++)
>>         sha256_update(&sctx, (uint8_t *)(ptr->start), ptr->len);
>>
>>     sha256_final(&sctx, digest);
>>
>>     if (memcmp(digest, purgatory_sha256_digest, sizeof(digest)))
>>         return 1;
>>
>>     return 0;
>> }
>>
>> void purgatory(void)
>> {
>>     int ret;
>>
>>     ret = verify_sha256_digest();
>
> I usually use a QEMU/KVM guest to test the kernel, kexec, and kdump, and
> haven't hit this issue; kexec -l/-e works well for me. It seems you are
> not using the latest kexec-tools. Alternatively, you can use
> "-i (--no-checks)" to work around this for the time being.
>
>>     if (ret) {    /* <-- taken when verify_sha256_digest() fails */
>>         /* loop forever */
>>         for (;;)
>>             ;
>>     }
>>     copy_backup_region();
>> }
>>
>>
>> Our opinion on this problem:
>> We think that relocating the new kernel depends on the boot CPU running
>> without interruption. However, the vCPUs may be interrupted by the QEMU
>> process with an async_page_fault interrupt.

So, are you saying that the host still delivers an APF to the guest,
even though the guest has interrupts disabled (including APF)? Hard to
imagine that this is the case right now.
Any other host activity that temporarily stops or schedules out the vCPU
should not be relevant to the VM ("vCPU interrupted by the QEMU
process"). And if something running inside the VM disables interrupts
merely to shrink a race window, that would need fixing inside the VM.
--
Thanks,
David / dhildenb