[RFCv2 0/9] UEFI emulator for kexec

Ard Biesheuvel ardb at kernel.org
Wed Aug 28 10:08:14 PDT 2024


On Mon, 19 Aug 2024 at 16:55, Pingfan Liu <piliu at redhat.com> wrote:
>
> *** Background ***
>
> As more PE format kernel images are introduced, it post challenge to kexec to
> cope with the new format.
>
> In my attempt to add support for arm64 zboot image in the kernel [1],
> Ard suggested using an emulator to tackle this issue.  Last year, when
> Jan tried to introduce UKI support in the kernel [2], Ard mentioned the
> emulator approach again [3]
>
> After discussion, Ard's approach seems to be a more promising solution
> to handle PE format kernels once and for all.  This series follows that
> approach and implements an emulator to emulate EFI boot time services,
> allowing the efistub kernel to self-extract and boot.
>
> Another year has passed, and UKI kernel is more and more frequently used
> in product. I think it is time to pay effort to resolve this issue.
>
>
> *** Overview of implement ***
> The whole model consits of three parts:
>
> -1. The emulator
> It is a self-relocatable PIC code, which is finally linked into kernel, but not
> export any internal symbol to kernel.  It mainly contains: a PE file parser,
> which loads PE format kernel, a group of functions to emulate efi boot service.
>
> -2. inside kernel, PE-format loader
> Its main task is to set up two extra kexec_segment, one for emulator, the other
> for passing information from the first kernel to emulator.
>
> -3. set up identity mapping only for the memory used by the emulator.
> Here it relies on kimage_alloc_control_pages() to get pages, which will not
> stamped during the process of kexec relocate (cp from src to dst). And since the
> mapping only covers a small range of memory, it cost small amount memory.
>
>
> *** To do ***
>
> Currently, it only works on arm64 virt machine. For x86, it needs some slightly
> changes. (I plan to do it in the next version)
>
> Also, this series does not implement a memory allocator, which I plan to
> implement with the help of bitmap.
>
> About console, currently it hard code for arm64 virt machine, later it should
> extract the information through ACPI table.
>
> For kdump code, it is not implmented yet. But it should share the majority of
> this series.
>
>
> *** Test of this series ***
> I have tested this series on arm64 virt machine. There I booted the vmlinuz.efi
> and kexec_file_load a UKI image, then switch to the second kernel.
>
> I used a modified kexec-tools [4], which just skips the check of the file format and passes the file directly to kernel.
>
> [1]: https://lore.kernel.org/linux-arm-kernel/ZBvKSis+dfnqa+Vz@piliu.users.ipa.redhat.com/T/#m42abb0ad3c10126b8b3bfae8a596deb707d6f76e
> [2]: https://lore.kernel.org/lkml/20230918173607.421d2616@rotkaeppchen/T/
> [3]: https://lore.kernel.org/lkml/20230918173607.421d2616@rotkaeppchen/T/#mc60aa591cb7616ceb39e1c98f352383f9ba6e985
> [4]: https://github.com/pfliu/kexec-tools.git branch: kexec_uefi_emulator
>
>
> RFCv1 -> RFCv2:
> -1.Support to run UKI kernel by: add LoadImage() and StartImage(), add
>    PE file relocation support, add InstallMultiProtocol()
> -2.Also set up idmap for EFI runtime memory descriptor since UKI's
>    systemd-stub calls runtime service
> -3.Move kexec_pe_image.c from arch/arm64/kernel to kernel/, since it
>    aims to provide a more general architecture support.
>
> RFCv1: https://lore.kernel.org/linux-efi/20240718085759.13247-1-piliu@redhat.com/
> RFCv2: https://github.com/pfliu/linux.git  branch kexec_uefi_emulator_RFCv2
>
> Cc: Ard Biesheuvel <ardb at kernel.org>
> Cc: Jan Hendrik Farr <kernel at jfarr.cc>
> Cc: Philipp Rudo <prudo at redhat.com>
> Cc: Lennart Poettering <mzxreary at 0pointer.de>
> Cc: Jarkko Sakkinen <jarkko at kernel.org>
> Cc: Eric Biederman <ebiederm at xmission.com>
> Cc: Baoquan He <bhe at redhat.com>
> Cc: Dave Young <dyoung at redhat.com>
> Cc: Mark Rutland <mark.rutland at arm.com>
> Cc: Will Deacon <will at kernel.org>
> Cc: Catalin Marinas <catalin.marinas at arm.com>
> Cc: kexec at lists.infradead.org
> Cc: linux-efi at vger.kernel.org
> Cc: linux-kernel at vger.kernel.org
>
>
>
> Pingfan Liu (9):
>   efi/libstub: Ask efi_random_alloc() to skip unusable memory
>   efi/libstub: Complete efi_simple_text_output_protocol
>   efi/emulator: Initial rountines to emulate EFI boot time service
>   efi/emulator: Turn on mmu for arm64
>   kexec: Introduce kexec_pe_image to parse and load PE file
>   arm64: kexec: Introduce a new member param_mem to kimage_arch
>   arm64: mm: Change to prototype of
>   arm64: kexec: Prepare page table for emulator
>   arm64: kexec: Enable kexec_pe_image
>

Thanks for putting this RFC together. This is useful work, and gives
us food for thought and discussion.

There are a few problems that become apparent when going through these changes.

1. Implementing UEFI entirely is intractable, and unnecessary.
Implementing the subset of UEFI that is actually needed to boot Linux
*is* tractable, though, but we need to work together to write this
down somewhere.
  - the EFI stub needs the boot services for the EFI memory map and
the allocation routines
  - GRUB needs block I/O
  - systemd-stub/UKI needs file I/O to look for sidecars
  - etc etc

I implemented a Rust 'efiloader' crate a while ago that encapsulates
most of this (it can boot Linux/arm64 on QEMU and boot x86 via GRUB in
user space **). Adding file I/O to this should be straight-forward -
as Lennart points out, we only need the protocol, it doesn't need to
be backed by an actual file system, it just needs to be able to expose
other files in the right way.

2. Running the UEFI emulator on bare metal is not going to scale.
Cloning UART driver code and MMU code etc is a can of worms that you
want to leave closed. And as Lennart points out, there is other
hardware (TPM) that needs to be accessible as well. Providing a
separate set of drivers for all hardware that the EFI emulator may
need to access is not a tractable problem either.

The fix for this, as I see it, is to run the EFI emulator in user
space, to the point where the payload calls ExitBootServices(). This
will allow all I/O and memory protocol to be implemented trivially,
using C library routines. I have a crude prototype** of this running
to the point where ExitBootServices() is called (and then it crashes).
The tricky yet interesting bit here is how we migrate a chunk of user
space memory to the bare metal context that will be created by the
kexec syscall later (in which the call to ExitBootServices() would
return and proceed with the boot). But the principle is rather
straight-forward, and would permit us, e.g., to kexec an OS installer
too.

3. We need to figure out how to support TPM and PCRs in the context of
kexec. This is a fundamental issue with verified boot, given that the
kexec PCR state is necessarily different from the boot state, and so
we cannot reuse the TPM directly if we want to pretend that we are
doing an ordinary boot in kexec. The alternative is to leave the TPM
in a state where the kexec kernel can access its sealed secrets, and
mock up the TCG2 EFI protocols using a shim that sits between the TPM
hardware (as the real TCG2 protocols will be long gone) and the EFI
payload. But as I said, this is a fundamental issue, as the ability to
pretend that a kexec boot is a pristine boot would mean that verified
boot is broken.


As future work, I'd like to propose to collaborate on some alignment
regarding a UEFI baseline for Linux, i.e., the parts that we actually
need to boot Linux.

For this series in particular, I don't see a way forward where we
adopt this approach, and carry all this code inside the kernel.

Thanks.
Ard.



More information about the kexec mailing list