[RFCv2 0/9] UEFI emulator for kexec

Mon Sep 9 07:04:50 PDT 2024

On Mon, 9 Sept 2024 at 15:49, Philipp Rudo <prudo at redhat.com> wrote:
>
> Hi Lennart,
> Hi Jan,
>
> On Mon, 9 Sep 2024 12:42:45 +0200
> Jan Hendrik Farr <kernel at jfarr.cc> wrote:
>
> > On 09 11:48:30, Lennart Poettering wrote:
> > > On Fr, 06.09.24 12:54, Philipp Rudo (prudo at redhat.com) wrote:
> > >
> > > > I mostly agree on what you have written. But I see a big problem in
> > > > running the EFI emulator in user space when it comes to secure boot.
> > > > The chain of trust ends in the kernel. So it's the kernel that needs to
> > > > verify that the image to be loaded can be trusted. But when the EFI
> > > > runtime is in user space the kernel simply cannot do that. Which means,
> > > > if we want to go this way, we would need to extend the chain of trust
> > > > to user space. Which will be a whole bucket of worms, not just a
> > > > can.
> > >
> > > Maybe it would be nice to have a way to "zap" userspace away, i.e. allow
> > > the kernel to get rid of all processes in some way, reliably. And then
> > > simply start a new userspace, from a trusted definition. Or in other
> > > words: if you don't want to trust the usual userspace, then let's
> > > maybe just terminate it, and create it anew, with a clean, pristine
> > > definition the old userspace cannot get access to.
> >
> > Well, this is an interesting idea!
> >
> > However, I'm sceptical if this could be done in a secure way. How do we
> > ensure that nothing the old userspace did with the various interfaces to
> > the kernel has any impact on the new userspace? Maybe others can chime in
> > on this? Does kernel_lockdown give more guarantees related to this?
> >
> > Even if this is possible in a secure way, there is a problem with doing
> > this for kernels that are to be kexec'd on kernel panic. In this
> > approach we can't pre-run them until EBS(), so we would rely on the old
> > kernel to still be intact when we want to kexec reboot.
>
> I don't believe there's a way to do that on running kernels. As Jan
> pointed out, this cannot be done during reboot, as for kdump that would
> mean to run after a panic. So it would need to run when the new image
> is loaded. But at that time your user space is running. Plus you also
> always have a user space component that triggers kexec. So you cannot
> simply "zap" user space but have to somehow stash it away, run your
> trusted user space, and then restore the old user space again. That
> sounds pretty error prone to me. Plus it will tank your performance
> every time you do a kexec, which for kdump is every boot...
>

kdump has a kexec kernel 'standby' to launch when the kernel panics.
So for the UKI/EFI payload case, this would imply that the load
involves running the payload until EBS() and freezing the state.

Whether execution occurs in true user space or in a deprivileged
kernel context is an implementation detail, imho. We don't want to run
external code in privileged mode inside the kernel in any case, as
this would violate lockdown already. But it should be feasible to have
an EFI-compatible layer in the kernel that invokes the EFI entrypoint
of an image in a way that protects the host kernel. This could be user
mode on the CPU or perhaps a minimal KVM virtual machine.

The advantage of this approach is that the whole concept of purgatory
can be avoided - the EFI boot phase runs in parallel with the previous
kernel, which has full control over authentication and [emulated] PCR
extension, and has ultimate control over whether the kexec reboot is
permitted.
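
To make the load-time flow concrete, here is a rough sketch of the
ordering I have in mind. It is only a toy model in Python, not kernel
code: the names (StagedPayload, stage, may_kexec), the allow-list of
trusted digests and the list of "boot steps" standing in for the EFI
calls made by the payload are all made up for illustration. The only
point is that authentication, the emulated PCR extension and the freeze
at EBS() all happen at load time, under control of the running kernel.

```python
import hashlib
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    LOADED = auto()
    RUNNING = auto()
    FROZEN = auto()    # payload reached ExitBootServices()
    REJECTED = auto()  # authentication failed, payload never ran


@dataclass
class StagedPayload:
    """Hypothetical record the host kernel keeps for a staged image."""
    image: bytes
    state: State = State.LOADED
    pcr: bytes = bytes(32)  # emulated PCR, starts out all zeroes
    log: list = field(default_factory=list)

    def extend_pcr(self, data: bytes) -> None:
        # TPM-style extend: new = H(old || H(data))
        digest = hashlib.sha256(data).digest()
        self.pcr = hashlib.sha256(self.pcr + digest).digest()
        self.log.append(digest)

    def exit_boot_services(self) -> None:
        # The payload stops here; its state is kept until kexec time.
        if self.state is not State.RUNNING:
            raise RuntimeError("EBS() called outside the boot phase")
        self.state = State.FROZEN


def stage(image: bytes, trusted_digests: set, boot_steps: list) -> StagedPayload:
    """Authenticate the image, then run its boot phase until EBS()."""
    payload = StagedPayload(image)
    if hashlib.sha256(image).digest() not in trusted_digests:
        payload.state = State.REJECTED
        return payload
    payload.state = State.RUNNING
    for step in boot_steps:  # stand-in for the payload's EFI calls
        payload.extend_pcr(step)
    payload.exit_boot_services()
    return payload


def may_kexec(payload: StagedPayload) -> bool:
    # The host kernel has the final say; only a frozen payload qualifies.
    return payload.state is State.FROZEN
```

For kdump, stage() would run when the crash kernel is loaded, so that
nothing but the final handover is left to do after a panic.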

> > You could do a system where you kexec into an intermediate kernel. That
> > kernel gets kexec'd with a signed initrd that can use the normal
> > kexec_load syscall and do any kind of preparation in userspace.
> > Problem: For that intermediate environment we already need a format
> > that combines kernel image, initrd, cmdline all signed in one package
> > aka UKI. Was it the chicken or the egg?
> >
> > But this shows that if we implemented UKIs the easy way (kernel simply
> > checks signature, extracts the pieces, and kexecs them like normal),
> > this approach could always be used to support kexec for other future
> > formats. They could use the kernel's UKI support to boot into an
> > intermediate kernel with UEFI implemented in userspace in the initrd.
> >
> > So basically support UKIs the easy way and use them to be able to
> > securely zap away userspace and start with a fresh kernel and signed
> > userspace as a way to support other UEFI formats that are not UKI.
>
> Well, in theory that should work. But I see several problems:
>
> 1) How does the first kernel tell the intermediate kernel which
> file(s) with which command line to load? In fact, how does the first
> kernel get the information itself? You would need a new system call
> that takes two kernel images, one for the intermediate and one for the
> kernel to load, for that.
>
> Of course you could also build the intermediate UKI during kernel build
> and include it into the image. Similar to what is done with the
> purgatory. But that would totally bloat the kernel image.
>
> 2) I expect that to be extremely painful to debug, if the intermediate
> kernel runs into a panic. For sure kdump won't work in that case...
>
> 3) Distros would need to maintain and test the additional UKI.
>
> 4) This approach basically needs to boot twice. But there are people
> out there who fight to reduce boot times extremely hard. For them every
> millisecond counts. Telling them that they will need to wait twice as
> long will be very hard to sell.
>

I don't think intermediate kernels are the solution here. We need to
run as much as possible under the control of the preceding kernel, and
minimize the bare metal handover that occurs after EBS(). Adding more
code to the purgatory (as this series does) is not acceptable to me,
as it is extremely difficult to debug, and duplicates drivers and
other logic (making it an 'intermediate kernel' of sorts already).

> > >
> > > > Let me throw another wild idea into the ring. Instead of implementing
> > > > an EFI runtime we could also include an eBPF version of the stub into the
> > > > images. kexec could then extract the eBPF program and let it run just
> > > > like any other eBPF program with all the pros (and cons) that come with
> > > > it. That won't be as generic as the EFI runtime, e.g. you couldn't
> > > > simply kexec any OS installer. On the other hand it would make it
> > > > easier to port UKIs et al. to non-EFI systems. What do you think?
> > >
> > > eBPF is not Turing complete, I am not sure how far you will make it
> > > with this, in the various implementations of EFI payloads there are
> > > plenty of loops, sometimes IO loops, sometimes hash loops of huge data
> > > (for measurements). As I understand it, eBPF is not really compatible
> > > with such code.
>
> I don't believe we can simply take all those payloads and recompile
> them to eBPF. There definitely needs to be some refactoring done first.
> For example the IO loops you can drop for eBPF and simply map to the
> corresponding kernel function, letting them do the full IO in one go.
> There will be cases where that will be more difficult like for hash
> loops when you have to have the same hash at the end. But I believe
> even for that ways could be found to get it to work.
>
> Anyway, I'm sure that the picture I have in my head is way
> oversimplified. There will be many pitfalls to handle for sure. Still I
> believe it would be a nice experiment.
>

Today, UKI functionality is implemented in terms of EFI API calls. Any
solution that needs either a parallel implementation (eBPF vs EFI) or
needs to unpack the UKI in order to perform the steps that the UKI
would perform itself if it were executed in an EFI environment is a
no-go in my opinion.

So either we provide some EFI-compatible runtime sufficient to run a
UKI, or we re-engineer UKI to be built on top of an abstraction that
can be implemented straightforwardly both on system firmware and in
the EFI context.