[RFCv2 0/9] UEFI emulator for kexec

Mon Sep 9 06:49:40 PDT 2024

Hi Lennart,
Hi Jan,

On Mon, 9 Sep 2024 12:42:45 +0200
Jan Hendrik Farr <kernel at jfarr.cc> wrote:

> On 09 11:48:30, Lennart Poettering wrote:
> > On Fr, 06.09.24 12:54, Philipp Rudo (prudo at redhat.com) wrote:
> >   
> > > I mostly agree with what you wrote. But I see a big problem in
> > > running the EFI emulator in user space when it comes to secure boot.
> > > The chain of trust ends in the kernel. So it's the kernel that needs to
> > > verify that the image to be loaded can be trusted. But when the EFI
> > > runtime is in user space the kernel simply cannot do that. Which means,
> > > if we want to go this way, we would need to extend the chain of trust
> > > to user space. Which will be a whole bucket of worms, not just a
> > > can.  
> > 
> > Maybe it would be nice to have a way to "zap" userspace away, i.e. allow
> > the kernel to get rid of all processes in some reliable way. And then
> > simply start a new userspace, from a trusted definition. Or in other
> > words: if you don't want to trust the usual userspace, then let's
> > maybe just terminate it, and create it anew, with a clean, pristine
> > definition the old userspace cannot get access to.  
> 
> Well, this is an interesting idea!
> 
> However, I'm sceptical whether this could be done in a secure way. How do
> we ensure that nothing the old userspace did with the various interfaces
> to the kernel has an impact on the new userspace? Maybe others can chime
> in on this? Does kernel_lockdown give more guarantees related to this?
> 
> Even if this is possible in a secure way, there is a problem with doing
> this for kernels that are to be kexec'd on kernel panic. In this
> approach we can't pre-run them until EBS(), so we would rely on the old
> kernel to still be intact when we want to kexec reboot.

I don't believe there's a way to do that on running kernels. As Jan
pointed out, this cannot be done during reboot, as for kdump that would
mean running after a panic. So it would need to run when the new image
is loaded. But at that time your user space is running. Plus you
always have a user space component that triggers kexec. So you cannot
simply "zap" user space but have to somehow stash it away, run your
trusted user space, and then restore the old user space again. That
sounds pretty error-prone to me. Plus it will tank your performance
every time you do a kexec, which for kdump is every boot...

> You could do a system where you kexec into an intermediate kernel. That
> kernel gets kexec'd with a signed initrd that can use the normal
> kexec_load syscall to load and do any kind of preparation in userspace.
> Problem: For that intermediate environment we already need a format
> that combines kernel image, initrd, cmdline all signed in one package
> aka UKI. Was it the chicken or the egg?
> 
> But this shows that if we implemented UKIs the easy way (kernel simply
> checks signature, extracts the pieces, and kexecs them like normal),
> this approach could always be used to support kexec for other future
> formats. They could use the kernel's UKI support to boot into an
> intermediate kernel with UEFI implemented in userspace in the initrd.
> 
> So basically support UKIs the easy way and use them to be able to
> securely zap away userspace and start with a fresh kernel and signed
> userspace as a way to support other UEFI formats that are not UKI.

Well, in theory that should work. But I see several problems:

1) How does the first kernel tell the intermediate kernel which
file(s) with which command line to load? In fact, how does the first
kernel get the information itself? You would need a new system call
that takes two kernel images, one for the intermediate and one for the
kernel to load, for that.

Of course you could also build the intermediate UKI during kernel build
and include it in the image, similar to what is done with the
purgatory. But that would totally bloat the kernel image.

2) I expect that to be extremely painful to debug, if the intermediate
kernel runs into a panic. For sure kdump won't work in that case...

3) Distros would need to maintain and test the additional UKI.

4) This approach basically needs to boot twice. But there are people
out there who fight extremely hard to reduce boot times. For them every
millisecond counts. Telling them that they will need to wait twice as
long will be very hard to sell.

> >   
> > > Let me throw another wild idea in the ring. Instead of implementing
> > > an EFI runtime we could also include an eBPF version of the stub into the
> > > images. kexec could then extract the eBPF program and let it run just
> > > like any other eBPF program with all the pros (and cons) that come with
> > > it. That won't be as generic as the EFI runtime, e.g. you couldn't
> > > simply kexec any OS installer. On the other hand it would make it
> > > easier to port UKIs et al. to non-EFI systems. What do you think?  
> > 
> > eBPF is not Turing complete, I am not sure how far you will make it
> > with this, in the various implementations of EFI payloads there are
> > plenty of loops, sometimes IO loops, sometimes hash loops of huge data
> > (for measurements). As I understand, eBPF is not really compatible with
> > such code.

I don't believe we can simply take all those payloads and recompile
them to eBPF. There definitely needs to be some refactoring done first.
For example, you can drop the IO loops for eBPF and simply map them to
the corresponding kernel functions, letting those do the full IO in one
go. There will be cases where that is more difficult, like hash loops
where you have to end up with the same hash. But I believe that even for
those, ways could be found to get it to work.
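
To illustrate the hash case, here is a rough user space sketch in Python
(not eBPF, and not taken from any existing stub). It only shows that
feeding data into a hash in bounded steps ends in the same digest as
hashing everything in one go. CHUNK_SIZE, MAX_STEPS_PER_CALL and the
function names are made up for the example; how such a step would be
driven from eBPF is left open.

```python
import hashlib

CHUNK_SIZE = 4096          # hypothetical amount of data hashed per loop iteration
MAX_STEPS_PER_CALL = 8     # hypothetical upper bound on iterations per invocation


def hash_one_shot(data: bytes) -> str:
    """Hash the whole buffer in a single call."""
    return hashlib.sha256(data).hexdigest()


def hash_step(state, data: bytes, offset: int) -> int:
    """Feed at most MAX_STEPS_PER_CALL chunks into state, return the new offset."""
    for _ in range(MAX_STEPS_PER_CALL):
        if offset >= len(data):
            break
        state.update(data[offset:offset + CHUNK_SIZE])
        offset += CHUNK_SIZE
    return min(offset, len(data))


def hash_bounded(data: bytes) -> str:
    """Drive hash_step repeatedly, carrying the hash state between invocations."""
    state = hashlib.sha256()
    offset = 0
    while offset < len(data):
        offset = hash_step(state, data, offset)
    return state.hexdigest()
```

The point is only that the hash state can be carried across bounded
steps without changing the result, so a measurement would not need one
unbounded loop.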

Anyway, I'm sure that the picture I have in my head is way
oversimplified. There will be many pitfalls to handle for sure. Still, I
believe it would be a nice experiment.

Thanks
Philipp