[RFC] proposal: KVM: Orphaned VMs: The Caretaker approach for Live Update

Paolo Bonzini pbonzini at redhat.com
Sun May 3 09:57:06 PDT 2026


On Fri, May 1, 2026 at 11:48 PM Pasha Tatashin
<pasha.tatashin at soleen.com> wrote:
> The way I see it, vmfd and vcpufd need to support LUO preservation by
> implementing the liveupdate_file_ops callbacks (.preserve, .restore).
>
> When userspace preserves the vcpufd, the kernel isolates the assigned
> pCPU (probably requiring the vCPU to be pinned to the CPU, and the CPU
> to be in an isolated cpuset),

Generally speaking, vCPUs do not care whether they are pinned. Do we
need a preparatory ioctl for this on the vcpufd side, even just a
KVM_ENABLE_CAP or a new bit for
KVM_ENABLE_CAP(KVM_CAP_X86_DISABLE_EXITS)? If
LIVEUPDATE_SESSION_PRESERVE_FD can suddenly start offlining pCPUs,
that would require a capability check.
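
To make the opt-in idea concrete, here is a minimal userspace model of
the ordering I have in mind. It is not KVM code: struct vcpu_model,
vcpu_enable_orphaning() and vcpu_preserve() are hypothetical names, and
the pinning/isolation check just mirrors the requirement suggested in
the quoted text above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the per-vCPU state that matters for preservation. */
struct vcpu_model {
    bool orphan_optin;  /* set by an explicit opt-in, e.g. a KVM_ENABLE_CAP-style call */
    int  pinned_cpu;    /* pCPU the vCPU is pinned to, or -1 if unpinned */
    bool cpu_isolated;  /* true if that pCPU sits in an isolated cpuset */
};

/* Hypothetical preparatory step; stands in for the ioctl discussed above. */
static int vcpu_enable_orphaning(struct vcpu_model *v)
{
    v->orphan_optin = true;
    return 0;
}

/*
 * Hypothetical .preserve callback: refuse to do anything that could
 * offline a pCPU unless userspace opted in beforehand.
 */
static int vcpu_preserve(const struct vcpu_model *v)
{
    if (!v->orphan_optin)
        return -EPERM;   /* no prior opt-in: preserving must not offline a pCPU */
    if (v->pinned_cpu < 0 || !v->cpu_isolated)
        return -EINVAL;  /* assumption taken from the quoted proposal */
    return 0;
}
```

The point is only that the privileged side effect is gated by a
separate, earlier call, so that the preserve ioctl by itself never
changes which pCPUs are online.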

> Even for this simplest form, we still need a defined ABI between the
> host kernel and the Caretaker. The host must send an IPI to
> synchronously notify the Caretaker of attach and detach transitions.

Yes, but unlike the KHO serialization, it's unlikely to need to be stable.

> This ABI must also handle Caretaker replacement during the adoption
> phase: when the new kernel retrieves the vCPU, it requires a protocol to
> notify the running Caretaker that the pCPU is being onlined. The
> Caretaker must reach a known state at that moment so the kernel can
> seamlessly replace the previous kernel's Caretaker version with the
> current one. Finally, the Caretaker still relies on the CCB to access
> KVM routing pointers for forwarding VM exits back to the host during
> normal operation.

I'm not sure if routing pointers are needed, as opposed to just a
call/ret (ignoring detachment and reattachment, which are different
anyway). In the end, the x86 caretaker is basically the
non-preemptable part of vcpu_enter_guest(). Even once you add
attach/detach, the body of code that runs in the caretaker is roughly
the same; what changes is the setup of the address space, the IDT,
etc.

> > This must be done atomically at the time Linux offlines/onlines a pCPU. The
> > interface from Linux to the caretaker must use some kind of IPI so that the
> > new kernel can force a VMEXIT (if needed) in the caretaker, ask it to
> > serialize the vm state, and pass it down to the new kernel's caretaker.
>
> Yes, agree.

BTW, the same IPI is needed to force a VMEXIT even before kexec, if
the old kernel does anything that breaks the running VM, such as
madvise(MADV_FREE). That should not happen with properly behaving
userspace, but it must be accounted for.

> During the kexec gap (when the host kernel is completely offline), the
> Caretaker acts purely as a producer, writing trace events directly to
> this physical memory block and advancing the ring buffer pointers.
>
> When the new kernel boots and re-adopts the orphaned vCPU, it retrieves
> this memory from KHO and attaches it back to the tracing subsystem.

Yes, this should work with remote trace buffers.
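
As an illustration of the producer/consumer split described in the
quoted text, here is a minimal single-producer, single-consumer ring
in C11. It is a sketch under my own assumptions, not the layout used
by remote trace buffers: struct trace_ring, struct trace_event,
TRACE_SLOTS and both functions are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TRACE_SLOTS 8u  /* hypothetical size; must be a power of two */

struct trace_event {
    uint64_t timestamp;
    uint32_t exit_reason;
    uint32_t vcpu_id;
};

/* Lives in the preserved memory block; both sides agree on this layout. */
struct trace_ring {
    _Atomic uint32_t head;  /* next slot to write; advanced only by the producer */
    _Atomic uint32_t tail;  /* next slot to read; advanced only by the consumer */
    struct trace_event slots[TRACE_SLOTS];
};

/* Producer side (the caretaker during the gap). Drops the event if full. */
static bool trace_produce(struct trace_ring *r, const struct trace_event *ev)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == TRACE_SLOTS)
        return false;                      /* ring full, nobody is draining it */
    r->slots[head % TRACE_SLOTS] = *ev;
    /* Publish the slot contents before making the new head visible. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side (the new kernel after it re-adopts the buffer). */
static bool trace_consume(struct trace_ring *r, struct trace_event *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false;                      /* ring empty */
    *out = r->slots[tail % TRACE_SLOTS];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

During the gap only trace_produce() runs, so the ring simply fills up
and then drops events; whether to drop or overwrite is a policy
choice that this sketch does not try to settle.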

> I agree regarding HLT: during the gap, the simplest approach is to just
> skip the exit and return directly to the VM without attempting to handle
> it.

Or even disable intercepts for HLT/PAUSE/MONITOR/MWAIT.
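
For the variant where the exits are still intercepted, the dispatch
could look roughly like the sketch below. All names are hypothetical,
and parking on exits that need the host is my assumption rather than
something settled in this thread. With the intercepts disabled, the
HLT/PAUSE/MONITOR/MWAIT cases would simply never reach this function.

```c
/* Hypothetical exit reasons; a real caretaker would use the VMX/SVM codes. */
enum ct_exit_reason {
    CT_EXIT_HLT,
    CT_EXIT_PAUSE,
    CT_EXIT_MONITOR,
    CT_EXIT_MWAIT,
    CT_EXIT_IO,        /* example of an exit that needs the host */
};

enum ct_mode { CT_MODE_ATTACHED, CT_MODE_DETACHED };

enum ct_action {
    CT_RESUME_GUEST,    /* re-enter the guest without further handling */
    CT_RETURN_TO_HOST,  /* plain "ret" back into the host KVM run loop */
    CT_PARK,            /* assumption: wait for reattachment */
};

static enum ct_action ct_handle_exit(enum ct_mode mode, enum ct_exit_reason reason)
{
    /* Attached: the caretaker was called by the host and just returns to it. */
    if (mode == CT_MODE_ATTACHED)
        return CT_RETURN_TO_HOST;

    /* Detached: there is no host to return to during the kexec gap. */
    switch (reason) {
    case CT_EXIT_HLT:
    case CT_EXIT_PAUSE:
    case CT_EXIT_MONITOR:
    case CT_EXIT_MWAIT:
        return CT_RESUME_GUEST;  /* skip the exit and go straight back to the VM */
    default:
        return CT_PARK;          /* cannot be handled without the host */
    }
}
```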

> I completely agree that the transition into and out of the gap must be
> synchronous. As discussed above, using an IPI is the right approach. For
> entry, the host kernel signals the Caretaker via IPI to ensure it
> reaches a known state before the pCPU is offlined.

And then does INIT/SIPI on the offlined pCPU to re-enter the caretaker
in detached mode.

> For exit, the new
> kernel sends an IPI to the orphaned pCPU to force a VM exit, allowing
> the new kernel to take control, update the Caretaker environment,
> and complete the reattachment.
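
Putting the entry and exit sides together, the handshake reads as a
small state machine. The sketch below only models the ordering of the
steps discussed in this thread; the state and event names are
hypothetical, and nothing here is a proposal for the actual ABI.

```c
#include <errno.h>

enum ct_state {
    CT_ATTACHED,             /* normal operation under the host kernel */
    CT_QUIESCED_FOR_DETACH,  /* known state reached, pCPU about to be offlined */
    CT_DETACHED,             /* running on the orphaned pCPU during the gap */
    CT_QUIESCED_FOR_ATTACH,  /* forced out of the guest by the new kernel */
};

enum ct_event {
    CT_EV_DETACH_IPI,     /* old kernel signals the caretaker before offlining */
    CT_EV_INIT_SIPI,      /* offlined pCPU re-enters the caretaker in detached mode */
    CT_EV_ATTACH_IPI,     /* new kernel forces a VM exit on the orphaned pCPU */
    CT_EV_REATTACH_DONE,  /* new kernel has updated the caretaker environment */
};

/* Returns 0 on a valid transition, -EINVAL (state unchanged) otherwise. */
static int ct_transition(enum ct_state *state, enum ct_event ev)
{
    switch (*state) {
    case CT_ATTACHED:
        if (ev == CT_EV_DETACH_IPI) { *state = CT_QUIESCED_FOR_DETACH; return 0; }
        break;
    case CT_QUIESCED_FOR_DETACH:
        if (ev == CT_EV_INIT_SIPI) { *state = CT_DETACHED; return 0; }
        break;
    case CT_DETACHED:
        if (ev == CT_EV_ATTACH_IPI) { *state = CT_QUIESCED_FOR_ATTACH; return 0; }
        break;
    case CT_QUIESCED_FOR_ATTACH:
        if (ev == CT_EV_REATTACH_DONE) { *state = CT_ATTACHED; return 0; }
        break;
    }
    return -EINVAL;  /* event not expected in this state */
}
```

Both quiesced states are the synchronous points: no guest code runs
in them, which is what lets the pCPU be offlined or the caretaker be
replaced safely.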

> > Yeah, I think APIC emulation to some extent must be moved into the VMX/SVM
> > fastpaths.  The good news is that this can be done already as a PoC without
> > needing the whole caretaker and LUO infrastructure.
>
> Moving APIC emulation into the VMX/SVM fastpaths makes sense as a
> standalone effort.

Yup, that's nice to have.

Thanks for the detailed reply. I limited mine to the points where I
wanted your input as the author of LUO, especially with respect to
privilege separation.

Paolo



