[Hypervisor Live Update] Notes from May 19, 2025

David Rientjes rientjes at google.com
Sat May 31 20:16:14 PDT 2025


Hi everybody,

Here are the notes from the last Hypervisor Live Update call that happened 
on Monday, May 19.  Thanks to everybody who was involved!

These notes are intended to bring people who could not attend the call up
to speed as well as keep the conversation going in between meetings.

----->o-----
Pasha started off by presenting material on the LUO v2 design[1].  He
noted that while we want to eventually preserve devices, the current
proposal only preserves file descriptors.  He also made a comment that
future use cases may allow for extensions for preserving containers
across kexec.

Pasha went through the state machine for LUO: normal, prepared (VMs still
running but devices are serialized), frozen (VM is suspended), and
updated (have updated into the next kernel but are not yet in the normal
cycle on the other side of the kexec).

There are four LUO event stages: prepare (before blackout), freeze (done
normally through reboot syscall), finish (transition from updated state to
normal state), and cancel (for prepare or freeze).
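
As a rough illustration of the states and events above, here is a minimal
userspace sketch in C.  The enum and function names are made up for this
example and are not taken from the patches.  It assumes that cancel
returns to the normal state, and it treats frozen -> updated as something
done by the kexec itself rather than by an event:

```c
#include <stdbool.h>

/* States from the notes; the identifiers are illustrative only. */
enum luo_state { LUO_NORMAL, LUO_PREPARED, LUO_FROZEN, LUO_UPDATED };

/* Event stages from the notes; the identifiers are illustrative only. */
enum luo_event { LUO_EV_PREPARE, LUO_EV_FREEZE, LUO_EV_FINISH, LUO_EV_CANCEL };

/*
 * Apply an event to the current state.  Returns true and stores the new
 * state in *next if the transition is allowed; returns false and leaves
 * *next untouched otherwise.
 */
static bool luo_transition(enum luo_state cur, enum luo_event ev,
                           enum luo_state *next)
{
    switch (ev) {
    case LUO_EV_PREPARE:            /* normal -> prepared */
        if (cur != LUO_NORMAL)
            return false;
        *next = LUO_PREPARED;
        return true;
    case LUO_EV_FREEZE:             /* prepared -> frozen */
        if (cur != LUO_PREPARED)
            return false;
        *next = LUO_FROZEN;
        return true;
    case LUO_EV_FINISH:             /* updated -> normal */
        if (cur != LUO_UPDATED)
            return false;
        *next = LUO_NORMAL;
        return true;
    case LUO_EV_CANCEL:             /* assumption: back to normal */
        if (cur != LUO_PREPARED && cur != LUO_FROZEN)
            return false;
        *next = LUO_NORMAL;
        return true;
    }
    return false;
}
```

For example, calling luo_transition() with LUO_NORMAL and LUO_EV_FINISH
returns false because finish only applies to the updated state.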

The UAPI includes /dev/liveupdate character device to add participants,
send events, and query state, as well as /sys/kernel/liveupdate/state that
describes the current state of the system per the above.  The latter can
be used by systemd to optimize for boot or live update; live updates
always want very fast boot.
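
A boot-time consumer of the sysfs file could look roughly like the sketch
below.  The path is the one mentioned above; the file contents are an
assumption (a single lowercase state name such as "updated", optionally
followed by a newline), and the helper names are hypothetical:

```c
#include <stdio.h>
#include <string.h>

enum luo_state { LUO_NORMAL, LUO_PREPARED, LUO_FROZEN, LUO_UPDATED,
                 LUO_UNKNOWN };

/* Map a state string (optionally newline-terminated) to an enum value. */
static enum luo_state luo_parse_state(const char *s)
{
    /* Assumed spellings; the real file format may differ. */
    static const char *const names[] = {
        "normal", "prepared", "frozen", "updated"
    };
    size_t len = strcspn(s, "\n");

    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        if (strlen(names[i]) == len && strncmp(s, names[i], len) == 0)
            return (enum luo_state)i;
    }
    return LUO_UNKNOWN;
}

/* Read and parse a state file; LUO_UNKNOWN if it cannot be read. */
static enum luo_state luo_read_state(const char *path)
{
    char buf[32] = "";
    FILE *f = fopen(path, "r");

    if (!f)
        return LUO_UNKNOWN;
    if (!fgets(buf, sizeof(buf), f))
        buf[0] = '\0';
    fclose(f);
    return luo_parse_state(buf);
}
```

An init system could call luo_read_state("/sys/kernel/liveupdate/state")
early and take a fast path when the result is LUO_UPDATED, falling back to
a regular boot for LUO_NORMAL or LUO_UNKNOWN.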

LUO will register subsystems and can allow for querying of subsystem data.
Each struct liveupdate_subsystem has the callbacks for the event stages,
which are passed a u64 pointer.  David Matlack asked if the u64 is normally
used to store the address in memory of saved state; Pasha confirmed this
would normally be the case but LUO doesn't put any restrictions on its 
usage.  Mike Rapoport noted the lower 12 bits could be used for other
purposes if used for an address.
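
To illustrate Mike's point: if the u64 holds a 4KB-aligned address, its
lower 12 bits are always zero and can carry a small amount of metadata.
The sketch below is a generic tagging scheme with made-up helper names; it
is not something LUO defines, and it assumes 4KB alignment:

```c
#include <stdint.h>

#define SKETCH_SHIFT    12
#define SKETCH_LOW_MASK ((UINT64_C(1) << SKETCH_SHIFT) - 1)

/*
 * Pack a 4KB-aligned address and up to 12 bits of metadata into one u64.
 * Returns 0 on success, -1 if addr is unaligned or meta does not fit.
 */
static int sketch_pack(uint64_t addr, uint64_t meta, uint64_t *out)
{
    if ((addr & SKETCH_LOW_MASK) || (meta & ~SKETCH_LOW_MASK))
        return -1;
    *out = addr | meta;
    return 0;
}

/* Recover the address by clearing the low 12 bits. */
static uint64_t sketch_addr(uint64_t v)
{
    return v & ~SKETCH_LOW_MASK;
}

/* Recover the metadata from the low 12 bits. */
static uint64_t sketch_meta(uint64_t v)
{
    return v & SKETCH_LOW_MASK;
}
```

Since LUO puts no restrictions on the u64, any such encoding would be a
private convention between the subsystem's code in the old and new
kernels.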

Filesystems can also register with LUO to preserve their fds.  The
struct liveupdate_filesystem includes callbacks for the event stages as
well as ->retrieve() and a boolean ->can_preserve().  The latter will
allow us to determine if a file can be preserved or not.  These callbacks
all operate on pointers to struct file.
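
The shape of such a callback table might look like the userspace sketch
below.  This is not the actual struct liveupdate_filesystem: the struct
name, the callback signatures, the stand-in struct file, and the lookup
helper are all assumptions made for illustration.  Only the existence of
the event-stage callbacks, ->retrieve(), and ->can_preserve() comes from
the notes:

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct file, for this sketch only. */
struct file {
    bool is_memfd;
};

/* Hypothetical callback table; signatures are guesses. */
struct fs_handler_sketch {
    int  (*prepare)(struct file *f);
    int  (*freeze)(struct file *f);
    void (*finish)(struct file *f);
    void (*cancel)(struct file *f);
    int  (*retrieve)(struct file *f);
    bool (*can_preserve)(struct file *f);
};

/* Return the first handler willing to preserve f, or NULL if none is. */
static const struct fs_handler_sketch *
find_handler(const struct fs_handler_sketch *handlers, size_t n,
             struct file *f)
{
    for (size_t i = 0; i < n; i++) {
        if (handlers[i].can_preserve && handlers[i].can_preserve(f))
            return &handlers[i];
    }
    return NULL;
}

/* Example handler that only accepts files marked as memfds. */
static bool memfd_can_preserve(struct file *f)
{
    return f->is_memfd;
}

static const struct fs_handler_sketch memfd_handler = {
    .can_preserve = memfd_can_preserve,
};
```

With this layout, a request to preserve an fd can be rejected up front
when find_handler() returns NULL, before any event callbacks run.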

Pratyush asked about the relationship between KHO and LUO.  Pasha noted
that KHO provides a state machine and in RFC v2 of LUO, LUO can drive KHO
which makes the KHO debugfs interface optional.  KHO activate will cause
LUO to switch to the prepared phase, for example.  /dev/liveupdate
continues to be the preferred mechanism.  Think of KHO as preserving state
across kexec whereas LUO provides the state machine.

----->o-----
I asked about the next steps for LUO.  Pasha noted v2 was very recent and
there would be discussion over the next few weeks.  Memfd preservation is
currently under development and Chris Li is working on device
preservation.

Pratyush is working on tests for memfd as well as libluo, which is a
userspace library to make interacting with LUO simpler.

Mike had a thought about more tightly coupling KHO and LUO in the kernel
tree.  Pasha suggested waiting until later to clean up the code, at
least waiting for KHO to land (not necessarily LUO landing).  In the
future, it may be better to store under kernel/ instead of drivers/misc/.

No additional work is being planned on KHO until it initially lands.

----->o-----
David Matlack brought up the idea of a live update microconference for LPC
this year.  Once it is submitted and accepted, the microconference will
propose its own CFP, per Mike Rapoport, and people can then submit to
that.  Pasha noted that this could even include people working on boot
time optimizations that may not be aware that their work is useful for
live update.

----->o-----
I asked about the current status of work to split pmem regions into smaller
shards.

We briefly chatted about defaulting dax regions per a specification on
the kernel command line.  Mike had a similar approach in the past for
pmem on top of e820 with namespaces but it was not sent upstream.  Pasha
said the current approach being worked on is that the kernel command line
would specify what should be fsdax and what should be devdax.  The big
change, however, is to eliminate the first 2MB of data for the superblock.
Mike's approach was to move labels to the very end of the device in the
last 128KB.
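
For a sense of the arithmetic, the sketch below compares the usable data
range of a device under the two layouts mentioned: metadata reserved in
the first 2MB versus labels in the last 128KB.  The helper names and the
struct are hypothetical, and real layouts may have additional alignment
requirements that are ignored here:

```c
#include <stdint.h>

#define SZ_128K (UINT64_C(128) << 10)
#define SZ_2M   (UINT64_C(2) << 20)

/* Half-open byte range [start, end) within a device. */
struct range {
    uint64_t start;
    uint64_t end;
};

/* First 2MB reserved for metadata: data starts at offset 2MB. */
static struct range data_after_head_reserve(uint64_t dev_size)
{
    struct range r = { 0, 0 };      /* empty if the device is too small */

    if (dev_size > SZ_2M) {
        r.start = SZ_2M;
        r.end = dev_size;
    }
    return r;
}

/* Labels in the last 128KB: data starts at offset 0. */
static struct range data_before_tail_labels(uint64_t dev_size)
{
    struct range r = { 0, 0 };      /* empty if the device is too small */

    if (dev_size > SZ_128K)
        r.end = dev_size - SZ_128K;
    return r;
}
```

In this model the tail-label layout keeps offset 0 usable for data and
reserves 128KB instead of 2MB.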

Pasha said it was likely best to not have the pmem label at all.  Mike
said it was needed to resize namespaces on the pmem device itself.  Mike
asked to be cc'd on the patches when they go upstream.

----->o-----
David Matlack noted that he was almost finished with VFIO selftests for
6.15 and that it would be sent out.  This was planned to be used for
testing device preservation in an automated way.

Pasha noted there was no good way to create qemu instances in selftests
today, so we need infrastructure for KHO and LUO in selftests.  This will
likely require a significant amount of work.

Mike noted that it may be possible to borrow the infrastructure that BPF 
uses for this.

----->o-----
Next meeting will be on Monday, June 2 at 8am PDT (UTC-7); everybody is
welcome: https://meet.google.com/rjn-dmzu-hgq

Topics for the next meeting:

 - discuss current feedback on LUO v2 and its next steps
 - check on status of memfd preservation using LUO
 - check on status of libluo development from Pratyush
 - check on status of sharding dax devices and eliminating the labels in
   the first 2MB
 - determine timeline for new kernel parameters to specify devdax and
   fsdax directly on the command line itself without ndctl
 - check on status of VFIO selftests that will be useful for automated
   testing of device preservation
 - determine timelines for selftest framework for live updates, which
   could be a significant amount of work
 - update on physical pool allocator that can be used to provide pages
   for hugetlb, guest_memfd, and memfds
 - later: testing methodology to allow downstream consumers to qualify
   that live update works from one version to another
 - later: reducing blackout window during live update

Please let me know if you'd like to propose additional topics for
discussion, thank you!

[1]
https://docs.google.com/presentation/d/1F-lcl4vSGDX72vhcdmlgKTSe8-GAlwqP46G37SDJP0Q/edit?usp=drive_link&resourcekey=0-jrQSQ7Catn-A7EimsR475A
[2]
https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=ramdax
[3] https://github.com/groeck/linux-build-test.git
[4] http://kerneltests.org


