[Hypervisor Live Update] Notes from October 6, 2025

David Rientjes rientjes at google.com
Sun Oct 12 16:57:39 PDT 2025


Hi everybody,

Here are the notes from the last Hypervisor Live Update call that happened 
on Monday, October 6.  Thanks to everybody who was involved!

These notes are intended to bring people up to speed who could not attend 
the call as well as keep the conversation going in between meetings.

----->o-----
Pasha started off with LUO v4 discussion points.  As discovered during 
review of the iommu preservation series, we need dependency tracking for 
situations where one fd depends on another resource.  There are two 
options for this dependency tracking:

 - Option 1: when the fd is preserved, a callback, can_preserve(),
   determines whether it can be preserved (i.e. all of its dependencies
   are already preserved)
 - Option 2: when we go into prepare(), we check the inter-dependencies,
   which also allows for circular dependencies (A depending on B, B
   depending on A)

In Option 1, userspace must decide which dependencies must be preserved 
and in which order; this is not a requirement for Option 2.  Pratyush 
suggested that, to start, we could go with Option 1, including for 
unpreserve.

Pratyush suggested there may be an Option 3 where LUO preserves a group of 
fds that are checked as an entire unit (similar to fdbox).  Pasha said 
that grouped restore is not possible, the same as with session grouping.  
Praveen Kumar noted that ordering is important; Pasha agreed, but said 
that the question was when the ordering is checked: during preserve or 
during prepare.

Jason suggested that preserve would need to know the dependencies at that 
time (Option 1).  He said that we wouldn't be able to allow the memfd to 
become mutable until all the fds are put back on it; the sequencing would 
require the ordering to be established at the time of preserve.

Pasha suggested that when we go to the prepare phase, it is likely fine to 
have them in a different order as long as we are 100% sure that the memfd 
is going to be serialized.  Jason said that once the memfd is frozen, we 
can't have inconsistencies; the iommu can take the page pin, but you can 
still ftruncate() the memfd, and the memory is then only freed later, 
when the iommu releases it.

The conclusion then was that Option 2 was not possible; we need to know 
the right sequence before prepare.  Prepare would likely need to do its 
work in the same order in which can_preserve() was called.  Pratyush said 
LUO v4 already had per-fd freezing, so we could force userspace to do it; 
for example, when you preserve iommufd, you just force userspace to first 
preserve and prepare the memfd that it depends on.  In this case, the 
kernel is just checking and enforcing this.

The consensus was to check the dependencies when the ioctl is issued and 
fail the ioctl if needed; userspace does all the ordering required.
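To make the Option 1 rule concrete, here is a small userspace model of it.  
This is only a sketch, not LUO code: struct lu_session, preserve_fd(), 
unpreserve_fd() and the errno values are hypothetical, and the caller 
passes the dependency list explicitly where the kernel would discover it 
from the object itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_FDS 16
#define MAX_DEPS 4

/* Hypothetical model; these are not the real LUO data structures. */
struct lu_fd {
	int fd;
	int ndeps;
	int deps[MAX_DEPS];	/* fds this fd depends on (e.g. iommufd -> memfd) */
};

struct lu_session {
	struct lu_fd fds[MAX_FDS];	/* kept in preservation order */
	int n;
};

static int find_fd(const struct lu_session *s, int fd)
{
	for (int i = 0; i < s->n; i++)
		if (s->fds[i].fd == fd)
			return i;
	return -1;
}

/* Models the preserve ioctl with a can_preserve()-style check. */
static int preserve_fd(struct lu_session *s, int fd, const int *deps, int ndeps)
{
	assert(ndeps == 0 || deps != NULL);
	if (ndeps > MAX_DEPS)
		return -EINVAL;
	if (find_fd(s, fd) >= 0)
		return -EEXIST;
	if (s->n == MAX_FDS)
		return -ENOSPC;
	/* Every dependency must already be preserved, else fail the ioctl. */
	for (int i = 0; i < ndeps; i++)
		if (find_fd(s, deps[i]) < 0)
			return -EINVAL;
	s->fds[s->n].fd = fd;
	s->fds[s->n].ndeps = ndeps;
	for (int i = 0; i < ndeps; i++)
		s->fds[s->n].deps[i] = deps[i];
	s->n++;
	return 0;
}

/* Unpreserve is the mirror image: refuse while another fd depends on it. */
static int unpreserve_fd(struct lu_session *s, int fd)
{
	int idx = find_fd(s, fd);

	if (idx < 0)
		return -ENOENT;
	for (int i = 0; i < s->n; i++)
		for (int j = 0; j < s->fds[i].ndeps; j++)
			if (s->fds[i].deps[j] == fd)
				return -EBUSY;
	/* Close the gap so the remaining fds keep their relative order. */
	memmove(&s->fds[idx], &s->fds[idx + 1],
		(size_t)(s->n - idx - 1) * sizeof(s->fds[0]));
	s->n--;
	return 0;
}
```

In this model, preserving an iommufd before the memfd it depends on fails, 
and the memfd cannot be unpreserved while the iommufd is still preserved.  
Because fds[] records the order in which preserves succeeded, a later 
prepare() could simply walk it front to back.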

----->o-----
Pasha asked if we still want userspace to provide tokens, which were 
previously needed because we had global fd preservation; now, each session 
numbers its tokens starting at zero: whatever was preserved first has 
token 0, what was preserved second has token 1, etc.  Jason argued against 
the kernel issuing tokens: userspace-provided tokens allow predecessor and 
successor VMMs to have an ABI where they can say this object is this thing 
with this token, and they can then pull it out with that token.

Pratyush asked what we want to solve with kernel-issued tokens.  Pasha 
suggested we might be able to use tokens for ordering.  Pratyush said that 
if userspace uses tokens, then it can use the same scheme.  We decided 
that tokens should be decoupled from ordering.
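A minimal sketch of the direction Jason argued for, assuming 
userspace-chosen tokens that act purely as lookup keys while the 
preservation order is tracked separately (here, by array position).  
struct tok_session, session_preserve() and session_retrieve() are 
hypothetical names, not the LUO API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 16

/* Hypothetical model of one session's preserved objects. */
struct tok_entry {
	uint64_t token;	/* chosen by userspace: the old/new VMM ABI */
	int obj;	/* stand-in for the preserved object */
};

struct tok_session {
	struct tok_entry e[MAX_ENTRIES];	/* array index == preservation order */
	size_t n;
};

static int session_preserve(struct tok_session *s, uint64_t token, int obj)
{
	for (size_t i = 0; i < s->n; i++)
		if (s->e[i].token == token)
			return -EEXIST;	/* tokens must be unique per session */
	if (s->n == MAX_ENTRIES)
		return -ENOSPC;
	s->e[s->n].token = token;
	s->e[s->n].obj = obj;
	s->n++;
	return 0;
}

/* The successor VMM looks objects up by token; order plays no role. */
static int session_retrieve(const struct tok_session *s, uint64_t token, int *obj)
{
	assert(obj != NULL);
	for (size_t i = 0; i < s->n; i++) {
		if (s->e[i].token == token) {
			*obj = s->e[i].obj;
			return 0;
		}
	}
	return -ENOENT;
}
```

Since the token value carries no ordering meaning, userspace can pick 
whatever scheme its VMM ABI needs (e.g. 42 for the guest memory memfd) and 
still preserve objects in dependency order.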

----->o-----
There was discussion on whether sessions should be removed entirely or 
not.  Pasha noted that the iommu required a subsystem because it needs 
cross-file-descriptor data during boot.  Jason suggested not expressing it 
as a subsystem with callbacks; instead, the first thread that gets to 
serialize the iommu synchronously creates the serialization data under a 
LUO lock, and the next thread gets the serialization data under the LUO 
lock and may make a small change.  We likely don't want to track this as 
part of a subsystem abstraction.
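The "first thread creates, later threads amend" pattern Jason described 
can be modeled with a plain mutex standing in for the LUO lock.  This is a 
userspace sketch under that assumption; luo_lock, struct iommu_ser and 
iommu_serialize() are hypothetical, and the real serialized iommu state 
would be far richer than a counter.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical shared iommu serialization state (not the real layout). */
struct iommu_ser {
	int nr_iommufds;	/* how many iommufds have added their state */
};

static pthread_mutex_t luo_lock = PTHREAD_MUTEX_INITIALIZER;
static struct iommu_ser *iommu_ser_data;	/* NULL until first serializer */

/* Called from each iommufd's preserve path, possibly concurrently. */
static int iommu_serialize(void)
{
	int ret = 0;

	pthread_mutex_lock(&luo_lock);
	if (!iommu_ser_data) {
		/* The first caller creates the serialization data... */
		iommu_ser_data = calloc(1, sizeof(*iommu_ser_data));
		if (!iommu_ser_data) {
			ret = -ENOMEM;
			goto out;
		}
	}
	/* ...and every caller, first or later, makes its small change. */
	iommu_ser_data->nr_iommufds++;
out:
	pthread_mutex_unlock(&luo_lock);
	return ret;
}

/* Thread entry point used to exercise concurrent serialization. */
static void *serialize_worker(void *unused)
{
	(void)unused;
	int ret = iommu_serialize();
	assert(ret == 0);
	(void)ret;	/* avoid an unused-variable warning under NDEBUG */
	return NULL;
}
```

No subsystem registration or callback is involved: whichever thread gets 
there first allocates the shared data, and the lock makes the 
check-then-create step race-free.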

Pasha said that with no callback, what we'd want for the iommu is a call 
into LUO during boot to ask for the data.  Jason said this may be correct 
but he was focused more on the suspend side.  He said that during probe 
we'd have to ask for the early boot data if it exists.

Pasha was going to propose an RFC discussion upstream on the APIs for 
this.
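On the boot side, the callback-free approach amounts to a pull-style 
lookup that a driver makes from probe.  The sketch below assumes a simple 
name-keyed handoff table; luo_get_data(), struct luo_blob and 
iommu_probe() are hypothetical and are not a proposal for the actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handoff table, populated early from the previous kernel. */
struct luo_blob {
	const char *name;
	const void *data;
	size_t len;
};

static const struct luo_blob *luo_blobs;
static size_t luo_nblobs;

/* Hypothetical pull-style API: no callbacks, the consumer asks for data. */
static const struct luo_blob *luo_get_data(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < luo_nblobs; i++)
		if (strcmp(luo_blobs[i].name, name) == 0)
			return &luo_blobs[i];
	return NULL;	/* cold boot, or nothing preserved under this name */
}

/* Driver probe: restore from preserved data if present, else start fresh. */
static int iommu_probe(size_t *restored_len)
{
	const struct luo_blob *b = luo_get_data("iommu");

	*restored_len = b ? b->len : 0;
	return 0;
}
```

A NULL result simply means a normal cold initialization, so the same probe 
path serves both regular boots and live updates.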

----->o-----
Andrey discussed the current status of his KSTATE work.  He wants to 
describe what should be preserved without requiring major subsystem 
changes.  He proposed a description of the state, with common code that 
parses the description and saves and restores the state with versioning.

For example, for struct a, the struct kstate_description would include the 
min_version_id that we can restore from, a state list, and a list of 
fields to preserve.  The KSTATE data format consists of a magic number, 
state_id, version_id, instance_id, and then the size of the data being 
preserved, followed by the data itself; this layout is repeated for each 
state.
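The record layout above can be sketched as a fixed header followed by its 
payload, with records laid back to back.  The field order follows the 
description, but the field widths, the KSTATE_MAGIC value, and the helper 
names kstate_put() and kstate_find() are assumptions for illustration, not 
Andrey's actual format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KSTATE_MAGIC 0x4b535441u	/* hypothetical value ("KSTA") */

/* Assumed header layout; the widths are illustrative only. */
struct kstate_hdr {
	uint32_t magic;
	uint32_t state_id;
	uint32_t version_id;
	uint32_t instance_id;
	uint64_t size;	/* bytes of state data following this header */
};

/* Append one header+data record at *off; returns 0 or -1 if it won't fit. */
static int kstate_put(uint8_t *buf, size_t cap, size_t *off,
		      uint32_t state_id, uint32_t version_id,
		      uint32_t instance_id, const void *data, uint64_t size)
{
	struct kstate_hdr h = { KSTATE_MAGIC, state_id, version_id,
				instance_id, size };

	assert(*off <= cap);
	if (cap - *off < sizeof(h) || cap - *off - sizeof(h) < size)
		return -1;
	memcpy(buf + *off, &h, sizeof(h));
	memcpy(buf + *off + sizeof(h), data, (size_t)size);
	*off += sizeof(h) + (size_t)size;
	return 0;
}

/* Walk the records and return the data for (state_id, instance_id). */
static const uint8_t *kstate_find(const uint8_t *buf, size_t len,
				  uint32_t state_id, uint32_t instance_id,
				  struct kstate_hdr *out)
{
	size_t off = 0;

	while (len - off >= sizeof(*out)) {
		struct kstate_hdr h;

		memcpy(&h, buf + off, sizeof(h));
		if (h.magic != KSTATE_MAGIC || len - off - sizeof(h) < h.size)
			return NULL;	/* corrupt or truncated stream */
		if (h.state_id == state_id && h.instance_id == instance_id) {
			*out = h;
			return buf + off + sizeof(h);
		}
		off += sizeof(h) + (size_t)h.size;
	}
	return NULL;
}
```

Because every record carries its own size, a reader can skip states it is 
not interested in without understanding their contents.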

This includes field versioning: when data is added, the version is bumped 
for the field, which allows compatible changes to be made for the new 
kernel.  Jason was against the idea of throwing away data: if the old 
kernel included data, it would be wrong for the new kernel to then throw 
it away.  Andrey suggested bumping the min_version_id.  Ben Chaney said it 
will be useful for adding a new field that the old kernel did not support.
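One way to picture field versioning is a table of fields, each tagged with 
the version that introduced it, plus a restore routine that honors 
min_version_id.  In this sketch, kstate_description, version_id and 
min_version_id come from the discussion, while struct kstate_field, 
kstate_restore() and the packing of fields in table order are assumptions.  
The sketch also refuses, rather than silently drops, data saved by a newer 
version, in line with Jason's point about not throwing data away.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-field descriptor. */
struct kstate_field {
	const char *name;
	size_t offset;		/* offset of the field in the live struct */
	size_t size;
	uint32_t version_id;	/* struct version that introduced the field */
};

struct kstate_description {
	const char *name;
	uint32_t version_id;		/* version this kernel saves */
	uint32_t min_version_id;	/* oldest saved version we can restore */
	const struct kstate_field *fields;
	size_t nfields;
};

struct a {
	uint32_t x;	/* present since version 1 */
	uint64_t y;	/* added in version 2 */
};

static const struct kstate_field a_fields[] = {
	{ "x", offsetof(struct a, x), sizeof(uint32_t), 1 },
	{ "y", offsetof(struct a, y), sizeof(uint64_t), 2 },
};

static const struct kstate_description a_desc = {
	.name = "a", .version_id = 2, .min_version_id = 1,
	.fields = a_fields, .nfields = 2,
};

/* Restore fields packed in table order by a kernel at saved_version. */
static int kstate_restore(const struct kstate_description *d,
			  uint32_t saved_version, const uint8_t *data,
			  size_t len, void *obj)
{
	size_t off = 0;

	assert(obj != NULL);
	/* Too old to understand, or newer data we would have to discard. */
	if (saved_version < d->min_version_id || saved_version > d->version_id)
		return -EINVAL;
	for (size_t i = 0; i < d->nfields; i++) {
		const struct kstate_field *f = &d->fields[i];

		if (f->version_id > saved_version)
			continue;	/* field unknown to the old kernel: keep default */
		if (len - off < f->size)
			return -EINVAL;
		memcpy((uint8_t *)obj + f->offset, data + off, f->size);
		off += f->size;
	}
	return off == len ? 0 : -EINVAL;	/* leftover bytes would be lost */
}
```

Here a version 1 blob restores only x and leaves y at its default, which 
is the "new field the old kernel did not support" case.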

Jason said that if data was changed, there would likely need to be a 
significant code flow change associated with it; a recent example is the 
vmalloc patch series, which ended up being a significant change.

----->o-----
Next meeting will be on Monday, October 20 at 8am PDT (UTC-7), everybody
is welcome: https://meet.google.com/rjn-dmzu-hgq

Topics for the next meeting:

 - follow up on fd dependency checking happening at the time of preserve
   rather than prepare (Option 1)
 - follow up on not relying on subsystems in LUO and the APIs on both
   sides of the live update for getting data needed
 - update on latest status of LUO and next steps for merge into akpm's
   tree
 - update on the status of stateless KHO RFC patches that should simplify
   LUO support
 - update on memfd preservation, vmalloc support, and 1GB limitation
 - discuss guest_memfd preservation use cases for Confidential Computing
   and any current work happening on it, including overlap with memfd
   preservation being worked on by Pratyush
   + discuss any use cases for Confidential Computing where folios may
     need to be split after being marked as preserved during brown out
 - [15 min] summarize upstream iommu persistence discussion and surface
   any misalignment
 - later: testing methodology to allow downstream consumers to qualify
   that live update works from one version to another
 - later: reducing blackout window during live update

Please let me know if you'd like to propose additional topics for
discussion, thank you!


