[Hypervisor Live Update] Notes from October 6, 2025
David Rientjes
rientjes at google.com
Sun Oct 12 16:57:39 PDT 2025
Hi everybody,
Here are the notes from the last Hypervisor Live Update call that happened
on Monday, October 6. Thanks to everybody who was involved!
These notes are intended to bring people up to speed who could not attend
the call as well as keep the conversation going in between meetings.
----->o-----
Pasha started off with LUO v4 discussion points. As discovered during
iommu preservation series review, we needed dependency tracking for
situations where one fd depends on a resource. There are two options for
this dependency tracking:
- Option 1: check when the fd is preserved, using a callback,
can_preserve(), which determines whether it can be preserved (i.e. all
of its dependencies are already preserved)
- Option 2: check the interdependencies when we go into prepare(), which
also allows for cross dependencies (A depending on B, B depending on A)
In Option 1, userspace must declare which dependencies must be grabbed in
which order; this is not a requirement for Option 2. Pratyush suggested
that we could start with Option 1, including for unpreserve.
Pratyush suggested there may be an Option 3 where LUO preserves a group of
fds, and they get checked as an entire unit (similar to fdbox). Pasha
said that grouping restore is not possible, same as session grouping.
Praveen Kumar suggested that ordering is important; Pasha agreed, but said
that the question was when this happens: during preserve or during
prepare.
Jason suggested that preserve would need to know the dependencies at that
time (Option 1). He said that we wouldn't be able to allow the memfd to
become mutable until all the fds are put back on it; the sequencing would
require the ordering to be established at the time of preserve.
Pasha suggested that when we go to prepare phase, it is likely fine to
have them in a different order as long as we are 100% sure that the memfd
is going to be serialized. Jason said when the memfd is frozen, we can't
have inconsistencies; the iommu can take the page pin but you can still
ftruncate() the memfd, and that memory is then only freed later, by the
iommu.
The conclusion then was that Option 2 was not possible; we need to know
the right sequence before prepare. Prepare would likely need to do its
work in the same order that can_preserve() was called. Pratyush said LUO
v4 already had per-fd freezing so we could force userspace to do it; for
example, when you preserve iommufd you just force userspace to first
preserve and prepare the memfd that it depends on. In this case, the
kernel is just checking and enforcing this.
The consensus was to check the dependencies when you do the ioctl and then
fail the ioctl if needed; userspace does all the ordering required.
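
As a rough illustration of that consensus (check at preserve time, fail
the ioctl, leave ordering to userspace), here is a minimal userspace
sketch in C. Everything in it is hypothetical: struct demo_fd,
demo_can_preserve(), demo_preserve() and the -EBUSY/-EEXIST error codes
are made up for this example and are not the LUO API.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_DEPS 4

/* Hypothetical model of a preservable fd; not a LUO data structure. */
struct demo_fd {
	const char *name;
	bool preserved;
	struct demo_fd *deps[DEMO_MAX_DEPS];	/* resources this fd depends on */
	size_t ndeps;
};

/* Option 1 check: every dependency must already be preserved. */
static bool demo_can_preserve(const struct demo_fd *fd)
{
	for (size_t i = 0; i < fd->ndeps; i++)
		if (!fd->deps[i]->preserved)
			return false;
	return true;
}

/* Stand-in for the preserve ioctl: fail rather than reorder anything. */
static int demo_preserve(struct demo_fd *fd)
{
	if (fd->preserved)
		return -EEXIST;
	if (!demo_can_preserve(fd))
		return -EBUSY;	/* error code is an assumption of this sketch */
	fd->preserved = true;
	return 0;
}
```

With an iommufd that lists a memfd as its dependency, preserving the
iommufd first fails, and succeeds once userspace has preserved the memfd.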
----->o-----
Pasha asked if we still want userspace to provide tokens, which was
previously needed because we had global fd preservation; now, each session
has a token that starts at zero: whatever was preserved first has token 0,
what was preserved second has token 1, etc. Jason suggested against the
kernel issuing tokens again: the token allowed predecessor and successor
VMMs to have an ABI where they can say this object is this thing with this
token, and they can then pull it out with that token.
Pratyush asked what we want to solve with kernel-issued tokens. Pasha
suggested we might be able to use tokens for ordering. Pratyush said if
userspace uses tokens then they can use the same scheme. We decided that
tokens should be removed from ordering.
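
To make the userspace-token idea concrete, here is a small sketch in C of
a per-session table keyed by tokens that userspace picks, so that the
predecessor and successor VMMs can agree on "this token means this
object". The names (struct demo_session, demo_session_preserve(),
demo_session_retrieve()), the fixed table size and the error codes are
assumptions for illustration only, not the LUO interface.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_OBJS 8

struct demo_entry {
	uint64_t token;	/* chosen by userspace, not issued by the kernel */
	void *obj;	/* the preserved object */
};

/* Hypothetical per-session table; tokens only need to be unique per session. */
struct demo_session {
	struct demo_entry entries[DEMO_MAX_OBJS];
	size_t count;
};

/* Record obj under a userspace-chosen token. */
static int demo_session_preserve(struct demo_session *s, uint64_t token,
				 void *obj)
{
	if (s->count == DEMO_MAX_OBJS)
		return -ENOSPC;
	for (size_t i = 0; i < s->count; i++)
		if (s->entries[i].token == token)
			return -EEXIST;
	s->entries[s->count].token = token;
	s->entries[s->count].obj = obj;
	s->count++;
	return 0;
}

/* The successor looks the object up by the token both sides agreed on. */
static void *demo_session_retrieve(const struct demo_session *s,
				   uint64_t token)
{
	for (size_t i = 0; i < s->count; i++)
		if (s->entries[i].token == token)
			return s->entries[i].obj;
	return NULL;
}
```

Lookup here is independent of the order in which objects were preserved,
which is the property the token ABI relies on.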
----->o-----
There was discussion on whether sessions should be removed entirely or
not. Pasha noted that iommu required a subsystem because it requires
cross file descriptor data during boot. Jason suggested not expressing it
as a subsystem with callbacks; the goal is that the first thread that gets
to serialize the iommu synchronously creates the serialization data under
a LUO lock, the next thread gets the serial data under a LUO lock and may
make a little change. We likely don't want to track this as part of a
subsystem abstraction.
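
The pattern Jason described (first caller creates the serialization data
under a lock, later callers find it and make a small change) can be
sketched in userspace C as below. A pthread mutex stands in for the LUO
lock; struct demo_iommu_ser, demo_luo_lock and demo_iommu_serialize() are
hypothetical names for this example, and the "small change" is modelled
as a simple counter.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared serialization state for the iommu. */
struct demo_iommu_ser {
	bool created;
	unsigned int nr_updates;	/* stands in for small per-caller changes */
};

/* A pthread mutex plays the role of the LUO lock in this sketch. */
static pthread_mutex_t demo_luo_lock = PTHREAD_MUTEX_INITIALIZER;
static struct demo_iommu_ser demo_ser;

/* Returns true if this caller was the one that created the data. */
static bool demo_iommu_serialize(void)
{
	bool first;

	pthread_mutex_lock(&demo_luo_lock);
	first = !demo_ser.created;
	if (first)
		demo_ser.created = true;	/* first caller builds the data synchronously */
	demo_ser.nr_updates++;			/* every caller may make a small change */
	pthread_mutex_unlock(&demo_luo_lock);

	return first;
}
```

No callback registration is involved: whoever gets there first does the
work, and the lock keeps creation and later updates consistent.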
Pasha said that with no callback, for iommu, what we'd want is a call into
LUO during boot to ask for the data. Jason suggested this may be correct
but was focused more on the suspend side. He said during probe we'd have
to ask for the early boot data if it exists.
Pasha was going to propose an RFC discussion upstream on the APIs for
this.
----->o-----
Andrey discussed the current status of his KSTATE work. He wanted to
describe what should be preserved without requiring major subsystem
changes. He suggested a description in common code that parses the
description and saves and restores with versioning.
For example, for struct a, the struct kstate_description would include the
min_version_id that we can restore from. It includes a state list and a
list of fields to preserve. The KSTATE data format includes a magic
number, state_id, version_id, instance_id, and then the size of the data
for preservation. The states are then repeated.
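
One possible reading of that record layout, as a userspace C sketch. The
field order follows the notes, but the 32-bit field widths, the magic
value and the helper names (struct demo_kstate_hdr, demo_kstate_put(),
demo_kstate_count()) are assumptions made for this example and are not
taken from the KSTATE patches.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_KSTATE_MAGIC 0x4b535441u	/* arbitrary value for this sketch */

/* Hypothetical record header; field widths are assumed. */
struct demo_kstate_hdr {
	uint32_t magic;
	uint32_t state_id;
	uint32_t version_id;
	uint32_t instance_id;
	uint32_t size;		/* payload bytes that follow the header */
};

/* Append one record; returns bytes written, or 0 if it does not fit. */
static size_t demo_kstate_put(uint8_t *buf, size_t cap, uint32_t state_id,
			      uint32_t version_id, uint32_t instance_id,
			      const void *data, uint32_t size)
{
	struct demo_kstate_hdr hdr = {
		DEMO_KSTATE_MAGIC, state_id, version_id, instance_id, size
	};

	if (cap < sizeof(hdr) || cap - sizeof(hdr) < size)
		return 0;
	memcpy(buf, &hdr, sizeof(hdr));
	memcpy(buf + sizeof(hdr), data, size);
	return sizeof(hdr) + size;
}

/* Walk the repeated records; returns their count, or -1 if malformed. */
static int demo_kstate_count(const uint8_t *buf, size_t len)
{
	int n = 0;

	while (len > 0) {
		struct demo_kstate_hdr hdr;

		if (len < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf, sizeof(hdr));
		if (hdr.magic != DEMO_KSTATE_MAGIC ||
		    len - sizeof(hdr) < hdr.size)
			return -1;
		buf += sizeof(hdr) + hdr.size;
		len -= sizeof(hdr) + hdr.size;
		n++;
	}
	return n;
}
```

The size field is what lets a parser step from one state to the next
without understanding the payload.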
This includes field versioning; when data is added, the version is bumped
for the field. This allows for making compatible changes for the new
kernel. Jason was against the idea of throwing away data; the idea is
that if the old kernel included data then it would be wrong for the new
kernel to then throw it away. Andrey suggested bumping the
min_version_id. Ben Chaney suggested it will be useful for adding a new
field that the old kernel did not support.
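
A sketch of how field versioning and min_version_id could interact on
restore, written as userspace C. It is only an illustration under stated
assumptions: struct demo_desc, struct demo_field and demo_restore() are
hypothetical, the saved data is assumed to be the fields packed in
description order, and rejecting unknown newer versions or leftover bytes
(rather than discarding them) is this sketch's choice, motivated by the
concern about throwing data away.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical field entry: where it lives and when it was added. */
struct demo_field {
	size_t offset;
	size_t size;
	uint32_t version_id;	/* version in which the field was added */
};

/* Hypothetical description; not the real struct kstate_description. */
struct demo_desc {
	uint32_t version_id;		/* version this kernel writes */
	uint32_t min_version_id;	/* oldest version it can restore from */
	const struct demo_field *fields;
	size_t nfields;
};

struct demo_a {
	uint32_t x;
	uint32_t y;
};

static const struct demo_field demo_a_fields[] = {
	{ offsetof(struct demo_a, x), sizeof(uint32_t), 1 },
	{ offsetof(struct demo_a, y), sizeof(uint32_t), 2 },	/* added in v2 */
};

static const struct demo_desc demo_a_desc = { 2, 1, demo_a_fields, 2 };

/* src holds the fields that saved_version knew about, packed in order. */
static int demo_restore(const struct demo_desc *d, uint32_t saved_version,
			const uint8_t *src, size_t len, void *dst)
{
	if (saved_version < d->min_version_id ||
	    saved_version > d->version_id)
		return -EINVAL;

	for (size_t i = 0; i < d->nfields; i++) {
		const struct demo_field *f = &d->fields[i];

		if (f->version_id > saved_version)
			continue;	/* the old kernel never wrote this field */
		if (len < f->size)
			return -EINVAL;
		memcpy((uint8_t *)dst + f->offset, src, f->size);
		src += f->size;
		len -= f->size;
	}
	/* Leftover bytes would be silently dropped data, so refuse them. */
	return len ? -EINVAL : 0;
}
```

Restoring version 1 data leaves y at whatever default the new kernel set
beforehand, which is the "new field the old kernel did not support" case;
bumping min_version_id in the description makes such old data fail
instead.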
Jason said that if data was changed, there would likely need to be a
significant code flow change associated with it; a recent example is the
vmalloc patch series that ended up being a significant change.
----->o-----
Next meeting will be on Monday, October 20 at 8am PDT (UTC-7), everybody
is welcome: https://meet.google.com/rjn-dmzu-hgq
Topics for the next meeting:
- follow up on fd dependency checking and this happening at the time of
preserve rather than prepare (Option 1)
- follow up on not relying on subsystems in LUO and the APIs on both
sides of the live update for getting data needed
- update on latest status of LUO and next steps for merge into akpm's
tree
- update on the status of stateless KHO RFC patches that should simplify
LUO support
- update on memfd preservation, vmalloc support, and 1GB limitation
- discuss guest_memfd preservation use cases for Confidential Computing
and any current work happening on it, including overlap with memfd
preservation being worked on by Pratyush
+ discuss any use cases for Confidential Computing where folios may
need to be split after being marked as preserved during brown out
- [15 min] summarize upstream iommu persistence discussion and surface
any misalignment
- later: testing methodology to allow downstream consumers to qualify
that live update works from one version to another
- later: reducing blackout window during live update
Please let me know if you'd like to propose additional topics for
discussion, thank you!