[Hypervisor Live Update] Notes from December 1, 2025
David Rientjes
rientjes at google.com
Sat Dec 13 20:15:07 PST 2025
Hi everybody,
Here are the notes from the last Hypervisor Live Update call that happened
on Monday, December 1. Thanks to everybody who was involved!
These notes are intended to bring people up to speed who could not attend
the call as well as keep the conversation going in between meetings.
----->o-----
Pasha updated on the state of stateless KHO :) Jason Miu sent a recent
update which received feedback internally and will then be sent again to
the upstream mailing list.
LUO v8 has been merged into stable and is scheduled for merge into 6.19
now that the merge window is open. For LUO, there are a few patches still
outstanding: end-to-end testing (there is one patch that allows for
creating a VM to do automatic testing) that is postponed, a change to
preserve file life cycle bound global objects (no user in original patch
series) that is postponed, and a patch for an internal API to retrieve
struct files and get tokens for dependencies (no user) that is also
postponed.
Whichever user is merged first for preserving file life cycle bound global
objects will upstream the overall support.
----->o-----
Jork asked if user mode tools integrate LUO into systemd. Pasha said
that luod will be integrated with systemd; the design proposed the way it
would be integrated. luod would be holding the sessions through the
reboot command so that the VMM can exit. Jork asked where the designs
were; Pasha pointed him to the cover letter for LUO.
Pasha noted that the source code for luod would be open source and added
to our GitHub.
----->o-----
David Matlack updated on the status of his VFIO series[1] that was
recently sent out; some feedback was being provided upstream that he will
be iterating through. This series is largely mechanical to add the
plumbing to preserve the VFIO device file descriptor across live update.
Actual PCI device preservation will be built on top of it.
Pasha discussed BDF for PCI device preservation and whether we should use
a path instead, which shouldn't add much overhead. He opined that we may
need to preserve devices that are not PCI, like TPM. Chris Li noted that
the bus number is assigned from the ACPI table so you need to infer that
the root is flat and that the first bus number is the slot, but that's
ugly.
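
As a reference for the BDF discussion above, here is a minimal Python
sketch of how a PCI bus/device/function address is laid out (8-bit bus,
5-bit device, 3-bit function, optionally prefixed by a 16-bit domain). The
helper names and the path-style key are hypothetical and only illustrate
why a path that omits the bus number would be unaffected by bus
renumbering; none of this comes from the series discussed on the call.

```python
import re

# "[domain:]bus:device.function", all fields in hex, e.g. "0000:3a:00.1".
_BDF_RE = re.compile(
    r"^(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def parse_bdf(text):
    """Parse a PCI address string into (domain, bus, device, function)."""
    m = _BDF_RE.match(text)
    if m is None:
        raise ValueError(f"not a PCI address: {text!r}")
    domain = int(m.group(1), 16) if m.group(1) else 0
    bus = int(m.group(2), 16)
    dev = int(m.group(3), 16)
    fn = int(m.group(4), 16)
    if dev > 0x1F:  # the device (slot) field is only 5 bits wide
        raise ValueError(f"device number out of range: {text!r}")
    return domain, bus, dev, fn


def pack_bdf(bus, dev, fn):
    """Pack bus/device/function into the usual 16-bit value."""
    return (bus << 8) | (dev << 3) | fn


def unpack_bdf(value):
    """Inverse of pack_bdf()."""
    return (value >> 8) & 0xFF, (value >> 3) & 0x1F, value & 0x7


def path_key(hops):
    """Hypothetical path-style key: the (device, function) of every hop
    from the root bus down to the device. Bus numbers are left out."""
    return "/".join(f"{dev:02x}.{fn}" for dev, fn in hops)
```

With this layout, renumbering a bus changes the packed BDF of every device
behind it, while the hypothetical path key stays the same because it never
records a bus number.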
David said that non-PCI devices would likely need their own
solutions. Pasha asked if the PCI device preservation would be extendable
to non-PCI devices if possible; he acknowledged that BDF can't change but
still wondered if there was a more common way to preserve. David
suggested that anything here could be added incrementally on top later.
For PCI assigned buses, Pasha suggested ignoring this parameter or disabling
live update if the parameter is set. It opened up the general question of
how we should handle conflicting parameters -- should this be handled by
VFIO or by LUO? Chris suggested bailing out when the buses we care about,
i.e. those involved in the live update, get changed. Pasha suggested we
could allow the auto assignment on first boot and then ignore the
parameter on live update. David suggested that whenever a PCI device
needs to be preserved (a callback into the PCI subsystem), that code can
check if this option is enabled and, if so, fail. The next kernel would
still need to check and perhaps panic if it's set. Pasha compared this to
sanity checking the memmap for the new kernel which would be required for
consistency.
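
The two checks David described can be sketched as a toy user-space model
in Python. This is not kernel code: the function names, the exception
types, and the boolean standing in for the bus assignment parameter are
all hypothetical, and the "panic" is modelled as an exception.

```python
class PreserveError(Exception):
    """Raised when a device cannot be preserved across live update."""


def preserve_pci_device(bdf, bus_assign_enabled):
    """Outgoing kernel: callback into the (modelled) PCI subsystem.

    Refuse to preserve the device if the bus assignment parameter is set,
    since the device's bus number could not be relied on afterwards.
    """
    if bus_assign_enabled:
        raise PreserveError(f"{bdf}: bus assignment parameter is set")
    return {"bdf": bdf}  # hypothetical preserved-state record


def check_incoming_kernel(preserved, bus_assign_enabled):
    """Next kernel: re-check, because its command line may differ.

    The notes say this might have to panic; here it raises RuntimeError.
    """
    if preserved and bus_assign_enabled:
        raise RuntimeError("preserved PCI devices with bus assignment set")
    return True
```

The second check is separate because the outgoing kernel cannot see the
command line of the kernel it hands over to, which is the same reason the
memmap sanity check has to run in the new kernel.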
----->o-----
Samiullah updated on the status of the IOMMU preservation series, which is
to be sent before LPC and builds on top of the VFIO series that David sent
out above. There will be no autobind in this one and it is using internal
tokens; the LUO get token API will be integrated later. This is currently
planned as an RFC.
Pasha asked about the lock-unlock functionality and whether Sami thought
that this was lagging -- Sami said that it was actually better now because
there's better flexibility. David said that the only locking for his
series was synchronizing finish when the FLB is freed with anything that's
using it. Anything using it already takes the mutex. If you try your own
locking, then this ends up in a deadlock that is mentioned in the cover
letter. Sami did not encounter this issue.
----->o-----
Pratyush did not have updates for the HugeTLB + 1GB page preservation
support but was hopeful there would be an update in the next week.
He also said that by the end of this week or next week he was hoping to
have an RFC to share with an early implementation of versioning support to
discuss at LPC.
----->o-----
Jork discussed measuring KHO recently for internal use cases and found
that traversing the preserved lists and inserting them into memblock took
the majority of the blackout time. He asked about the later agenda item
for deferred struct page initialization. Pasha said that was actually
unrelated to the blackout window, it's rather an incompatibility with KHO.
He acknowledged that KHO is very slow at inserting the memblocks, we need
to address this scalability problem. Pratyush had some ideas how to
handle this, but none of them are easy.
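
To give a feel for why inserting many ranges one at a time can scale
poorly, here is a toy Python model of a sorted region array in which every
insert shifts the entries after it. It is an assumption that this resembles
the cost Jork measured; the model does not merge adjacent ranges, is not
KHO or memblock code, and the function name is hypothetical.

```python
import bisect


def insert_one_by_one(ranges):
    """Insert (start, size) ranges into a sorted list one at a time.

    Returns the sorted list and the number of existing entries that had
    to be shifted to make room, which stands in for the copying cost.
    """
    regions = []
    shifted = 0
    for start, size in ranges:
        idx = bisect.bisect_left(regions, (start, size))
        shifted += len(regions) - idx  # entries moved by this insert
        regions.insert(idx, (start, size))
    return regions, shifted
```

In this model, ranges that arrive in ascending address order shift
nothing, while n ranges arriving in descending order shift n(n-1)/2
entries in total, so the cost depends heavily on arrival order and count.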
----->o-----
There was an update from Ackerley on guest_memfd support for 1GB HugeTLB:
he was working on qualifying an internal version and was hoping to get
reviews on extending xarrays to support splitting to multiple levels[2]
which was a prerequisite for the series. This was on track to post by
early next year.
----->o-----
The December 15 instance of the meeting is canceled due to LPC travel.
Next meeting will be on Monday, December 29 at 8am PST (UTC-8), everybody
is welcome: https://meet.google.com/rjn-dmzu-hgq
Topics for the next meeting:
- update on the status of stateless KHO patches from Jason Miu
- update on the status of LUO v8 for Linux 6.19, any patches that are
still pending after upstream merge
- discussion on design and status of luod as well as its integration with
systemd
- update for the VFIO patch series to preserve the VFIO device file
descriptor across live update
- timelines for PCI device preservation on top of VFIO patch series
- next steps for iommu persistence to build upon the VFIO patch series
once that is merged
- status update for HugeTLB + 1GB page preservation support that was sent
out in preparation for LPC
- continued discussion on versioning support for various components for
luod to negotiate
- later: update on status of guest_memfd support for 1GB HugeTLB pages
- later: testing methodology to allow downstream consumers to qualify
that live update works from one version to another
- later: reducing blackout window during live update, including deferred
struct page initialization
Please let me know if you'd like to propose additional topics for
discussion, thank you!
[1]
https://lore.kernel.org/kvm/20251126193608.2678510-1-dmatlack@google.com/
[2]
https://lore.kernel.org/all/20251117224701.1279139-1-ackerleytng@google.com/