[Hypervisor Live Update] Notes from April 21, 2025
David Rientjes
rientjes at google.com
Sun Apr 27 23:20:34 PDT 2025
Hi everybody,
Here are the notes from the last Hypervisor Live Update call that happened
on Monday, April 21. Thanks to everybody who was involved!
These notes are intended to bring people up to speed who could not attend
the call as well as keep the conversation going in between meetings.
----->o-----
KHO v6 is now staged in Andrew's mm-new tree. We discussed what it will
take for this series to be pushed to Linus, specifically around
Reviewed-by tags. There is not a ton of x86 specific code, but it would
likely be useful to have Reviewed-bys from some x86 maintainers. Dave
Hansen, would you be the right person to take a look at this from an x86
perspective?
We definitely wanted to touch base with Jason Gunthorpe on this topic
since he was on vacation at the time. Jason, do you have any feedback on
KHO v6 that would be blocking for its eventual merge upstream?
----->o-----
Pratyush looked at the LUO patches and noticed that it was largely doing
what he would have done with fdbox. He expressed a concern about the
global states, which were not part of the fdbox thinking. Pasha clarified
that the finish step is when the final cleanup is done; we can still
control what gets unfrozen and when.
Pratyush asked if the prepare phase worked the same way for LUO. Pasha
said that the list of participating subsystems is added before the
prepare step, which does not freeze anything. Pratyush noted this is
very similar to what he wanted to do with fdbox so suggested working on
top of LUO. Pasha said that LUO v2 was in progress and that porting over
the fdbox support had already begun.
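To make the phase ordering above concrete, here is a minimal sketch of
participants being registered before prepare, with cleanup deferred to
finish. The state names, the freeze and kexec steps, and the participant
names are assumptions for illustration only; this is not the LUO
interface.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical phase names, chosen for this sketch only.
    NORMAL = auto()
    PREPARED = auto()
    FROZEN = auto()
    UPDATED = auto()  # running in the new kernel, before finish


class LiveUpdate:
    def __init__(self):
        self.state = State.NORMAL
        self.participants = []
        self.log = []  # (event, participant) pairs, in callback order

    def register(self, name):
        # The participant list is fixed before the prepare step.
        if self.state is not State.NORMAL:
            raise RuntimeError("participants must be registered before prepare")
        self.participants.append(name)

    def _advance(self, event, expected, new, notify=True):
        if self.state is not expected:
            raise RuntimeError(f"{event} is not valid in state {self.state.name}")
        if notify:
            self.log.extend((event, name) for name in self.participants)
        self.state = new

    def prepare(self):
        # Prepare serializes state but does not freeze anything.
        self._advance("prepare", State.NORMAL, State.PREPARED)

    def freeze(self):
        self._advance("freeze", State.PREPARED, State.FROZEN)

    def kexec(self):
        # Stand-in for the reboot into the new kernel; no callbacks here.
        self._advance("kexec", State.FROZEN, State.UPDATED, notify=False)

    def finish(self):
        # Final cleanup happens here, after the new kernel is running.
        self._advance("finish", State.UPDATED, State.NORMAL)


lu = LiveUpdate()
lu.register("memfd")    # hypothetical participant
lu.register("iommufd")  # hypothetical participant
lu.prepare()
lu.freeze()
lu.kexec()
lu.finish()
```

In this sketch, each participant sees prepare, freeze, and finish in
that order, and registering a new participant after prepare is
rejected.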
----->o-----
We discussed whether LUO largely replaces the need for guestmemfs as
discussed in previous instances. It was noted that guestmemfs largely
was aimed at preserving IOMMU and now we have aligned on iommufds, which
was also supported by Jason previously.
Since James Gowans was not on this call, we wanted to follow up on the
upstream mailing list. James, do you agree on the convergence with LUO,
or are there still use cases where guestmemfs would be useful that
aren't currently planned?
----->o-----
We touched base very briefly about swiotlb in low memory, an issue that
Pratyush ran into several weeks ago. Mike Rapoport noted that this was
now supported with KHO v6 upstream which uses lowmem scratch support, so
this should no longer be an issue.
----->o-----
The u64 in the KHO FDT, which KHO v6 implements, was discussed as a
follow-up from the last sync. Each component has its own independent FDT,
whose physical address is recorded as a u64 in the global FDT. We
revisited the discussion from before
when Alex had previously implemented it differently. The fixed maximum
size was one of the biggest blockers. Now, with KHO v6, everybody gets
their own subtree and can allocate from anywhere they want.
I noted that one of the concerns James had previously flagged was about
dumping the state of the FDT for debugging purposes. This is noted as
being solved in KHO v6. No additional concerns were flagged about the
single u64 that was implemented.
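As a rough illustration of the layout discussed above, the following
sketch models a global tree that holds a single u64 per component, with
each component's own blob placed wherever it likes. All class names,
component names, and addresses are hypothetical; this is not the KHO
ABI or API.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096


@dataclass
class PreservedMemory:
    """Stands in for memory that survives kexec, keyed by physical address."""
    blobs: dict = field(default_factory=dict)
    next_addr: int = 0x1000_0000  # arbitrary start address for the sketch

    def store(self, blob: bytes) -> int:
        """Place a blob and return its pretend physical address."""
        addr = self.next_addr
        self.blobs[addr] = blob
        # Advance by whole pages, using at least one page per blob.
        pages = max(1, -(-len(blob) // PAGE_SIZE))
        self.next_addr += pages * PAGE_SIZE
        return addr


@dataclass
class GlobalTree:
    """Root tree: one u64 per component, pointing at that component's blob."""
    entries: dict = field(default_factory=dict)

    def add(self, name: str, addr: int) -> None:
        if not 0 <= addr < 2**64:
            raise ValueError("address must fit in a u64")
        self.entries[name] = addr


def dump(tree: GlobalTree, mem: PreservedMemory) -> dict:
    """Follow every u64 from the root, e.g. to inspect state when debugging."""
    return {name: mem.blobs[addr] for name, addr in tree.entries.items()}


mem = PreservedMemory()
tree = GlobalTree()
tree.add("memblock", mem.store(b"memblock state"))
tree.add("ftrace", mem.store(b"x" * 5000))  # larger than one page
```

Because the root only stores addresses, its size does not depend on how
much state any one component serializes, which is why the fixed maximum
size stops being a concern in this model.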
----->o-----
Mike noted that memblock is easy and useful for ftrace without a ton of
additional complexity.
----->o-----
We discussed the future of KSTATE[1]. Andrey noted that his vision
allowed for building it on top of KHO as a protocol for describing state.
This hasn't been done yet, but can be worked on. I asked if the KHO v6
pending in Andrew's tree would cause any issues for this extension.
Andrey noted that this is a format for serializing and de-serializing
data; the data itself could be an FDT blob.
Pasha asked who would be benefiting from this serialization. Andrey
noted this would be pretty much everything, including device drivers. At
least with LUO v2, every component gets the u64 to store anything they
want and this memory gets preserved by KHO. Pasha wondered if KSTATE
could then plug into LUO v2, since it doesn't have any format dependency.
Andrey said this could be done. It has similar flows compared to qemu.
It was noted that KSTATE was going to be complementary, not overlapping,
with KHO. Pasha suggested building KSTATE on top of LUO v2 and suggested
Chris Li review the current KSTATE proposal.
----->o-----
We discussed testing for LUO; Pasha noted that the plan is to add
selftests and then asked a general question about existing kexec tests,
including ones that use the qemu emulator. We need to create a mechanism
to do kexec testing with qemu that can be run directly from selftests.
Andrey suggested using a nested VM and testing live update between two
different kernel versions.
I asked about how we could enumerate upstream the supported kernel
versions that support live update, including their device drivers. I
focused on how to describe the set of drivers that have been tested so
that others can consume that information -- is this done through a code
change that indicates that they can upgrade from a specific version?
David Matlack suggested focusing on automation and frameworks that
downstream consumers of the kernel can use in their own environments.
Pasha suggested a zero-day testing infrastructure for reporting
regressions and something like syzbot to track regressions.
David Matlack also noted that he is working on VFIO selftests.
----->o-----
Ashish Kalra asked about SEV-SNP support for live update. He noted "for
SNP there is a VMSA page which is marked in-use/busy when the guest is
running. Then the VMSA page for the currently running vCPU cannot be
dumped by makedumpfile during vmcore generation as walking the guest
memory and touching it will cause unrecoverable #NPF faults as VMSA is
marked busy. So, this looks like a potential use case for preserving
guest memory across kexec (kstate patches), so that the VMSA page be
marked to be preserved and reserved."
----->o-----
Next meeting will be on Monday, May 5 at 8am PDT (UTC-7), everybody is
welcome: https://meet.google.com/rjn-dmzu-hgq
Topics for the next meeting:
- address any pending concerns from Jason as well as x86 maintainers
that start looking at the KHO series
+ we specifically want to wait for Jason to discuss the decision on
everything being preserved by fds vs recreating the state on the
other side of kexec
- determine next set of milestones for KHO v6 beyond what is already
staged in Andrew's tree
- determine next set of milestones for LUO v2 in its development and
limitations on current support
- confirm that LUO v2 subsumes guestmemfs or identify required support
that is not yet present
- update on KSTATE progress on top of LUO and what will be needed for
PCIe devices
- upstream patches for 1GB dev dax support
- update on physical pool allocator that can be used to provide pages
for hugetlb, guest_memfd, and memfds
- SEV-SNP support for preserving guest memory and what foundational
components AMD can depend on, building on top of KHO v6 or KSTATE
- later: reducing blackout window during live update
- later: testing methodology to allow downstream consumers to qualify
that live update works from one version to another
Please let me know if you'd like to propose additional topics for
discussion, thank you!
[1] https://github.com/aryabinin/linux/commits/kstate-v2.1/