[PATCH v3 07/24] vfio/pci: Preserve vfio-pci device files across Live Update
Yi Liu
yi.l.liu at intel.com
Tue Mar 24 06:08:16 PDT 2026
On 3/24/26 07:57, David Matlack wrote:
> From: Vipin Sharma <vipinsh at google.com>
>
> Implement the live update file handler callbacks to preserve a vfio-pci
> device across a Live Update. Subsequent commits will enable userspace to
> then retrieve this file after the Live Update.
>
> Live Update support is scoped only to cdev files (i.e. not
> VFIO_GROUP_GET_DEVICE_FD files).
>
> State about each device is serialized into a new ABI struct
> vfio_pci_core_device_ser. The contents of this struct are preserved
> across the Live Update to the next kernel using a combination of
> Kexec-Handover (KHO) to preserve the page(s) holding the struct and the
> Live Update Orchestrator (LUO) to preserve the physical address of the
> struct.
>
> For now the only contents of struct vfio_pci_core_device_ser are the
> device's PCI segment number and BDF, so that the device can be uniquely
> identified after the Live Update.
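To illustrate how a segment number plus BDF identifies a device, here is a minimal userspace sketch. The struct and helper names (`example_pci_id`, `example_pci_id_pack`, `example_pci_id_format`) are hypothetical and are not the layout of vfio_pci_core_device_ser; only the conventional bus/device/function bit packing is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout for illustration only; not the patch's ABI struct. */
struct example_pci_id {
	uint16_t segment; /* PCI segment (domain) number */
	uint16_t bdf;     /* bus[15:8] | device[7:3] | function[2:0] */
};

/* Pack bus, device and function into the conventional 16-bit BDF. */
static struct example_pci_id example_pci_id_pack(uint16_t segment, uint8_t bus,
						 uint8_t dev, uint8_t fn)
{
	struct example_pci_id id;

	assert(dev < 32 && fn < 8);
	id.segment = segment;
	id.bdf = (uint16_t)((bus << 8) | (dev << 3) | fn);
	return id;
}

/* Format the identifier as the usual "ssss:bb:dd.f" string. */
static int example_pci_id_format(const struct example_pci_id *id, char *buf,
				 size_t len)
{
	return snprintf(buf, len, "%04x:%02x:%02x.%x", id->segment,
			(id->bdf >> 8) & 0xff, (id->bdf >> 3) & 0x1f,
			id->bdf & 0x7);
}
```

Two devices compare equal under this scheme only when both the segment and the packed BDF match, which is why the pair is enough to find the same device again in the next kernel.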
>
> Require that userspace disables interrupts on the device prior to
> freeze() so that the device does not send any interrupts until new
> interrupt handlers have been set up by the next kernel.
>
> Reset the device and restore its state in the freeze() callback. This
> ensures the device can be received by the next kernel in a consistent
> state. Eventually this will be dropped and the device can be preserved
> across in a running state, but that requires further work in VFIO and
> the core PCI layer.
>
> Note that LUO holds a reference to this file when it is preserved. So
> VFIO is guaranteed that vfio_df_device_last_close() will not be called
> on this device no matter what userspace does.
>
> Signed-off-by: Vipin Sharma <vipinsh at google.com>
> Co-developed-by: David Matlack <dmatlack at google.com>
> Signed-off-by: David Matlack <dmatlack at google.com>
> ---
> drivers/vfio/pci/vfio_pci.c | 2 +-
> drivers/vfio/pci/vfio_pci_core.c | 57 +++++----
> drivers/vfio/pci/vfio_pci_liveupdate.c | 156 ++++++++++++++++++++++++-
> drivers/vfio/pci/vfio_pci_priv.h | 4 +
> drivers/vfio/vfio_main.c | 3 +-
> include/linux/kho/abi/vfio_pci.h | 15 +++
> include/linux/vfio.h | 2 +
> 7 files changed, 213 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 41dcbe4ace67..351480d13f6e 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -125,7 +125,7 @@ static int vfio_pci_open_device(struct vfio_device *core_vdev)
> return 0;
> }
>
> -static const struct vfio_device_ops vfio_pci_ops = {
> +const struct vfio_device_ops vfio_pci_ops = {
> .name = "vfio-pci",
> .init = vfio_pci_core_init_dev,
> .release = vfio_pci_core_release_dev,
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index d43745fe4c84..81f941323641 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -585,9 +585,42 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
> }
> EXPORT_SYMBOL_GPL(vfio_pci_core_enable);
>
> +void vfio_pci_core_try_reset(struct vfio_pci_core_device *vdev)
> +{
> + struct pci_dev *pdev = vdev->pdev;
> + struct pci_dev *bridge = pci_upstream_bridge(pdev);
> +
> + lockdep_assert_held(&vdev->vdev.dev_set->lock);
> +
> + if (!vdev->reset_works)
> + return;
> +
> + /*
> + * Try to get the locks ourselves to prevent a deadlock. The
> + * success of this is dependent on being able to lock the device,
> + * which is not always possible.
> + *
> + * We cannot use the "try" reset interface here, since that will
> + * overwrite the previously restored configuration information.
> + */
> + if (bridge && !pci_dev_trylock(bridge))
> + return;
> +
> + if (!pci_dev_trylock(pdev))
> + goto out;
> +
> + if (!__pci_reset_function_locked(pdev))
> + vdev->needs_reset = false;
> +
> + pci_dev_unlock(pdev);
> +out:
> + if (bridge)
> + pci_dev_unlock(bridge);
> +}
> +EXPORT_SYMBOL_GPL(vfio_pci_core_try_reset);
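The lock handling in the helper above (trylock the bridge, trylock the device, unlock in reverse order, never block) can be sketched in userspace as follows. This is only an analogy using pthread mutexes; `example_dev` and `example_try_reset` are hypothetical names, and clearing `needs_reset` stands in for the actual function reset.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device with an optional upstream bridge. */
struct example_dev {
	pthread_mutex_t lock;
	struct example_dev *bridge; /* NULL when there is no upstream bridge */
	bool needs_reset;
};

/* Returns true only if both locks were taken and the "reset" ran. */
static bool example_try_reset(struct example_dev *dev)
{
	struct example_dev *bridge;
	bool done = false;

	assert(dev != NULL);
	bridge = dev->bridge;

	/* Never block: give up immediately if the bridge is busy. */
	if (bridge && pthread_mutex_trylock(&bridge->lock) != 0)
		return false;

	/* Same for the device itself; release the bridge on failure. */
	if (pthread_mutex_trylock(&dev->lock) != 0)
		goto out;

	dev->needs_reset = false; /* stands in for the real reset */
	done = true;

	pthread_mutex_unlock(&dev->lock);
out:
	if (bridge)
		pthread_mutex_unlock(&bridge->lock);
	return done;
}
```

Because every acquisition is a trylock, a caller that already holds another lock cannot deadlock here; the cost is that the reset is best-effort and may be skipped.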
> +
> void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
> {
> - struct pci_dev *bridge;
> struct pci_dev *pdev = vdev->pdev;
> struct vfio_pci_dummy_resource *dummy_res, *tmp;
> struct vfio_pci_ioeventfd *ioeventfd, *ioeventfd_tmp;
> @@ -687,27 +720,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
> */
> pci_write_config_word(pdev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);
>
> - /*
> - * Try to get the locks ourselves to prevent a deadlock. The
> - * success of this is dependent on being able to lock the device,
> - * which is not always possible.
> - * We can not use the "try" reset interface here, which will
> - * overwrite the previously restored configuration information.
> - */
> - if (vdev->reset_works) {
> - bridge = pci_upstream_bridge(pdev);
> - if (bridge && !pci_dev_trylock(bridge))
> - goto out_restore_state;
> - if (pci_dev_trylock(pdev)) {
> - if (!__pci_reset_function_locked(pdev))
> - vdev->needs_reset = false;
> - pci_dev_unlock(pdev);
> - }
> - if (bridge)
> - pci_dev_unlock(bridge);
> - }
> -
> -out_restore_state:
> + vfio_pci_core_try_reset(vdev);
> pci_restore_state(pdev);
> out:
> pci_disable_device(pdev);
> diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c
> index 5ea5af46b159..c4ebc7c486e5 100644
> --- a/drivers/vfio/pci/vfio_pci_liveupdate.c
> +++ b/drivers/vfio/pci/vfio_pci_liveupdate.c
> @@ -6,27 +6,178 @@
> * David Matlack <dmatlack at google.com>
> */
>
> +/**
> + * DOC: VFIO PCI Preservation via LUO
> + *
> + * VFIO PCI devices can be preserved over a kexec using the Live Update
> + * Orchestrator (LUO) file preservation. This allows userspace (such as a VMM)
> + * to transfer an in-use device to the next kernel.
> + *
> + * .. note::
> + * The support for preserving VFIO PCI devices is currently *partial* and
> + * should be considered *experimental*. It should only be used by developers
> + * working on expanding the support for the time being.
> + *
> + * To avoid accidental usage while the support is still experimental, this
> + * support is hidden behind a default-disabled config option
> + * ``CONFIG_VFIO_PCI_LIVEUPDATE``. Once the kernel support has stabilized and
> + * become complete, this option will be enabled by default when
> + * ``CONFIG_VFIO_PCI`` and ``CONFIG_LIVEUPDATE`` are enabled.
> + *
> + * Usage Example
> + * =============
> + *
> + * VFIO PCI devices can be preserved across a kexec by preserving the file
> + * associated with the device in a LUO session::
> + *
> + * device_fd = open("/dev/vfio/devices/X");
This should be /dev/vfio/devices/vfioX.
> + * ...
> + * ioctl(session_fd, LIVEUPDATE_SESSION_PRESERVE_FD, { ..., device_fd, ...});
> + *
> + * .. note::
> + * LUO will hold an extra reference to the device file for as long as it is
> + * preserved, so there is no way for the file to be destroyed or the device
> + * to be unbound from the vfio-pci driver while it is preserved.
> + *
> + * Retrieving the file after kexec is not yet supported.
> + *
> + * Restrictions
> + * ============
> + *
> + * The kernel imposes the following restrictions when preserving VFIO devices:
> + *
> + * * The device must be bound to the ``vfio-pci`` driver.
> + *
> + * * ``CONFIG_VFIO_PCI_ZDEV_KVM`` must not be enabled. This may be relaxed in
> + * the future.
> + *
> + * * The device must not be an Intel display device. This may be relaxed in the
> + * future.
> + *
> + * * The device file must have been acquired from the VFIO character device,
> + * not ``VFIO_GROUP_GET_DEVICE_FD``.
How about "The device file descriptor must be obtained by opening the
VFIO device character device (``/dev/vfio/devices/vfioX``), not via
``VFIO_GROUP_GET_DEVICE_FD``."?

That would align with the wording below in vfio.rst:

"Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
user can now acquire a device fd by directly opening a character device
/dev/vfio/devices/vfioX"
> + *
> + * * The device must have interrupts disabled prior to kexec. Failure to disable
> + * interrupts on the device will cause the ``reboot(LINUX_REBOOT_CMD_KEXEC)``
> + * syscall (to initiate the kexec) to fail.
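For reference, one way userspace might disable interrupts before the kexec is the existing VFIO_DEVICE_SET_IRQS uAPI, where VFIO_IRQ_SET_DATA_NONE with VFIO_IRQ_SET_ACTION_TRIGGER and a count of 0 tears down an IRQ index. The sketch below is a minimal example under that assumption; `example_vfio_disable_irqs` is a hypothetical helper, and whether this is exactly the state the freeze() check in this patch looks for is an assumption, not something confirmed here.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Disable all interrupts for one IRQ index on a VFIO device fd.
 * DATA_NONE + ACTION_TRIGGER with count == 0 tears down the index.
 * Returns the ioctl() result: 0 on success, -1 with errno set on failure.
 */
static int example_vfio_disable_irqs(int device_fd, unsigned int index)
{
	struct vfio_irq_set irq_set;

	assert(index < VFIO_PCI_NUM_IRQS);

	memset(&irq_set, 0, sizeof(irq_set));
	irq_set.argsz = sizeof(irq_set);
	irq_set.flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
	irq_set.index = index;
	irq_set.start = 0;
	irq_set.count = 0;

	return ioctl(device_fd, VFIO_DEVICE_SET_IRQS, &irq_set);
}
```

A VMM would call this with whichever index it had enabled (for example VFIO_PCI_MSIX_IRQ_INDEX) on the device fd before preserving it and initiating the kexec; the call can fail for an index that was never enabled, so the caller should decide how to treat that error.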
> + *
> + * Preservation Behavior
> + * =====================
> + *
> + * The eventual goal of this support is to avoid disrupting the workload, state,
> + * or configuration of each preserved device during a Live Update. This would
> + * include allowing the device to perform DMA to preserved memory buffers and
> + * perform P2P DMA to other preserved devices. However, there are many pieces
> + * that still need to land in the kernel.
> + *
> + * For now, VFIO only preserves the following state for devices:
> + *
> + * * The PCI Segment, Bus, Device, and Function numbers of the device. The
> + * kernel guarantees that these will not change across a kexec when a device
> + * is preserved.
> + *
> + * Since the kernel is not yet prepared to preserve all parts of the device and
> + * its dependencies (such as DMA mappings), VFIO currently resets and restores
> + * preserved devices back into an idle state during kexec, before handing off
> + * control to the next kernel. This will be relaxed in future versions of the
> + * kernel once it is safe to allow the device to keep running across kexec.
> + */
> +
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> +#include <linux/kexec_handover.h>
> #include <linux/kho/abi/vfio_pci.h>
> #include <linux/liveupdate.h>
> #include <linux/errno.h>
> +#include <linux/vfio.h>
Maybe follow alphabetical order here; errno.h would then move to the top.

Regards,
Yi Liu