[PATCH v3 07/24] vfio/pci: Preserve vfio-pci device files across Live Update

Yi Liu yi.l.liu at intel.com
Tue Mar 24 06:08:16 PDT 2026


On 3/24/26 07:57, David Matlack wrote:
> From: Vipin Sharma <vipinsh at google.com>
> 
> Implement the live update file handler callbacks to preserve a vfio-pci
> device across a Live Update. Subsequent commits will enable userspace to
> then retrieve this file after the Live Update.
> 
> Live Update support is scoped only to cdev files (i.e. not
> VFIO_GROUP_GET_DEVICE_FD files).
> 
> State about each device is serialized into a new ABI struct
> vfio_pci_core_device_ser. The contents of this struct are preserved
> across the Live Update to the next kernel using a combination of
> Kexec-Handover (KHO) to preserve the page(s) holding the struct and the
> Live Update Orchestrator (LUO) to preserve the physical address of the
> struct.
> 
> For now the only contents of struct vfio_pci_core_device_ser are the
> device's PCI segment number and BDF, so that the device can be uniquely
> identified after the Live Update.
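
For readers less familiar with PCI addressing, here is a rough userspace sketch of how a segment number plus BDF identifies a device. The struct and helper names (dev_id, make_bdf, dev_id_equal) are hypothetical and only for illustration; the real layout is whatever struct vfio_pci_core_device_ser in include/linux/kho/abi/vfio_pci.h defines. The bit packing assumed here is the conventional one: bus in bits 15:8, device in bits 7:3, function in bits 2:0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical identity record; not the ABI struct from the patch. */
struct dev_id {
	uint16_t segment;	/* PCI segment (domain) number */
	uint16_t bdf;		/* bus[15:8] | device[7:3] | function[2:0] */
};

/* Pack bus/device/function into a 16-bit BDF. */
uint16_t make_bdf(uint8_t bus, uint8_t dev, uint8_t fn)
{
	return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x07));
}

/* Unpack the individual fields again. */
uint8_t bdf_bus(uint16_t bdf) { return (uint8_t)(bdf >> 8); }
uint8_t bdf_dev(uint16_t bdf) { return (uint8_t)((bdf >> 3) & 0x1f); }
uint8_t bdf_fn(uint16_t bdf)  { return (uint8_t)(bdf & 0x07); }

/* Two records name the same device only if segment and BDF both match. */
bool dev_id_equal(const struct dev_id *a, const struct dev_id *b)
{
	return a->segment == b->segment && a->bdf == b->bdf;
}
```

Under these assumptions, a device at 0000:3b:00.1 would be recorded as segment 0 and BDF 0x3b01, and the next kernel could match it with a comparison like dev_id_equal().
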
> 
> Require that userspace disables interrupts on the device prior to
> freeze() so that the device does not send any interrupts until new
> interrupt handlers have been set up by the next kernel.
> 
> Reset the device and restore its state in the freeze() callback. This
> ensures the device can be received by the next kernel in a consistent
> state. Eventually this will be dropped and the device can be preserved
> across the Live Update in a running state, but that requires further work
> in VFIO and the core PCI layer.
> 
> Note that LUO holds a reference to this file when it is preserved. So
> VFIO is guaranteed that vfio_df_device_last_close() will not be called
> on this device no matter what userspace does.
> 
> Signed-off-by: Vipin Sharma <vipinsh at google.com>
> Co-developed-by: David Matlack <dmatlack at google.com>
> Signed-off-by: David Matlack <dmatlack at google.com>
> ---
>   drivers/vfio/pci/vfio_pci.c            |   2 +-
>   drivers/vfio/pci/vfio_pci_core.c       |  57 +++++----
>   drivers/vfio/pci/vfio_pci_liveupdate.c | 156 ++++++++++++++++++++++++-
>   drivers/vfio/pci/vfio_pci_priv.h       |   4 +
>   drivers/vfio/vfio_main.c               |   3 +-
>   include/linux/kho/abi/vfio_pci.h       |  15 +++
>   include/linux/vfio.h                   |   2 +
>   7 files changed, 213 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 41dcbe4ace67..351480d13f6e 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -125,7 +125,7 @@ static int vfio_pci_open_device(struct vfio_device *core_vdev)
>   	return 0;
>   }
>   
> -static const struct vfio_device_ops vfio_pci_ops = {
> +const struct vfio_device_ops vfio_pci_ops = {
>   	.name		= "vfio-pci",
>   	.init		= vfio_pci_core_init_dev,
>   	.release	= vfio_pci_core_release_dev,
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index d43745fe4c84..81f941323641 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -585,9 +585,42 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
>   }
>   EXPORT_SYMBOL_GPL(vfio_pci_core_enable);
>   
> +void vfio_pci_core_try_reset(struct vfio_pci_core_device *vdev)
> +{
> +	struct pci_dev *pdev = vdev->pdev;
> +	struct pci_dev *bridge = pci_upstream_bridge(pdev);
> +
> +	lockdep_assert_held(&vdev->vdev.dev_set->lock);
> +
> +	if (!vdev->reset_works)
> +		return;
> +
> +	/*
> +	 * Try to get the locks ourselves to prevent a deadlock. The
> +	 * success of this is dependent on being able to lock the device,
> +	 * which is not always possible.
> +	 *
> +	 * We cannot use the "try" reset interface here, since that will
> +	 * overwrite the previously restored configuration information.
> +	 */
> +	if (bridge && !pci_dev_trylock(bridge))
> +		return;
> +
> +	if (!pci_dev_trylock(pdev))
> +		goto out;
> +
> +	if (!__pci_reset_function_locked(pdev))
> +		vdev->needs_reset = false;
> +
> +	pci_dev_unlock(pdev);
> +out:
> +	if (bridge)
> +		pci_dev_unlock(bridge);
> +}
> +EXPORT_SYMBOL_GPL(vfio_pci_core_try_reset);
> +
>   void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
>   {
> -	struct pci_dev *bridge;
>   	struct pci_dev *pdev = vdev->pdev;
>   	struct vfio_pci_dummy_resource *dummy_res, *tmp;
>   	struct vfio_pci_ioeventfd *ioeventfd, *ioeventfd_tmp;
> @@ -687,27 +720,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
>   	 */
>   	pci_write_config_word(pdev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);
>   
> -	/*
> -	 * Try to get the locks ourselves to prevent a deadlock. The
> -	 * success of this is dependent on being able to lock the device,
> -	 * which is not always possible.
> -	 * We can not use the "try" reset interface here, which will
> -	 * overwrite the previously restored configuration information.
> -	 */
> -	if (vdev->reset_works) {
> -		bridge = pci_upstream_bridge(pdev);
> -		if (bridge && !pci_dev_trylock(bridge))
> -			goto out_restore_state;
> -		if (pci_dev_trylock(pdev)) {
> -			if (!__pci_reset_function_locked(pdev))
> -				vdev->needs_reset = false;
> -			pci_dev_unlock(pdev);
> -		}
> -		if (bridge)
> -			pci_dev_unlock(bridge);
> -	}
> -
> -out_restore_state:
> +	vfio_pci_core_try_reset(vdev);
>   	pci_restore_state(pdev);
>   out:
>   	pci_disable_device(pdev);
> diff --git a/drivers/vfio/pci/vfio_pci_liveupdate.c b/drivers/vfio/pci/vfio_pci_liveupdate.c
> index 5ea5af46b159..c4ebc7c486e5 100644
> --- a/drivers/vfio/pci/vfio_pci_liveupdate.c
> +++ b/drivers/vfio/pci/vfio_pci_liveupdate.c
> @@ -6,27 +6,178 @@
>    * David Matlack <dmatlack at google.com>
>    */
>   
> +/**
> + * DOC: VFIO PCI Preservation via LUO
> + *
> + * VFIO PCI devices can be preserved over a kexec using the Live Update
> + * Orchestrator (LUO) file preservation. This allows userspace (such as a VMM)
> + * to transfer an in-use device to the next kernel.
> + *
> + * .. note::
> + *    The support for preserving VFIO PCI devices is currently *partial* and
> + *    should be considered *experimental*. It should only be used by developers
> + *    working on expanding the support for the time being.
> + *
> + *    To avoid accidental usage while the support is still experimental, this
> + *    support is hidden behind a default-disabled config option
> + *    ``CONFIG_VFIO_PCI_LIVEUPDATE``. Once the kernel support has stabilized and
> + *    become complete, this option will be enabled by default when
> + *    ``CONFIG_VFIO_PCI`` and ``CONFIG_LIVEUPDATE`` are enabled.
> + *
> + * Usage Example
> + * =============
> + *
> + * VFIO PCI devices can be preserved across a kexec by preserving the file
> + * associated with the device in a LUO session::
> + *
> + *   device_fd = open("/dev/vfio/devices/X");

This should be /dev/vfio/devices/vfioX.

> + *   ...
> + *   ioctl(session_fd, LIVEUPDATE_SESSION_PRESERVE_FD, { ..., device_fd, ...});
> + *
> + * .. note::
> + *    LUO will hold an extra reference to the device file for as long as it is
> + *    preserved, so there is no way for the file to be destroyed or the device
> + *    to be unbound from the vfio-pci driver while it is preserved.
> + *
> + * Retrieving the file after kexec is not yet supported.
> + *
> + * Restrictions
> + * ============
> + *
> + * The kernel imposes the following restrictions when preserving VFIO devices:
> + *
> + *  * The device must be bound to the ``vfio-pci`` driver.
> + *
> + *  * ``CONFIG_VFIO_PCI_ZDEV_KVM`` must not be enabled. This may be relaxed in
> + *    the future.
> + *
> + *  * The device must not be an Intel display device. This may be relaxed in
> + *    the future.
> + *
> + *  * The device file must have been acquired from the VFIO character device,
> + *    not ``VFIO_GROUP_GET_DEVICE_FD``.

How about "The device file descriptor must be obtained by opening the
VFIO device character device (``/dev/vfio/devices/vfioX``), not via
``VFIO_GROUP_GET_DEVICE_FD``."?

That would align with the wording below from vfio.rst:

"Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
user can now acquire a device fd by directly opening a character device
/dev/vfio/devices/vfioX"
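
To make the distinction concrete, here is a minimal userspace sketch of the cdev path, i.e. the only kind of device fd this series preserves. The helper names (vfio_cdev_path, vfio_cdev_open) are hypothetical and only for illustration; the sketch assumes the device node follows the /dev/vfio/devices/vfioX naming quoted from vfio.rst.

```c
#include <fcntl.h>
#include <stdio.h>

/*
 * Build the VFIO character device path for a given index, e.g.
 * index 0 -> "/dev/vfio/devices/vfio0".
 * Returns 0 on success, -1 if the buffer is too small.
 * Hypothetical helper, not a kernel or libc API.
 */
int vfio_cdev_path(char *buf, size_t len, unsigned int index)
{
	int n = snprintf(buf, len, "/dev/vfio/devices/vfio%u", index);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Open the device directly through its character device rather than
 * via VFIO_GROUP_GET_DEVICE_FD on a group fd.
 * Returns a file descriptor, or -1 on failure.
 */
int vfio_cdev_open(unsigned int index)
{
	char path[64];

	if (vfio_cdev_path(path, sizeof(path), index))
		return -1;

	return open(path, O_RDWR);
}
```

An fd returned by something like vfio_cdev_open(0) is what would later be handed to LIVEUPDATE_SESSION_PRESERVE_FD; an fd obtained from VFIO_GROUP_GET_DEVICE_FD would not qualify.
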

> + *
> + *  * The device must have interrupts disabled prior to kexec. Failure to
> + *    disable interrupts on the device will cause the
> + *    ``reboot(LINUX_REBOOT_CMD_KEXEC)`` syscall (to initiate the kexec) to
> + *    fail.
> + *
> + * Preservation Behavior
> + * =====================
> + *
> + * The eventual goal of this support is to avoid disrupting the workload, state,
> + * or configuration of each preserved device during a Live Update. This would
> + * include allowing the device to perform DMA to preserved memory buffers and
> + * perform P2P DMA to other preserved devices. However, there are many pieces
> + * that still need to land in the kernel.
> + *
> + * For now, VFIO only preserves the following state for devices:
> + *
> + *  * The PCI Segment, Bus, Device, and Function numbers of the device. The
> + *    kernel guarantees that these will not change across a kexec when a device
> + *    is preserved.
> + *
> + * Since the kernel is not yet prepared to preserve all parts of the device and
> + * its dependencies (such as DMA mappings), VFIO currently resets and restores
> + * preserved devices back into an idle state during kexec, before handing off
> + * control to the next kernel. This will be relaxed in future versions of the
> + * kernel once it is safe to allow the device to keep running across kexec.
> + */
> +
>   #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>   
> +#include <linux/kexec_handover.h>
>   #include <linux/kho/abi/vfio_pci.h>
>   #include <linux/liveupdate.h>
>   #include <linux/errno.h>
> +#include <linux/vfio.h>

Maybe keep the includes in alphabetical order; errno.h would then move to
the top of the list.

Regards,
Yi Liu


