[PATCH 43/43] Documentation: KVM: Add the VGICv5 IRS save/restore sequences

Sascha Bischoff Sascha.Bischoff at arm.com
Fri May 8 10:10:55 PDT 2026


On Thu, 2026-04-30 at 09:57 +0100, Peter Maydell wrote:
> On Mon, 27 Apr 2026 at 17:22, Sascha Bischoff
> <Sascha.Bischoff at arm.com> wrote:
> > 
> > When saving/restoring the state of the GICv5 IRS, it is important
> > that
> > it happens in the correct order. Failure to do so will almost
> > certainly result in failing to restore a guest that is capable of
> > handling interrupts correctly.
> > 
> > On a save, the ISTs must be saved prior to saving the guest's
> > memory
> > as the guest's LPI IST is written to guest memory. Conversely, on
> > restore the guest's memory must be restored prior to restoring the
> > ISTs.
> > 
> > It is important to restore the IRS MMIO registers by first
> > restoring
> > the IRS_IDx registers as they define the capabilities of the IRS,
> > and
> > are used as part of creating and managing ISTs and SPIs.
> > 
> > In order to restore the ISTs themselves, the IRS_IST_CFGR must be
> > restored prior to the IRS_IST_BASER. This is because KVM extracts
> > fields from the CFGR to determine the size and structure of the IRS
> > created by the guest. The IST itself is created as part of the
> > write
> > to the IRS_IST_BASER. At this stage the remaining MMIO registers
> > can
> > be restored.
> > 
> > Once the LPI IST has been created (by the aforementioned write to
> > the
> > IRS_IST_BASER), the IST state can be restored using
> > KVM_DEV_ARM_VGIC_GRP_IST. The SPI IST gets extracted from a
> > userspace
> > provided buffer, and is transferred to the host-allocated SPI IST.
> > The
> > LPI IST is extracted from guest memory, and is written to the
> > host-allocated LPI IST.
> > 
> > As a general rule, the IRS_*_STATUSR registers can be ignored on
> > restore. They are not userspace writable.
> > 
> > Signed-off-by: Sascha Bischoff <sascha.bischoff at arm.com>
> > ---
> >  .../virt/kvm/devices/arm-vgic-v5.rst          | 63
> > +++++++++++++++++++
> >  1 file changed, 63 insertions(+)
> > 
> > diff --git a/Documentation/virt/kvm/devices/arm-vgic-v5.rst
> > b/Documentation/virt/kvm/devices/arm-vgic-v5.rst
> > index 38eef7cc63e3e..1c55f5040757d 100644
> > --- a/Documentation/virt/kvm/devices/arm-vgic-v5.rst
> > +++ b/Documentation/virt/kvm/devices/arm-vgic-v5.rst
> > @@ -201,3 +201,66 @@ Groups:
> >        -ENOMEM      Restoring IST state failed while tracking
> > pending interrupts
> >        -ETIMEDOUT   An IRS save/VM operation timed out
> >        =========== 
> > ============================================================
> > +
> > +IRS Save Sequence:
> > +------------------
> > +
> > +The following ordering should be followed when saving the virtual
> > GICv5 and
> > +IRS:
> > +
> > +a) Save the ISTs by issuing KVM_GET_DEVICE_ATTR on
> > KVM_DEV_ARM_VGIC_GRP_IST.
> > +   This MUST happen before the guest's memory is serialised as the
> > LPI IST is
> > +   stored directly to guest memory.
> > +
> > +b) Save the IRS MMIO register state in the following order by
> > issuing
> > +   KVM_GET_DEVICE_ATTR on KVM_DEV_ARM_VGIC_GRP_IRS_REGS:
> > +
> > +     1. Save IRS_IDR0-2 and IRS_IDR5-7 registers.
> > +     2. Save IRS_IST_CFGR.
> > +     3. Save IRS_IST_BASER.
> > +     4. Save the remaining global IRS MMIO registers.
> > +     5. For each PE:
> > +        - write IRS_PE_SELR
> > +        - save IRS_PE_CR0
> > +     6. For each SPI:
> > +        - write IRS_SPI_SELR
> > +        - save IRS_SPI_CFGR
> > +
> > +IRS Restore Sequence:
> > +---------------------
> > +
> > +The following ordering must be followed when restoring the virtual
> > GICv5 and
> > +IRS:
> > +
> > +a) restore all guest memory and create vcpus
> > +b) provide the IRS base address by issuing KVM_SET_DEVICE_ATTR on
> > +   KVM_DEV_ARM_VGIC_GRP_ADDR
> > +c) initialise the GIC - this sets up the default state and creates
> > the SPI
> > +   IST - by issuing KVM_SET_DEVICE_ATTR on
> > KVM_DEV_ARM_VGIC_GRP_CTRL with
> > +   KVM_DEV_ARM_VGIC_CTRL_INIT
> 
> This isn't going to work for QEMU, if I understand it correctly.
> QEMU always creates the whole VM first, including creating the
> VCPUs and GIC, telling KVM what its base address is, initializing it,
> etc, before it starts an inbound migration. So the memory read
> is going to come in after step (c), not right at the start.

Hi Peter,

Thanks for the feedback, and excuse my slow response.

So, just to make sure I understand, QEMU does:

a) Create VCPUs
b) Create GIC, supply GIC base address
c) Init GIC
d) Restore guest memory
e) Restore state

If so, I think things can still work broadly as I'd intended as this
part of the ordering can be changed.

On a save, the guest's LPI IST is written to guest memory. On restore,
it needs to be read back from there. Therefore, said memory needs to be
available at the point that one restores the ISTs. The MMIO regs convey
the size of the IST, and hence those need to be restored before the
ISTs themselves.

I think we could do:

a) Create VCPUs
b) Create GIC, supply GIC base address
c) Init GIC
d) Restore guest memory
e) Restore IRS MMIO regs
f) Restore the ISTs by issuing KVM_SET_DEVICE_ATTR on
   KVM_DEV_ARM_VGIC_GRP_IST.
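
As a rough illustration (not part of the patch), the dependencies behind
this ordering can be written down and checked mechanically. The step
labels below are made up for the sketch and are not KVM identifiers. Only
the two prerequisites of the IST restore are stated explicitly above; the
other edges are assumptions read off the order in which the steps are
listed.

```python
# Illustrative labels only; none of these are KVM identifiers.
# Each step maps to the steps assumed to be required before it.
PREREQS = {
    "create_vcpus": set(),
    "create_gic": {"create_vcpus"},      # assumption: follows listed order
    "init_gic": {"create_gic"},          # assumption: follows listed order
    "restore_memory": set(),
    "restore_irs_mmio": {"init_gic"},    # assumption: follows listed order
    # Stated in the text: the LPI IST is read back from guest memory, and
    # the MMIO registers convey the size of the IST.
    "restore_ists": {"restore_memory", "restore_irs_mmio"},
}

# Steps a) to f) of the proposed sequence, in order.
PROPOSED_ORDER = [
    "create_vcpus",
    "create_gic",
    "init_gic",
    "restore_memory",
    "restore_irs_mmio",
    "restore_ists",
]


def first_violation(order):
    """Return (step, missing_prerequisite) for the first step that runs
    before one of its prerequisites, or None if the order is acceptable."""
    seen = set()
    for step in order:
        missing = sorted(PREREQS[step] - seen)
        if missing:
            return step, missing[0]
        seen.add(step)
    return None


if __name__ == "__main__":
    print(first_violation(PROPOSED_ORDER))
```

Under these assumptions the proposed order passes, while an order that
restores the ISTs before guest memory is reported as a violation.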

> 
> > +d) restore the IRS MMIO register state in the following order by
> > issuing
> > +   KVM_SET_DEVICE_ATTR on KVM_DEV_ARM_VGIC_GRP_IRS_REGS:
> > +
> > +     1. Restore IRS_IDR0-2 and IRS_IDR5-7 registers.
> > +     2. Restore IRS_IST_CFGR.
> > +     3. Restore IRS_IST_BASER - this triggers KVM to create the
> > LPI IST.
> > +
> > +e) restore the ISTs by issuing KVM_SET_DEVICE_ATTR on
> > +   KVM_DEV_ARM_VGIC_GRP_IST.
> > +f) restore the remaining IRS MMIO register state in the following
> > order by
> > +   issuing KVM_SET_DEVICE_ATTR on KVM_DEV_ARM_VGIC_GRP_IRS_REGS:
> > +
> > +     1. Restore the remaining global IRS MMIO registers.
> > +     2. For each PE:
> > +        - write IRS_PE_SELR
> > +        - restore IRS_PE_CR0
> > +     3. For each SPI:
> > +        - write IRS_SPI_SELR
> > +        - restore IRS_SPI_CFGR
> 
> More generally, if your API involves this much in the way
> of complicated ordering dependencies, it's going to be
> very bug prone.  From userspace's perspective, this is
> not a very helpful way to design the interface :-)

I think it can be simplified to what I have put above.

One of the reasons for the complexity was that I was trying to re-use
the existing LPI IST creation mechanism (triggered by a write to the
IRS_IST_BASER), but that's not necessary. For example, that allocation
could be made part of the KVM_DEV_ARM_VGIC_GRP_IST ioctl, removing the
ordering requirement for the IRS_IST_BASER and IRS_IST_CFGR regs
altogether. Moreover, I think the IRS_PE_SELR/IRS_PE_CR0 and the
IRS_SPI_SELR/IRS_SPI_CFGR loops can both be omitted. This would allow
bulk MMIO save/restore for the IRS.

The IRS_PE_* loop was there to handle the case where a guest might not
have opted out of 1-of-N interrupt selection. However, supporting 1-of-N
selection is complex, and is not planned for KVM. As long as we are
happy to say that 1-of-N is not supported, then this can be removed. If
this were added in the future, this part of save/restore would need to
be revisited.

As for the other (SPI) loop, the intent here was to make sure that KVM
correctly tracks the state of level-sensitive and edge-triggered SPIs.
However, that information already exists in the IST that is being
restored, so if needs be, that can be extracted as part of restoring
the IST.

In summary, I think that the restore sequence can be simplified to what
I have put above, which is effectively bulk MMIO restore followed by
IST restore. Please let me know if you think that would work for QEMU.

FWIW, I think the save sequence can be simplified to:

a) Save the ISTs by issuing KVM_GET_DEVICE_ATTR on
   KVM_DEV_ARM_VGIC_GRP_IST.
   This MUST happen before the guest's memory is serialised as the
   LPI IST is stored directly to guest memory.
b) Save the IRS MMIO register state by issuing KVM_GET_DEVICE_ATTR on
   KVM_DEV_ARM_VGIC_GRP_IRS_REGS.

Where a) & b) could happen in either order as long as the memory is
saved after the IST has been written to it.
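
To make the single remaining constraint concrete, here is a small sketch
(again not part of the patch, and using made-up step labels rather than
KVM identifiers) that enumerates every ordering of the three save steps
and keeps the ones where the IST save precedes the memory save. It assumes
the MMIO register save has no ordering requirement of its own, which is
how the paragraph above reads.

```python
from itertools import permutations

# Illustrative labels only; none of these are KVM identifiers.
STEPS = ("save_ists", "save_irs_mmio", "save_memory")


def is_safe(order):
    """The IST save writes the LPI IST into guest memory, so it has to
    complete before that memory is serialised."""
    return order.index("save_ists") < order.index("save_memory")


def safe_orders():
    """Return every ordering of STEPS that satisfies is_safe()."""
    return [order for order in permutations(STEPS) if is_safe(order)]


if __name__ == "__main__":
    for order in safe_orders():
        print(" -> ".join(order))
```

Three of the six possible orderings satisfy the constraint, including the
one where the MMIO registers are saved before the ISTs.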

> 
> thanks
> -- PMM

Thanks again for the feedback.
Sascha


