[PATCH 08/43] KVM: arm64: gic-v5: Introduce guest IST alloc and management

Sascha Bischoff Sascha.Bischoff at arm.com
Fri May 8 05:43:37 PDT 2026


On Wed, 2026-04-29 at 15:29 +0100, Marc Zyngier wrote:
> On Mon, 27 Apr 2026 17:08:46 +0100,
> Sascha Bischoff <Sascha.Bischoff at arm.com> wrote:
> > 
> > GICv5 guests use Interrupt State Tables (ISTs) to track and manage
> > the
> > interrupt state for SPIs and LPIs. These ISTs are provided to the
> > host's IRS via the VMTE.
> > 
> > On a host GICv5 system, SPIs do not require any up-front memory
> > allocation prior to their use, unlike LPIs which require the OS to
> > allocate an IST. For a GICv5 guest, the same holds from the guest's
> > point of view - the SPIs should require no explicit memory
> > allocation
> > by the guest. This means that the hypervisor must provision the
> > memory that it passes to the IRS for managing a guest's SPI state.
> > 
> > In light of the above, the hypervisor allocates the SPI IST prior
> > to
> > running the guest for the first time. As only a small number of
> > SPIs
> > are expected, this is always allocated as a linear IST. The host is
> > responsible for freeing this memory on guest teardown.
> > 
> > For LPIs, the OS needs to provision memory for state tracking. This
> > applies to both hosts and guests, and so the guest will provision
> > some
> > memory for the LPI IST. However, this is not directly used by
> > KVM. Instead, KVM allocates a shadow LPI IST which is passed to the
> > IRS (in the VMTE). As with the SPI IST, the hypervisor must free
> > this memory on guest teardown. The LPI IST is allocated as a
> > two-level structure, as many more LPIs are expected than SPIs.
> > 
> > Signed-off-by: Sascha Bischoff <sascha.bischoff at arm.com>
> > ---
> >  arch/arm64/kvm/vgic/vgic-v5-tables.c | 531
> > +++++++++++++++++++++++++++
> >  arch/arm64/kvm/vgic/vgic-v5-tables.h |  22 ++
> >  include/linux/irqchip/arm-gic-v5.h   |   3 +
> >  3 files changed, 556 insertions(+)
> > 
> > diff --git a/arch/arm64/kvm/vgic/vgic-v5-tables.c
> > b/arch/arm64/kvm/vgic/vgic-v5-tables.c
> > index 502d05d46cccf..de905f37b61a5 100644
> > --- a/arch/arm64/kvm/vgic/vgic-v5-tables.c
> > +++ b/arch/arm64/kvm/vgic/vgic-v5-tables.c
> > @@ -501,6 +501,25 @@ int vgic_v5_vmte_init(struct kvm *kvm)
> >  	return ret;
> >  }
> >  
> > +/*
> > + * The following set of forward declarations makes the code layout
> > a *little*
> > + * clearer as it lets us keep the IST-related code together.
> > + */
> > +static int vgic_v5_alloc_linear_ist(struct kvm *kvm, bool spi_ist,
> > +				    unsigned int id_bits,
> > +				    unsigned int istsz);
> > +static int vgic_v5_alloc_l1_ist(struct kvm *kvm, unsigned int
> > id_bits,
> > +				unsigned int istsz, unsigned int
> > l2_split);
> > +static int vgic_v5_alloc_l2_ists(struct kvm *kvm, unsigned int
> > id_bits,
> > +				 unsigned int istsz, unsigned int
> > l2_split);
> > +static int vgic_v5_alloc_two_level_lpi_ist(struct kvm *kvm,
> > +					   unsigned int id_bits,
> > +					   unsigned int istsz,
> > +					   unsigned int l2_split);
> > +static int vgic_v5_linear_ist_free(struct kvm *kvm, bool spi);
> > +static int vgic_v5_two_level_ist_free(struct kvm *kvm, bool spi);
> > +static int vgic_v5_spi_ist_free(struct kvm *kvm);
> > +
> >  /*
> >   * Release the VMT Entry, freeing up any allocated data structures
> > before
> >   * zeroing the VMTE.
> > @@ -531,6 +550,18 @@ int vgic_v5_vmte_release(struct kvm *kvm)
> >  	kfree(vmi->vmd_base);
> >  	kfree(vmi->vpet_base);
> >  
> > +	/* If we have an LPI IST, free it */
> > +	if (vmi->h_lpi_ist)
> > +		ret = vgic_v5_lpi_ist_free(kvm);
> > +	if (ret)
> > +		return ret;
> > +
> > +	/* If we have an SPI IST, free it */
> > +	if (vmi->h_spi_ist)
> > +		ret = vgic_v5_spi_ist_free(kvm);
> > +	if (ret)
> > +		return ret;
> > +
> >  	xa_erase(&vm_info, vm_id);
> >  	kfree(vmi);
> >  
> > @@ -634,3 +665,503 @@ int vgic_v5_vmte_free_vpe(struct kvm_vcpu
> > *vcpu)
> >  
> >  	return 0;
> >  }
> > +
> > +/*
> > + * Assign an already allocated IST to the VM by populating the
> > fields in the
> > + * corresponding VMTE. We re-use this code for both an SPI IST and
> > LPI IST, even
> > + * if the paths to reach it might be vastly different.
> > + */
> > +int vgic_v5_vmte_assign_ist(struct kvm *kvm, phys_addr_t ist_base,
> > +			    bool two_level, unsigned int id_bits,
> > +			    unsigned int l2sz, unsigned int istsz,
> > +			    bool spi_ist)
> > +{
> > +	struct kvm_vcpu *vcpu0 = kvm_get_vcpu(kvm, 0);
> > +	u16 vm_id = vgic_v5_vm_id(kvm);
> > +	struct gicv5_cmd_info cmd_info;
> > +	struct vmtl2_entry *vmte;
> > +	unsigned int section;
> > +	u64 tmp;
> > +	int ret;
> > +
> > +	section = spi_ist ? GICV5_VMTEL2_SPI_SECTION :
> > GICV5_VMTEL2_LPI_SECTION;
> 
> Section? What is a section? This needs documentation (11.2.2 in the
> EAC0 version of the spec) so that people can understand you are
> talking about the 64bit word number in the Level-2 VM Table Entry.

Have added documentation for this.
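
As a rough illustration of what "section" means here (a standalone
sketch, not the code from the series): the section is the 0-based index
of a 64-bit word within the Level-2 VMTE. The section numbers and the
two bit positions below are taken from the quoted patch; the struct
name, the word count and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* From the patch: 0-based 64-bit word indices within the L2 VMTE. */
#define VMTEL2_LPI_SECTION	2
#define VMTEL2_SPI_SECTION	3

/* Hypothetical stand-in for struct vmtl2_entry; the word count is assumed. */
#define VMTEL2_WORDS		4
struct vmte_sketch {
	uint64_t val[VMTEL2_WORDS];
};

/* From the patch: IST_STRUCTURE is bit 58, IST_ID_BITS is bits [63:59]. */
#define IST_STRUCTURE_SHIFT	58
#define IST_ID_BITS_SHIFT	59
#define IST_ID_BITS_MAX		0x1fU

/* Pick the 64-bit word that describes the SPI or the LPI IST. */
static unsigned int ist_section(bool spi_ist)
{
	return spi_ist ? VMTEL2_SPI_SECTION : VMTEL2_LPI_SECTION;
}

/* Fill in only the two fields modelled here, in the selected word. */
static void vmte_set_ist_word(struct vmte_sketch *vmte, bool spi_ist,
			      bool two_level, unsigned int id_bits)
{
	uint64_t word = 0;

	assert(id_bits <= IST_ID_BITS_MAX);

	word |= (uint64_t)two_level << IST_STRUCTURE_SHIFT;
	word |= (uint64_t)id_bits << IST_ID_BITS_SHIFT;

	vmte->val[ist_section(spi_ist)] = word;
}
```

The point is only that the SPI and LPI descriptions live in different
words of the same entry, so one helper can serve both.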

> 
> > +
> > +	if (ist_base & ~GICV5_VMTEL2E_IST_ADDR) {
> > +		kvm_err("IST alignment issue! Address: 0x%llx,
> > Mask 0x%llx\n",
> > +			ist_base, GICV5_VMTEL2E_IST_ADDR);
> > +		return -EINVAL;
> > +	}
> > +
> > +	ret = vgic_v5_get_l2_vmte(vm_id, &vmte);
> > +	if (ret)
> > +		return ret;
> > +
> > +	/* Bail if already allocated - something is broken! */
> > +	if (FIELD_GET(GICV5_VMTEL2E_IST_VALID, vmte-
> > >val[section])) {
> > +		vgic_v5_clean_inval(vmte, sizeof(*vmte), true,
> > true);
> 
> Still this odd construct. I'm starting to wonder whether I'm really
> missing something.

No, it was me who lost the plot here. Have re-worked these cases to
remove this construct. Instead, the code cleans and invalidates before
we risk reading anything that might've changed underneath us, rather
than trying to make sure it isn't cached afterwards.

> 
> > +		return -EINVAL;
> > +	}
> > +
> > +	tmp = FIELD_PREP(GICV5_VMTEL2E_IST_L2SZ, l2sz);
> > +	tmp |= FIELD_PREP(GICV5_VMTEL2E_IST_ADDR,
> > +			ist_base >> GICV5_VMTEL2E_IST_ADDR_SHIFT);
> > +	tmp |= FIELD_PREP(GICV5_VMTEL2E_IST_ISTSZ, istsz);
> > +	tmp |= FIELD_PREP(GICV5_VMTEL2E_IST_ID_BITS, id_bits);
> > +	tmp |= FIELD_PREP(GICV5_VMTEL2E_IST_STRUCTURE, two_level);
> > +
> > +	WRITE_ONCE(vmte->val[section], cpu_to_le64(tmp));
> > +	vgic_v5_clean_inval(vmte, sizeof(*vmte), true, false);
> > +
> > +	/* Finally, mark the entry as valid */
> > +	cmd_info.cmd_type = spi_ist ? SPI_VIST_MAKE_VALID :
> > LPI_VIST_MAKE_VALID;
> > +	ret = irq_set_vcpu_affinity(vgic_v5_vpe_db(vcpu0),
> > &cmd_info);
> > +
> > +	/* Any cached entries we now have are stale! */
> > +	vgic_v5_clean_inval(vmte, sizeof(*vmte), false, true);
> 
> Shouldn't the clean operation happen *before* you call into the IRQ
> stack? It feels dangerous to do so, even if the callback doesn't do
> much.

Yeah, this now does a clean-invalidate before the page is made valid
via the host IRS.
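
To make the ordering concrete, here is a toy model (plain C, nothing to
do with the real CMO helpers) of a single table word with a memory copy
and a CPU-side cached copy. All names are hypothetical. It assumes a
non-coherent observer that only ever sees the memory copy, and shows
why the clean-invalidate has to come before the entry is made valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one table word: a memory copy plus an optional cached copy. */
struct toy_line {
	uint64_t mem;	/* what the non-coherent device observes */
	uint64_t cache;	/* what the CPU observes while 'cached' is set */
	bool cached;
	bool dirty;
};

#define TOY_VALID	(1ULL << 0)	/* hypothetical valid bit, set by the device */

static void cpu_write(struct toy_line *l, uint64_t v)
{
	l->cache = v;
	l->cached = true;
	l->dirty = true;
}

static uint64_t cpu_read(struct toy_line *l)
{
	if (!l->cached) {
		/* Miss: refill from memory. */
		l->cache = l->mem;
		l->cached = true;
		l->dirty = false;
	}
	return l->cache;
}

/* Write back dirty data, then drop the cached copy. */
static void clean_inval(struct toy_line *l)
{
	if (l->cached && l->dirty)
		l->mem = l->cache;
	l->cached = false;
	l->dirty = false;
}

/* The device updates memory directly, behind the CPU's back. */
static void device_make_valid(struct toy_line *l)
{
	l->mem |= TOY_VALID;
}

/* Populate the fields, push them out, and only then ask for validation. */
static uint64_t assign_then_validate(struct toy_line *l, uint64_t fields)
{
	assert(!(fields & TOY_VALID));

	cpu_write(l, fields);
	clean_inval(l);		/* device now sees the fields; no stale copy left */
	device_make_valid(l);

	return cpu_read(l);	/* refills from memory, so it sees the valid bit */
}
```

With the clean-invalidate done up front, nothing dirty can be written
back over the device's update, and the next CPU read refills from
memory.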

> 
> > +
> > +	return ret;
> > +}
> > +
> > +/*
> > + * Helper to determine the correct l2sz to use based on the
> > combination of
> > + * PAGE_SIZE and whatever hardware supports.
> > + */
> > +static unsigned int vgic_v5_ist_l2sz(void)
> > +{
> > +	switch (PAGE_SIZE) {
> > +	case SZ_64K:
> > +		if (gicv5_host_ist_caps.ist_l2sz & 0x4)
> 
> Please add definitions for IRS_IDR2.IST_L2SZ.

This function by necessity already exists (albeit a little differently)
in the host driver. I've gone and reused that rather than reinventing
it here.
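
For readers following along, a minimal standalone sketch of the
selection logic in the quoted helper, with named capability bits in
place of the raw 0x1/0x2/0x4 masks. The names and the encodings of the
returned values are assumptions for illustration and are not the
definitions used by the host driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the "supported L2 size" bits tested in the patch. */
#define L2SZ_CAP_4K	0x1U
#define L2SZ_CAP_16K	0x2U
#define L2SZ_CAP_64K	0x4U

/* Assumed field encodings for the chosen L2 size. */
enum l2sz { L2SZ_4K = 0, L2SZ_16K = 1, L2SZ_64K = 2 };

/*
 * Prefer an L2 size that matches the page size, then fall back to
 * whatever the hardware supports, in the same order as the quoted code.
 */
static enum l2sz pick_l2sz(size_t page_size, unsigned int caps)
{
	assert((caps & ~(L2SZ_CAP_4K | L2SZ_CAP_16K | L2SZ_CAP_64K)) == 0);

	switch (page_size) {
	case 65536:
		if (caps & L2SZ_CAP_64K)
			return L2SZ_64K;
		/* fall through */
	case 4096:
		if (caps & L2SZ_CAP_4K)
			return L2SZ_4K;
		/* fall through */
	case 16384:
		if (caps & L2SZ_CAP_16K)
			return L2SZ_16K;
		break;
	}

	if (caps & L2SZ_CAP_4K)
		return L2SZ_4K;

	return L2SZ_64K;
}
```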

> 
> > +			return GICV5_IRS_IST_CFGR_L2SZ_64K;
> > +		fallthrough;
> > +	case SZ_4K:
> > +		if (gicv5_host_ist_caps.ist_l2sz & 0x1)
> > +			return GICV5_IRS_IST_CFGR_L2SZ_4K;
> > +		fallthrough;
> > +	case SZ_16K:
> > +		if (gicv5_host_ist_caps.ist_l2sz & 0x2)
> > +			return GICV5_IRS_IST_CFGR_L2SZ_16K;
> > +		break;
> > +	}
> > +
> > +	if (gicv5_host_ist_caps.ist_l2sz & 0x1)
> > +		return GICV5_IRS_IST_CFGR_L2SZ_4K;
> > +
> > +	return GICV5_IRS_IST_CFGR_L2SZ_64K;
> > +}
> > +
> > +/* Helper to determine ISTE size based on metadata requirements */
> > +static unsigned int vgic_v5_ist_istsz(unsigned int id_bits)
> > +{
> > +	if (!gicv5_host_ist_caps.istmd)
> > +		return GICV5_IRS_IST_CFGR_ISTSZ_4;
> > +
> > +	if (id_bits >= gicv5_host_ist_caps.istmd_sz)
> > +		return GICV5_IRS_IST_CFGR_ISTSZ_16;
> > +
> > +	return GICV5_IRS_IST_CFGR_ISTSZ_8;
> > +}
> > +
> > +/*
> > + * Allocate a Linear IST - always used for SPIs and potentially
> > LPIs.
> > + *
> > + * The calculation for n has been taken from the GICv5 spec.
> 
> Bonus points if you add a reference to the relevant part of the spec.

Have done.
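
As a worked version of the arithmetic in the quoted code (a standalone
sketch with hypothetical helper names): the linear IST covers
2^id_bits entries, each 2^(istsz + 2) bytes, which is where
n = id_bits + 1 + istsz and a size of BIT(n + 1) come from. The sketch
assumes istsz field values 0, 1 and 2 map to 4-, 8- and 16-byte ISTEs,
and it leaves out the host capability check that the quoted code also
applies before choosing a two-level IST.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ISTE size in bytes; istsz is the field value, i.e. log2(bytes) - 2. */
static size_t iste_bytes(unsigned int istsz)
{
	assert(istsz <= 2);
	return (size_t)1 << (istsz + 2);
}

/* Size of a linear IST, following n = id_bits + 1 + istsz; size = BIT(n + 1). */
static size_t linear_ist_bytes(unsigned int id_bits, unsigned int istsz)
{
	const unsigned int n = id_bits + 1 + istsz;

	assert(n + 1 < sizeof(size_t) * 8);
	return (size_t)1 << (n + 1);
}

/* A linear IST needs more than one page once id_bits exceeds this bound. */
static bool wants_two_level(unsigned int page_shift, unsigned int id_bits,
			    unsigned int istsz)
{
	return id_bits > page_shift - (2 + istsz);
}
```

For example, with 4-byte ISTEs and 4K pages, 10 ID bits fill exactly
one page, so anything above that would want a two-level IST.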

> 
> > + *
> > + * NOTE: istsz is the FIELD used by GICv5, not the actual size (or
> > log2() of the
> > + * size).
> > + */
> > +static int vgic_v5_alloc_linear_ist(struct kvm *kvm, bool spi_ist,
> > +				    unsigned int id_bits, unsigned
> > int istsz)
> > +{
> > +	const size_t n = id_bits + 1 + istsz;
> > +	u16 vm_id = vgic_v5_vm_id(kvm);
> > +	struct vgic_v5_vm_info *vmi;
> > +	__le64 *ist;
> > +	u32 l1sz;
> > +
> > +	vmi = xa_load(&vm_info, vm_id);
> > +	if (WARN_ON_ONCE(!vmi))
> > +		return -EINVAL;
> > +
> > +	/*
> > +	 * Allocate the IST. We only have one level, so we just
> > use the L2 ISTE.
> > +	 */
> > +	l1sz = BIT(n + 1);
> > +	ist = kzalloc(l1sz, GFP_KERNEL);
> > +	if (!ist)
> > +		return -ENOMEM;
> > +
> > +	if (spi_ist) {
> > +		vmi->h_spi_ist = ist;
> > +	} else {
> > +		vmi->h_lpi_ist_structure = false;
> > +		vmi->h_lpi_ist = ist;
> > +	}
> > +
> > +	vgic_v5_clean_inval(ist, l1sz, true, true);
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * Allocate the first level of a two-level IST - LPI, only.
> > + *
> > + * The calculations for n, l1_size have been taken from the GICv5
> > spec.
> > + *
> > + * NOTE: istsz and l2sz are the FIELDS used by GICv5, not the
> > actual sizes (or
> > + * log2() of the sizes).
> > + */
> > +static int vgic_v5_alloc_l1_ist(struct kvm *kvm, unsigned int
> > id_bits,
> > +				unsigned int istsz, unsigned int
> > l2sz)
> > +{
> > +	const size_t n =  max(5, id_bits - ((10 - istsz) + (2 *
> > l2sz)) + 3 - 1);
> > +	u16 vm_id = vgic_v5_vm_id(kvm);
> > +	const u32 l1_size = BIT(n + 1);
> > +	struct vgic_v5_vm_info *vmi;
> > +	__le64 *ist;
> > +
> > +	vmi = xa_load(&vm_info, vm_id);
> > +	if (!vmi)
> > +		return -EINVAL;
> > +
> > +	ist = kzalloc(l1_size, GFP_KERNEL);
> > +	if (!ist)
> > +		return -ENOMEM;
> > +
> > +	vmi->h_lpi_ist_structure = true;
> > +	vmi->h_lpi_ist = ist;
> > +
> > +	vgic_v5_clean_inval(ist, l1_size, true, true);
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * Allocate ALL of the second level ISTs for a two-level IST -
> > LPI, only.
> > + *
> > + * The calculations for n, l1_entries, l2_size have been taken
> > from the GICv5
> > + * spec.
> > + *
> > + * NOTE: istsz and l2sz are the FIELDS used by GICv5, not the
> > actual sizes (or
> > + * log2() of the sizes).
> > + */
> > +static int vgic_v5_alloc_l2_ists(struct kvm *kvm, unsigned int
> > id_bits,
> > +				unsigned int istsz, unsigned int
> > l2sz)
> > +{
> > +	const size_t n =  max(5, id_bits - ((10 - istsz) + (2 *
> > l2sz)) + 3 - 1);
> > +	const int l1_entries = BIT(n + 1) / GICV5_IRS_ISTL1E_SIZE;
> > +	const size_t l2_size = BIT(11 + (2 * l2sz) + 1);
> > +	u16 vm_id = vgic_v5_vm_id(kvm);
> > +	struct vgic_v5_vm_info *vmi;
> > +	__le64 *l2ist;
> > +	__le64 *l1ist;
> > +	int index;
> > +
> > +	vmi = xa_load(&vm_info, vm_id);
> > +	if (WARN_ON_ONCE(!vmi))
> > +		return -EINVAL;
> > +
> > +	l1ist = vmi->h_lpi_ist;
> > +
> > +	/*
> > +	 * Allocate the storage for the pointers to the L2 ISTs
> > (used when
> > +	 * freeing later).
> > +	 */
> > +	vmi->h_lpi_l2_ists = kzalloc_objs(*vmi->h_lpi_l2_ists,
> > l1_entries,
> > +					  GFP_KERNEL);
> > +	if (!vmi->h_lpi_l2_ists)
> > +		return -ENOMEM;
> > +
> > +	/* Allocate the L2 IST for each L1 IST entry */
> > +	for (index = 0; index < l1_entries; ++index) {
> > +		l2ist = kzalloc(l2_size, GFP_KERNEL);
> > +		if (!l2ist) {
> > +			while (--index >= 0)
> > +				kfree(vmi->h_lpi_l2_ists[index]);
> > +
> > +			kfree(vmi->h_lpi_l2_ists);
> > +			vmi->h_lpi_l2_ists = NULL;
> > +
> > +			return -ENOMEM;
> > +		}
> > +
> > +		/*
> > +		 * We are not doing on-demand allocation of the L2
> > ISTs, and are
> > +		 * instead provisioning the whole IST up front.
> > This means that
> > +		 * we are able to mark the L2 ISTs as valid in the
> > L1 ISTEs as
> > +		 * the overall IST is not yet valid.
> > +		 */
> > +		l1ist[index] = cpu_to_le64(
> > +			virt_to_phys(l2ist) &
> > GICV5_ISTL1E_L2_ADDR_MASK) |
> > +			GICV5_ISTL1E_VALID;
> > +
> > +		vmi->h_lpi_l2_ists[index] = l2ist;
> > +
> > +		vgic_v5_clean_inval(l2ist, l2_size, true, true);
> > +	}
> > +
> > +	/* Handle CMOs for the whole L1 IST in one go */
> > +	vgic_v5_clean_inval(l1ist, l1_entries * sizeof(*l1ist),
> > true, false);
> > +
> > +	return 0;
> > +}
> > +
> > +/* Allocate a two-level IST - LPIs, only */
> > +static int vgic_v5_alloc_two_level_lpi_ist(struct kvm *kvm,
> > unsigned int id_bits,
> > +					   unsigned int istsz,
> > unsigned int l2sz)
> > +{
> > +	u16 vm_id = vgic_v5_vm_id(kvm);
> > +	struct vgic_v5_vm_info *vmi;
> > +	int ret;
> > +
> > +	/*
> > +	 * Allocate the L1 IST first, then all of the L2s.
> > Everything
> > +	 * is preallocated and we do no on-demand IST allocation.
> > This
> > +	 * is to avoid needing to track if and when the guest is
> > doing
> > +	 * on-demand IST allocation.
> > +	 */
> > +	ret = vgic_v5_alloc_l1_ist(kvm, id_bits, istsz, l2sz);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = vgic_v5_alloc_l2_ists(kvm, id_bits, istsz, l2sz);
> > +	if (ret) {
> > +		/* Free the L1 IST again */
> > +		vmi = xa_load(&vm_info, vm_id);
> > +		kfree(vmi->h_lpi_ist);
> > +		vmi->h_lpi_ist = 0;
> > +
> > +		return ret;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static void vgic_v5_free_allocated_lpi_ist(struct vgic_v5_vm_info
> > *vmi,
> > +					   unsigned int id_bits,
> > +					   unsigned int istsz,
> > +					   unsigned int l2sz)
> > +{
> > +	if (!vmi->h_lpi_ist_structure) {
> > +		kfree(vmi->h_lpi_ist);
> > +		vmi->h_lpi_ist = NULL;
> > +		return;
> > +	}
> > +
> > +	if (vmi->h_lpi_l2_ists) {
> > +		const size_t n = max(2, id_bits - ((10 - istsz) +
> > (2 * l2sz)) + 3 - 1);
> > +		const int l1_entries = BIT(n + 1) /
> > GICV5_IRS_ISTL1E_SIZE;
> > +		int index;
> > +
> > +		for (index = 0; index < l1_entries; ++index)
> > +			kfree(vmi->h_lpi_l2_ists[index]);
> > +
> > +		kfree(vmi->h_lpi_l2_ists);
> > +		vmi->h_lpi_l2_ists = NULL;
> > +	}
> > +
> > +	kfree(vmi->h_lpi_ist);
> > +	vmi->h_lpi_ist = NULL;
> > +}
> > +
> > +void vgic_v5_free_allocated_spi_ist(struct kvm *kvm)
> > +{
> > +	u16 vm_id = vgic_v5_vm_id(kvm);
> > +	struct vgic_v5_vm_info *vmi;
> > +
> > +	vmi = xa_load(&vm_info, vm_id);
> > +	if (WARN_ON_ONCE(!vmi))
> > +		return;
> > +
> > +	kfree(vmi->h_spi_ist);
> > +	vmi->h_spi_ist = NULL;
> > +}
> > +
> > +/*
> > + * Free a Linear IST. Can only happen once the VM is dead.
> > + */
> > +static int vgic_v5_linear_ist_free(struct kvm *kvm, bool spi)
> > +{
> > +	u16 vm_id = vgic_v5_vm_id(kvm);
> > +	struct vmtl2_entry *vmte;
> > +	struct vgic_v5_vm_info *vmi;
> > +	int section, ret;
> > +
> > +	vmi = xa_load(&vm_info, vm_id);
> > +	if (!vmi)
> > +		return -EINVAL;
> > +
> > +	ret = vgic_v5_get_l2_vmte(vm_id, &vmte);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (spi) {
> > +		section = GICV5_VMTEL2_SPI_SECTION;
> > +		vgic_v5_free_allocated_spi_ist(kvm);
> > +	} else {
> > +		section = GICV5_VMTEL2_LPI_SECTION;
> > +		vgic_v5_free_allocated_lpi_ist(vmi, 0, 0, 0);
> > +	}
> > +
> > +	/* The VM should be dead here, so we can just zero the VMT
> > section */
> > +	WRITE_ONCE(vmte->val[section], 0ULL);
> > +	vgic_v5_clean_inval(vmte, sizeof(*vmte), true, true);
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * Free a Two-Level IST. Can only happen once the VM is dead.
> > + */
> > +static int vgic_v5_two_level_ist_free(struct kvm *kvm, bool spi)
> > +{
> > +	unsigned int id_bits, istsz, l2sz;
> > +	u16 vm_id = vgic_v5_vm_id(kvm);
> > +	struct vgic_v5_vm_info *vmi;
> > +	__le64 *l1ist, tmp;
> > +	struct vmtl2_entry *vmte;
> > +	int section, l1_entries;
> > +	size_t n;
> > +	int ret;
> > +
> > +	/* We don't create two-level SPI ISTs, so freeing is a bad
> > idea! */
> > +	if (spi)
> > +		return -EINVAL;
> > +
> > +	vmi = xa_load(&vm_info, vm_id);
> > +	if (!vmi)
> > +		return -EINVAL;
> > +
> > +	section = GICV5_VMTEL2_LPI_SECTION;
> > +	l1ist = vmi->h_lpi_ist;
> > +
> > +	if (!vmi->h_lpi_ist_structure)
> > +		return -EINVAL;
> > +
> > +	ret = vgic_v5_get_l2_vmte(vm_id, &vmte);
> > +	if (ret)
> > +		return ret;
> > +
> > +	tmp = le64_to_cpu(READ_ONCE(vmte->val[section]));
> > +
> > +	id_bits = FIELD_GET(GICV5_VMTEL2E_IST_ID_BITS, tmp);
> > +	istsz = FIELD_GET(GICV5_VMTEL2E_IST_ISTSZ, tmp);
> > +	l2sz = FIELD_GET(GICV5_VMTEL2E_IST_L2SZ, tmp);
> > +
> > +	/* Calculation for n taken from the GICv5 specification */
> > +	n =  max(2, id_bits - ((10 - istsz) + (2 * l2sz)) + 3 -
> > 1);
> > +	l1_entries = BIT(n + 1) / GICV5_IRS_ISTL1E_SIZE;
> > +
> > +	vgic_v5_free_allocated_lpi_ist(vmi, id_bits, istsz, l2sz);
> > +
> > +	/* The VM must be dead, so we can just zero the VMT
> > section */
> > +	WRITE_ONCE(vmte->val[section], 0ULL);
> > +
> > +	vgic_v5_clean_inval(vmte, sizeof(*vmte), true, true);
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * Allocate an IST for SPIs.
> > + *
> > + * We don't anticipate a large number of SPIs being allocated.
> > Therefore, we
> > + * always allocate a Linear IST for SPIs. This will need to be
> > revisited should
> > + * that assumption no longer hold.
> > + */
> > +int vgic_v5_spi_ist_allocate(struct kvm *kvm, phys_addr_t
> > *base_addr,
> > +			     unsigned int id_bits, unsigned int
> > istsz)
> > +{
> > +	u16 vm_id = vgic_v5_vm_id(kvm);
> > +	struct vgic_v5_vm_info *vmi;
> > +	int ret;
> > +
> > +	vmi = xa_load(&vm_info, vm_id);
> > +	if (WARN_ON_ONCE(!vmi))
> > +		return -EINVAL;
> > +
> > +	ret = vgic_v5_alloc_linear_ist(kvm, true, id_bits, istsz);
> > +	if (ret)
> > +		return ret;
> > +
> > +	*base_addr = virt_to_phys(vmi->h_spi_ist);
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * Free the IST for SPIs. Should only happen once the VM is dead.
> > + */
> > +static int vgic_v5_spi_ist_free(struct kvm *kvm)
> > +{
> > +	return vgic_v5_linear_ist_free(kvm, true);
> > +}
> > +
> > +/*
> > + * Allocate an IST for LPIs.
> > + *
> > + * Unlike with SPIs, we anticipate that the guest will allocate a
> > relatively
> > + * large number of LPIs. Therefore, while we support doing a
> > linear LPI IST, it
> > + * is expected that LPI ISTs will be two-level.
> > + */
> > +int vgic_v5_lpi_ist_alloc(struct kvm *kvm, unsigned int id_bits)
> > +{
> > +	u16 vm_id = vgic_v5_vm_id(kvm);
> > +	struct vgic_v5_vm_info *vmi;
> > +	unsigned int istsz, l2sz;
> > +	phys_addr_t phys_addr;
> > +	bool two_level;
> > +	int ret;
> > +
> > +	vmi = xa_load(&vm_info, vm_id);
> > +	if (WARN_ON_ONCE(!vmi))
> > +		return -EINVAL;
> > +
> > +	istsz = vgic_v5_ist_istsz(id_bits);
> > +	l2sz = vgic_v5_ist_l2sz();
> > +
> > +	/*
> > +	 * Determine if we want to create a Linear or a Two-Level
> > IST.
> > +	 *
> > +	 * If we require more than one page for the IST, create a
> > Two-Level IST
> > +	 * (if the host supports it, which is likely).
> > +	 *
> > +	 * Note: GICv5's istsz is not the size of the ISTEs in
> > log2(bytes). It
> > +	 * is 2 less, hence the +2 below.
> > +	 */
> > +	two_level = gicv5_host_ist_caps.ist_levels &&
> > +		id_bits > PAGE_SHIFT - (2 + istsz);
> > +
> > +	if (!two_level)
> > +		ret = vgic_v5_alloc_linear_ist(kvm, false /* LPIs,
> > not SPIs */,
> > +					       id_bits, istsz);
> > +	else
> > +		ret = vgic_v5_alloc_two_level_lpi_ist(kvm,
> > id_bits, istsz,
> > +						      l2sz);
> > +
> > +	if (ret)
> > +		return ret;
> > +
> > +	phys_addr = virt_to_phys(vmi->h_lpi_ist);
> > +	ret = vgic_v5_vmte_assign_ist(kvm, phys_addr, two_level,
> > id_bits, l2sz,
> > +				      istsz, false);
> > +	if (ret)
> > +		vgic_v5_free_allocated_lpi_ist(vmi, id_bits,
> > istsz, l2sz);
> > +
> > +	return ret;
> > +}
> > +
> > +/* Free the LPI IST again */
> > +int vgic_v5_lpi_ist_free(struct kvm *kvm)
> > +{
> > +	u16 vm_id = vgic_v5_vm_id(kvm);
> > +	struct vgic_v5_vm_info *vmi;
> > +
> > +	vmi = xa_load(&vm_info, vm_id);
> > +	if (!vmi)
> > +		return -ENXIO;
> > +
> > +	if (!vmi->h_lpi_ist_structure)
> > +		return vgic_v5_linear_ist_free(kvm, false);
> > +	else
> > +		return vgic_v5_two_level_ist_free(kvm, false);
> > +}
> > diff --git a/arch/arm64/kvm/vgic/vgic-v5-tables.h
> > b/arch/arm64/kvm/vgic/vgic-v5-tables.h
> > index 5501a44308362..37e220cda1987 100644
> > --- a/arch/arm64/kvm/vgic/vgic-v5-tables.h
> > +++ b/arch/arm64/kvm/vgic/vgic-v5-tables.h
> > @@ -54,6 +54,13 @@ struct vmtl2_entry {
> >  #define GICV5_VMTEL2E_IST_STRUCTURE	BIT_ULL(58)
> >  #define GICV5_VMTEL2E_IST_ID_BITS	GENMASK_ULL(63, 59)
> >  
> > +/*
> > + * The LPI and SPI configuration is stored in the 2nd and 3rd 64-
> > bit chunks of
> > + * the VMTE (0-based).
> > + */
> > +#define GICV5_VMTEL2_LPI_SECTION	2
> > +#define GICV5_VMTEL2_SPI_SECTION	3
> > +
> >  /* Virtual PE Table Entry */
> >  typedef __le64 vpe_entry;
> >  #define GICV5_VPE_VALID			BIT_ULL(0)
> > @@ -66,6 +73,12 @@ struct vgic_v5_vm_info {
> >  	vpe_entry __iomem	*vpet_base;
> >  	void __iomem		**vped_ptrs;
> >  	u8			vpe_id_bits;
> > +
> > +	/* Tracking for the hyp-owned ISTs */
> > +	bool			h_lpi_ist_structure;
> > +	__le64			*h_lpi_ist;
> > +	__le64			**h_lpi_l2_ists;
> > +	__le64			*h_spi_ist;
> 
> Can you please document what these individual fields represent? I'm
> not sure what hyp-owned means here...

Have added documentation to clarify that. As a brief summary, because
we allocate both the SPI and LPI ISTs in the hypervisor, we keep base
pointers (and pointers to each L2 array) around so we can quickly
iterate over them either as part of making the arrays valid, or as part
of teardown.
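
A userspace sketch of that bookkeeping, using calloc()/free() in place
of the kernel allocators. The struct and function names are
hypothetical and this is not the layout from the series. It also
records the number of L1 entries at allocation time, which is one
possible way to avoid recomputing it on teardown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the IST tracking fields. */
struct ist_tracking {
	uint64_t *l1;		/* base of the L1 IST (or of a linear IST) */
	uint64_t **l2;		/* one pointer per L1 entry, kept for teardown */
	size_t l1_entries;	/* recorded when the tables are allocated */
};

/* Free every L2 table, the pointer array, and the L1 table. */
static void ist_tracking_free(struct ist_tracking *t)
{
	if (t->l2) {
		for (size_t i = 0; i < t->l1_entries; i++)
			free(t->l2[i]);	/* free(NULL) is a no-op */
		free(t->l2);
	}
	free(t->l1);

	t->l1 = NULL;
	t->l2 = NULL;
	t->l1_entries = 0;
}

/* Allocate the L1 table and all L2 tables up front, unwinding on failure. */
static int ist_tracking_alloc(struct ist_tracking *t, size_t l1_entries,
			      size_t l2_bytes)
{
	assert(l1_entries > 0 && l2_bytes > 0);

	t->l1 = calloc(l1_entries, sizeof(*t->l1));
	t->l2 = calloc(l1_entries, sizeof(*t->l2));
	t->l1_entries = l1_entries;
	if (!t->l1 || !t->l2)
		goto fail;

	for (size_t i = 0; i < l1_entries; i++) {
		t->l2[i] = calloc(1, l2_bytes);
		if (!t->l2[i])
			goto fail;
	}

	return 0;

fail:
	ist_tracking_free(t);
	return -1;
}
```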

> 
> >  };
> >  
> >  struct vgic_v5_vmt {
> > @@ -146,4 +159,13 @@ int vgic_v5_vmte_release(struct kvm *kvm);
> >  int vgic_v5_vmte_alloc_vpe(struct kvm_vcpu *vcpu);
> >  int vgic_v5_vmte_free_vpe(struct kvm_vcpu *vcpu);
> >  
> > +int vgic_v5_vmte_assign_ist(struct kvm *kvm, phys_addr_t ist_base,
> > +			    bool two_level, unsigned int id_bits,
> > +			    unsigned int l2sz, unsigned int istsz,
> > bool spi_ist);
> > +int vgic_v5_spi_ist_allocate(struct kvm *kvm, phys_addr_t
> > *base_addr,
> > +			     unsigned int id_bits, unsigned int
> > istsz);
> > +void vgic_v5_free_allocated_spi_ist(struct kvm *kvm);
> > +int vgic_v5_lpi_ist_alloc(struct kvm *kvm, unsigned int id_bits);
> > +int vgic_v5_lpi_ist_free(struct kvm *kvm);
> > +
> >  #endif
> > diff --git a/include/linux/irqchip/arm-gic-v5.h
> > b/include/linux/irqchip/arm-gic-v5.h
> > index 89579ee04f5d1..ccec0a045927c 100644
> > --- a/include/linux/irqchip/arm-gic-v5.h
> > +++ b/include/linux/irqchip/arm-gic-v5.h
> > @@ -450,6 +450,9 @@ enum gicv5_vcpu_info_cmd_type {
> >  	VMT_L2_MAP,		/* Map in a L2 VMT - *may* happen
> > on VM init */
> >  	VMTE_MAKE_VALID,	/* Make the VMTE valid */
> >  	VMTE_MAKE_INVALID,	/* Make the VMTE (et al.) invalid
> > */
> > +	SPI_VIST_MAKE_VALID,	/* No corresponding invalid */
> > +	LPI_VIST_MAKE_VALID,	/* Triggered by a guest */
> > +	LPI_VIST_MAKE_INVALID,	/* Triggered by a guest */
> >  };
> >  
> >  struct gicv5_cmd_info {
> 
> Thanks,
> 
> 	M.
> 


