MSIs not freed in GICv3 ITS driver

Marc Zyngier maz at kernel.org
Mon Jul 8 10:31:57 PDT 2024


Mani,

On Mon, 08 Jul 2024 16:39:33 +0100,
Manivannan Sadhasivam <manivannan.sadhasivam at linaro.org> wrote:
> 
> Hi Marc, Thomas,
> 
> I'm seeing a weird behavior with GICv3 ITS driver while allocating MSIs from
> PCIe devices. When the PCIe driver (I'm using virtio_pci_common.c) tries to
> allocate non power of 2 MSIs (like 3), then the GICv3 MSI driver always rounds
> the MSI count to power of 2 to find the order. In this case, the order becomes 2
> in its_alloc_device_irq().

That's because we can only allocate EventIDs as a number of ID
bits. So you can't have *1* MSI, nor 3. You can have 2, 4, 8, or
2^24. This is a power-of-two architecture.

> So 4 entries are allocated by bitmap_find_free_region().

Assuming you're calling about its_alloc_device_irq(), it looks like a
bug. Or rather, some laziness on my part. The thing is, this bitmap is
only dealing with sub-allocation in the pool that has been given to
the endpoint. So the power-of-two crap doesn't really matter unless
you are dealing with Multi-MSI, which has actual alignment
requirements.

>
> But since the PCIe driver has only requested 3 MSIs, its_irq_domain_alloc()
> will only allocate 3 MSIs, leaving one bitmap entry unused.
> 
> And when the driver frees the MSIs using pci_free_irq_vectors(), only 3
> allocated MSIs were freed and their bitmap entries were also released. But the
> entry for the additional bitmap was never released. Due to this,
> its_free_device() was also never called, resulting in the ITS device not getting
> freed.
> 
> So when the PCIe driver tries to request the MSIs again (PCIe device being
> removed and inserted back), because the ITS device was not freed previously,
> MSIs were again requested for the same ITS device. And due to the stale bitmap
> entry, the ITS driver refuses to allocate 4 MSIs as only 3 bitmap entries were
> available. This forces the PCIe driver to reduce the MSI count, which is sub
> optimal.
> 
> This behavior might be applicable to other irqchip drivers handling MSI as well.
> I want to know if this behavior is already known with MSI and irqchip drivers?
> 
> For fixing this issue, the PCIe drivers could always request MSIs of power of 2,
> and use a dummy MSI handler for the extra number of MSIs allocated. This could
> also be done in the generic MSI driver itself to avoid changes in the PCIe
> drivers. But I wouldn't say it is the best possible fix.

No, that's terrible. This is just papering over a design mistake, and
I refuse to go down that road.

> 
> Is there any other way to address this issue? Or am I missing something
> completely?

Well, since each endpoint handled by an ITS has its allocation tracked
by a bitmap, it makes more sense to precisely track the allocation.

Here's a quick hack that managed to survive a VM boot. It may even
work. The only problem with it is that it probably breaks a Multi-MSi
device sitting behind a non-transparent bridge that would get its MSIs
allocated after another device. In this case, we wouldn't honor the
required alignment and things would break.

So take this as a proof of concept. If that works, I'll think of how
to deal with this crap in a more suitable way...

	M.

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 3c755d5dad6e6..43479c9e7f8d2 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3475,15 +3475,16 @@ static void its_free_device(struct its_device *its_dev)
 
 static int its_alloc_device_irq(struct its_device *dev, int nvecs, irq_hw_number_t *hwirq)
 {
-	int idx;
+	unsigned long idx;
 
 	/* Find a free LPI region in lpi_map and allocate them. */
-	idx = bitmap_find_free_region(dev->event_map.lpi_map,
-				      dev->event_map.nr_lpis,
-				      get_count_order(nvecs));
-	if (idx < 0)
+	idx = bitmap_find_next_zero_area(dev->event_map.lpi_map,
+					 dev->event_map.nr_lpis, 0, nvecs, 0);
+	if (idx >= dev->event_map.nr_lpis)
 		return -ENOSPC;
 
+	bitmap_set(dev->event_map.lpi_map, idx, nvecs);
+
 	*hwirq = dev->event_map.lpi_base + idx;
 
 	return 0;
@@ -3653,9 +3654,9 @@ static void its_irq_domain_free(struct irq_domain *domain, unsigned int virq,
 	struct its_node *its = its_dev->its;
 	int i;
 
-	bitmap_release_region(its_dev->event_map.lpi_map,
-			      its_get_event_id(irq_domain_get_irq_data(domain, virq)),
-			      get_count_order(nr_irqs));
+	bitmap_clear(its_dev->event_map.lpi_map,
+		     its_get_event_id(irq_domain_get_irq_data(domain, virq)),
+		     nr_irqs);
 
 	for (i = 0; i < nr_irqs; i++) {
 		struct irq_data *data = irq_domain_get_irq_data(domain,

-- 
Without deviation from the norm, progress is not possible.



More information about the linux-arm-kernel mailing list