[RFC] Some issues about LPI migration on ARM SMP platform

liaochang (A) liaochang1 at huawei.com
Sun Feb 5 18:52:30 PST 2023


Hi, Marc,Catalin

Recently, Yipeng report an issue [1] about LPI migration on SMP and get
some valuable feedback, but after further investigation, I beleive it is
a fundamental issue of LPI handling mechanism which can happen in those
contexts:

For example, NIC device generates MSI and sends LPI to CPU0 via ITS,
meanwhile irqbalance running on CPU1 set irq affinty of NIC to CPU1,
the next interrupt will be sent to CPU1, due to the state of irq is
still in progress, kernel does not end up performing irq handler on
CPU1, which results in some userland service timeout, the sequence
of events is shown as follows:

NIC         CPU0                   CPU1
==========  ===================    ==================
Generate
Interrupt#1
            READ_IAR
            Lock irq_desc
            Set IRQD_IN_PROGRESS
            Unlock irq_desc

                                    Lock irq_desc
                                    Change LPI affinity
                                    Unlock irq_desc

            Call irq_handler

Generate
Interupt#2
                                    READ_IAR
                                    Lock irq_desc
                                    Check IRQD_IN_PROGRESS
                                    Unlock irq_desc
                                    Return from interrupt#2
            Lock irq_desc
            Clear IRQD_IN_PROGRESS
            Unlock irq_desc
            return from interrupt#1

>From the perspective of NIC, if one IRQ acked is not serviced by kernel,
it needs to wait for a long time to generate the next event. This problem is
reproduced on several chips from different Arm vendors using something
like the following commands, please replace the IRQ 56 with the ITS-MSI
irq on your machine if you want to test on your own.

# while true; do echo trigger > /sys/kernel/debug/irq/irqs/56; done
# while true; do echo 0 > /proc/irq/56/smp_affinity_list; echo 1 > /proc/irq/56/smp_affinity_list; done

In [1], Thomas suggests to enable CONFIG_GENERIC_PENDING_IRQ to delay the
affinity change of LPI, so that the interrupt will not be migrated to
the new CPU until all interrupt context exits, it ensures all IRQ acked
will be serviced, and i develop a not-so-perfect patch [2] to delay the
affinity change of LPI until the next interrupt acknowledge, let me use
the sequence below explains what this patch has done:


NIC        CPU0                       CPU1
========== ===================        ==================
Generate
Interrupt#1
           READ_IAR
           Lock irq_desc
           Set IRQD_IN_PROGRESS
           Unlock irq_desc

                                       Lock irq_desc
                                       1) its_set_affinity returns
                                          -EBUSY
                                       2) Call irqd_set_move_pending
                                       Unlock irq_desc

           Call irq_handler

Generate
Interupt#2
           Lock irq_desc
           Clear IRQD_IN_PROGRESS
           Unlock irq_desc
           Return from interrupt#1

           3) Read_IAR and clear the
              corresponding bit in CPU0
              LPI pending table
           4) Call irq_move_irq
           5) Call irq_retrigger to set
              corresponding bit in CPU1
              LPI pending table.
           6) Return from interrupt#2   Read_IAR
                                         Lock irq_desc
                                         Set IRQD_IN_PROGRESS
                                         Unlock irq_desc
                                         Call irq_handler
Generate
Interrupt#3
                                         Lock irq_desc
                                         Clear IRQD_IN_PROGRESS
                                         Unlock irq_desc
                                         Return from interrupt#2


After that, the missing LPI caused userland timeout never occur again so
far, but I don't think this solution can ensure all interrupt generated
by Device could be serviced on CPU1 totally, it depends on the order of
the issuing of device interrupt and the reading of IAR on CPU1, if device
generates interrupt#3 before CPU1 starts to service the interrupt#2 that
is retriggered on CPU0, the interrupt#3 is drop actually because LPI
pending table use single bit to record each LPI, this scenario is shown
as follows:


NIC        CPU0                        CPU1
========== ===================         ==================
Generate
Interupt#2
           1) Read IAR and clear the
              corresponding bit in CPU0
              LPI pending table
           2) Call irq_move_irq
           3) Call irq_retrigger to set
              correspondng bit in CPU1
              LPI pending table
Generate
Interrupt#3
           4) Return from interrupt#2
                                       5) Read_IAR and clear the
                                          corresponding bit in CPU1
                                          LPI pending table
                                       ...
                                       Call irq_handler
                                       ...
                                       Return from interrupt#2


So I wonder why GIC does not support Active state for LPI just like SPI
/PPI/SGI, if so, this problem will be much easier to deal, does ARM has
any plan to support Active state for LPI, or something else to fix this
issue?

Thanks.

[1] https://lore.kernel.org/all/20230106082136.68501-1-zouyipeng@huawei.com/
[2] On CONFIG_GENERIC_PENDING_IRQ=y, the hotfix patch supports delay irq migration for LPI:

--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1682,6 +1682,17 @@ static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
        return -EINVAL;
 }

+void handle_misroute_irq(struct irq_desc *desc)
+{
+       struct irq_data *data = irq_desc_get_irq_data(desc);
+
+       if (irq_move_pending(data)) {
+               irq_move_irq(data);
+               its_irq_retrigger(data);
+       } else
+               handle_fasteoi_irq(desc);
+}
+
 static u64 its_irq_get_msi_base(struct its_device *its_dev)
 {
        struct its_node *its = its_dev->its;
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 34d58567b78d..a563454ace20 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -1466,7 +1466,7 @@ static int gic_irq_domain_map(struct irq_domain *d, unsigned int irq,
                if (!gic_dist_supports_lpis())
                        return -EPERM;
                irq_domain_set_info(d, irq, hw, chip, d->host_data,
-                                   handle_fasteoi_irq, NULL, NULL);
+                                   handle_misroute_irq, NULL, NULL);
                break;

-- 
BR,
Liao, Chang



More information about the linux-arm-kernel mailing list