[RFC] Some issues about LPI migration on ARM SMP platform
liaochang (A)
liaochang1 at huawei.com
Sun Feb 5 18:52:30 PST 2023
Hi, Marc,Catalin
Recently, Yipeng report an issue [1] about LPI migration on SMP and get
some valuable feedback, but after further investigation, I beleive it is
a fundamental issue of LPI handling mechanism which can happen in those
contexts:
For example, NIC device generates MSI and sends LPI to CPU0 via ITS,
meanwhile irqbalance running on CPU1 set irq affinty of NIC to CPU1,
the next interrupt will be sent to CPU1, due to the state of irq is
still in progress, kernel does not end up performing irq handler on
CPU1, which results in some userland service timeout, the sequence
of events is shown as follows:
NIC CPU0 CPU1
========== =================== ==================
Generate
Interrupt#1
READ_IAR
Lock irq_desc
Set IRQD_IN_PROGRESS
Unlock irq_desc
Lock irq_desc
Change LPI affinity
Unlock irq_desc
Call irq_handler
Generate
Interupt#2
READ_IAR
Lock irq_desc
Check IRQD_IN_PROGRESS
Unlock irq_desc
Return from interrupt#2
Lock irq_desc
Clear IRQD_IN_PROGRESS
Unlock irq_desc
return from interrupt#1
>From the perspective of NIC, if one IRQ acked is not serviced by kernel,
it needs to wait for a long time to generate the next event. This problem is
reproduced on several chips from different Arm vendors using something
like the following commands, please replace the IRQ 56 with the ITS-MSI
irq on your machine if you want to test on your own.
# while true; do echo trigger > /sys/kernel/debug/irq/irqs/56; done
# while true; do echo 0 > /proc/irq/56/smp_affinity_list; echo 1 > /proc/irq/56/smp_affinity_list; done
In [1], Thomas suggests to enable CONFIG_GENERIC_PENDING_IRQ to delay the
affinity change of LPI, so that the interrupt will not be migrated to
the new CPU until all interrupt context exits, it ensures all IRQ acked
will be serviced, and i develop a not-so-perfect patch [2] to delay the
affinity change of LPI until the next interrupt acknowledge, let me use
the sequence below explains what this patch has done:
NIC CPU0 CPU1
========== =================== ==================
Generate
Interrupt#1
READ_IAR
Lock irq_desc
Set IRQD_IN_PROGRESS
Unlock irq_desc
Lock irq_desc
1) its_set_affinity returns
-EBUSY
2) Call irqd_set_move_pending
Unlock irq_desc
Call irq_handler
Generate
Interupt#2
Lock irq_desc
Clear IRQD_IN_PROGRESS
Unlock irq_desc
Return from interrupt#1
3) Read_IAR and clear the
corresponding bit in CPU0
LPI pending table
4) Call irq_move_irq
5) Call irq_retrigger to set
corresponding bit in CPU1
LPI pending table.
6) Return from interrupt#2 Read_IAR
Lock irq_desc
Set IRQD_IN_PROGRESS
Unlock irq_desc
Call irq_handler
Generate
Interrupt#3
Lock irq_desc
Clear IRQD_IN_PROGRESS
Unlock irq_desc
Return from interrupt#2
After that, the missing LPI caused userland timeout never occur again so
far, but I don't think this solution can ensure all interrupt generated
by Device could be serviced on CPU1 totally, it depends on the order of
the issuing of device interrupt and the reading of IAR on CPU1, if device
generates interrupt#3 before CPU1 starts to service the interrupt#2 that
is retriggered on CPU0, the interrupt#3 is drop actually because LPI
pending table use single bit to record each LPI, this scenario is shown
as follows:
NIC CPU0 CPU1
========== =================== ==================
Generate
Interupt#2
1) Read IAR and clear the
corresponding bit in CPU0
LPI pending table
2) Call irq_move_irq
3) Call irq_retrigger to set
correspondng bit in CPU1
LPI pending table
Generate
Interrupt#3
4) Return from interrupt#2
5) Read_IAR and clear the
corresponding bit in CPU1
LPI pending table
...
Call irq_handler
...
Return from interrupt#2
So I wonder why GIC does not support Active state for LPI just like SPI
/PPI/SGI, if so, this problem will be much easier to deal, does ARM has
any plan to support Active state for LPI, or something else to fix this
issue?
Thanks.
[1] https://lore.kernel.org/all/20230106082136.68501-1-zouyipeng@huawei.com/
[2] On CONFIG_GENERIC_PENDING_IRQ=y, the hotfix patch supports delay irq migration for LPI:
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1682,6 +1682,17 @@ static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
return -EINVAL;
}
+void handle_misroute_irq(struct irq_desc *desc)
+{
+ struct irq_data *data = irq_desc_get_irq_data(desc);
+
+ if (irq_move_pending(data)) {
+ irq_move_irq(data);
+ its_irq_retrigger(data);
+ } else
+ handle_fasteoi_irq(desc);
+}
+
static u64 its_irq_get_msi_base(struct its_device *its_dev)
{
struct its_node *its = its_dev->its;
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 34d58567b78d..a563454ace20 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -1466,7 +1466,7 @@ static int gic_irq_domain_map(struct irq_domain *d, unsigned int irq,
if (!gic_dist_supports_lpis())
return -EPERM;
irq_domain_set_info(d, irq, hw, chip, d->host_data,
- handle_fasteoi_irq, NULL, NULL);
+ handle_misroute_irq, NULL, NULL);
break;
--
BR,
Liao, Chang
More information about the linux-arm-kernel
mailing list