[REGRESSION] Re: imx8 PCI regression since "iommu: Get DT/ACPI parsing into the proper probe path"
Nicolas Cavallari
Nicolas.Cavallari at green-communications.fr
Mon Jan 19 04:53:01 PST 2026
On 16/01/2026 at 18:24, Robin Murphy wrote:
> On 2026-01-16 4:52 pm, Nicolas Cavallari wrote:
>> +cc regressions ML
>>
>> On 13/01/2026 at 10:17, Nicolas Cavallari wrote:
>>> +cc patch author & reviewers
>>>
>>> On 1/9/26 17:22, Nicolas Cavallari wrote:
>>>> When upgrading from 6.12 to a 6.18 kernel, I noticed that a PCI
>>>> Ethernet adapter (Microchip LAN7430) would hang under load and not
>>>> recover. When that happens, some of its registers indicate that it is
>>>> failing to do DMA reads, so it cannot reclaim entries in its ring buffer.
>>>>
>>>> I bisected the problem into this commit:
>>>>
>>>> commit bcb81ac6ae3c2ef95b44e7b54c3c9522364a245c
>>>> Author: Robin Murphy <robin.murphy at arm.com>
>>>> Date: Fri Feb 28 15:46:33 2025 +0000
>>>>
>>>> iommu: Get DT/ACPI parsing into the proper probe path
>>>>
>>>> The problem still exists on 6.19-rc1, on pci/next (29a77b4897f1) and on
>>>> iommu/master (360e85353769) trees. Reverting the commit fixes the
>>>> issue.
>>
>> The problem persists on 6.19-rc5.
>>
>>>> The system is a Gateworks GW7200: an i.MX 8 Mini connected to a Pericom
>>>> PI7C9X2G404 4-port PCIe switch, to which the LAN7430 chip is attached.
>>>>
>>>> -[0000:00]---00.0-[01-ff]----00.0-[02-05]--+-01.0-[03]----00.0
>>>>                                            +-02.0-[04]--
>>>>                                            \-03.0-[05]----00.0
>>>>
>>>> The problem only occurs when at least one other PCI device on the switch
>>>> is in use. It does not happen if the LAN7430 is the only PCI device, or if
>>>> the other devices are not actively used. For example, I can reproduce it
>>>> with an ath9k wireless network adapter when it is up and running, but not
>>>> when it is down or its driver is not loaded.
>>>>
>>>> I suspect that other PCI devices have similar issues, but the LAN7430 is
>>>> the easiest one to diagnose: it hangs within seconds with
>>>> iperf3 --bidir -u -b 200M, and its register map is public.
>>>>
>>>> I couldn't find a way to dump the PCI address translation mapping from
>>>> userspace. I would be happy to provide more information or test patches.
>>
>> I debugged it further; it seems to be mostly a PCI issue, since the
>> system does not actually have an IOMMU.
>
> Indeed, I was figuring this had to be another case of a switch with
> wonky ACS - do Mani's patches adjusting ACS enablement make any difference?
>
> https://lore.kernel.org/all/20260102-pci_acs-v3-1-72280b94d288@oss.qualcomm.com/
With this series, ACS is still enabled and eth1 is still failing under load.
> Although in this case I guess the issue is arguably more that we're
> requesting ACS at all, before we know that there's actually an IOMMU
> present to warrant it. Clearly the best option would be to figure out if
> the switch behaviour itself can be fixed somehow, but perhaps something
> like this might help paper over the issue for now (but I'd have to test
> it to make sure it doesn't break IOMMUs again...)
>
> ----->8-----
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index 6b989a62def2..837cc0b5ace4 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -141,10 +141,12 @@ int of_iommu_configure(struct device *dev, struct device_node *master_np,
> .np = master_np,
> };
>
> - pci_request_acs();
> err = pci_for_each_dma_alias(to_pci_dev(dev),
> of_pci_iommu_init, &info);
> - of_pci_check_device_ats(dev, master_np);
> + if (!err) {
> + pci_request_acs();
> + of_pci_check_device_ats(dev, master_np);
> + }
> } else {
> err = of_iommu_configure_device(master_np, dev, id);
> }
> -----8<-----
With this, ACS is indeed disabled and eth1 no longer fails under load.
However, I have also identified an erratum on the PCIe switch that causes ACS
to fail; I've sent the details in another message in this thread.
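The ACS state of each switch port can be checked with lspci -vvv, which prints an "ACSCtl:" line for every port exposing the Access Control Services capability. As a minimal sketch of where those bits live, assuming a full 4096-byte config-space dump such as the root-readable /sys/bus/pci/devices/<BDF>/config file, the C below walks the PCIe extended capability list (starting at offset 0x100) for capability ID 0x000D (ACS) and decodes the 16-bit ACS Control register at offset 0x06 within it. The helper names (find_ext_cap, acs_control, read_config, print_acs_ctrl) are hypothetical and are not kernel or pciutils APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PCI_CFG_SPACE_SIZE  256     /* extended capabilities start right after this */
#define PCI_EXT_CAP_ID_ACS  0x000d  /* Access Control Services */
#define PCI_ACS_CTRL        0x06    /* ACS Control register, relative to the cap */

/* Config space is little-endian; read it byte-wise to stay host-agnostic. */
static uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
    return (uint32_t)cfg[off] | ((uint32_t)cfg[off + 1] << 8) |
           ((uint32_t)cfg[off + 2] << 16) | ((uint32_t)cfg[off + 3] << 24);
}

static uint16_t cfg_read16(const uint8_t *cfg, size_t off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* Walk the extended capability list (header: ID in bits 15:0, next pointer
 * in bits 31:20) and return the offset of cap_id, or 0 if it is absent. */
size_t find_ext_cap(const uint8_t *cfg, size_t len, uint16_t cap_id)
{
    size_t off = PCI_CFG_SPACE_SIZE;
    int ttl = (4096 - PCI_CFG_SPACE_SIZE) / 8;  /* guard against looping lists */

    assert(cfg != NULL);
    while (off >= PCI_CFG_SPACE_SIZE && off + 4 <= len && ttl-- > 0) {
        uint32_t hdr = cfg_read32(cfg, off);

        if (hdr == 0 || hdr == 0xffffffffu)
            return 0;               /* empty list or device not responding */
        if ((hdr & 0xffff) == cap_id)
            return off;
        off = (hdr >> 20) & 0xffc;  /* next pointer, dword aligned */
    }
    return 0;
}

/* Return the ACS Control value, or -1 if no ACS capability was found
 * within the first len bytes of the dump. */
int acs_control(const uint8_t *cfg, size_t len)
{
    size_t off = find_ext_cap(cfg, len, PCI_EXT_CAP_ID_ACS);

    if (off == 0 || off + PCI_ACS_CTRL + 2 > len)
        return -1;
    return cfg_read16(cfg, off + PCI_ACS_CTRL);
}

/* Read up to cap bytes of a config-space file; returns bytes read (0 on error). */
size_t read_config(const char *path, uint8_t *buf, size_t cap)
{
    FILE *f = fopen(path, "rb");
    size_t n;

    if (!f)
        return 0;
    n = fread(buf, 1, cap, f);
    fclose(f);
    return n;
}

/* Print ACS Control bits 0..6 using the same names lspci uses. */
void print_acs_ctrl(uint16_t ctrl)
{
    static const char *const names[] = {
        "SrcValid", "TransBlk", "ReqRedir", "CmpltRedir",
        "UpstreamFwd", "EgressCtrl", "DirectTrans",
    };

    printf("ACSCtl:");
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        printf(" %s%c", names[i], (ctrl & (1u << i)) ? '+' : '-');
    putchar('\n');
}
```

For example, one could call read_config() on /sys/bus/pci/devices/0000:02:01.0/config (the first switch downstream port in the tree above), pass the returned byte count to acs_control(), and print a non-negative result with print_acs_ctrl(). This must run as root: unprivileged reads of that sysfs file return only the first 64 bytes, in which case acs_control() reports -1 even when ACS is present.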