SMMU problem found on LS2085A with 4.6-rc3

Shi, Yang yang.shi at linaro.org
Fri Apr 15 10:55:14 PDT 2016


On 4/15/2016 10:44 AM, Robin Murphy wrote:
> On 15/04/16 18:19, Shi, Yang wrote:
>> On 4/15/2016 5:30 AM, Robin Murphy wrote:
>>> On 15/04/16 00:07, Shi, Yang wrote:
>>>> Hi Robin,
>>>>
>>>> On 4/14/2016 5:04 AM, Robin Murphy wrote:
>>>>> Hi Yang,
>>>>>
>>>>> On 13/04/16 20:31, Shi, Yang wrote:
>>>>>> Hi Will & Robin,
>>>>>>
>>>>>> I just ran some quick tests on my LS2085A board, which has 8 Cortex-A57
>>>>>> cores, with the 4.6-rc3 kernel, but I found a regression with the SMMU.
>>>>>>
>>>>>> SMMU driver reports:
>>>>>>
>>>>>> arm_smmu_global_fault: 297974 callbacks suppressed
>>>>>> arm_smmu_global_fault: 298561 callbacks suppressed
>>>>>> arm-smmu 5000000.iommu:         GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu:         GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu:         GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu:         GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu:         GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu:         GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu:         GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu:         GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu:         GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu:         GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
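The "callbacks suppressed" lines come from the kernel's printk rate limiting. As a rough illustration, here is a minimal Python model of that window logic. It assumes the default settings of a 5-second interval and a burst of 10 messages, and it assumes that each rate-limited call site (as with dev_err_ratelimited()) keeps its own state. The class and method names are hypothetical, not kernel APIs. Under those assumptions, each call site prints at most 10 messages per window and only counts the rest. That would be consistent with the 10 copies of each message above and with the two separate suppression counts.

```python
class RateLimitModel:
    """Hypothetical model of one rate-limited call site (not a kernel API).

    Mirrors the window logic of the kernel's ___ratelimit(): at most `burst`
    messages per `interval` seconds; anything beyond that is only counted.
    """

    def __init__(self, interval: float = 5.0, burst: int = 10):
        self.interval = interval  # assumed default: 5 s
        self.burst = burst        # assumed default: 10 messages
        self.begin = None         # start time of the current window
        self.printed = 0
        self.missed = 0

    def allow(self, now: float) -> tuple:
        """Return (may_print, suppressed) for a message at time `now`.

        `suppressed` is non-zero only when a new window opens after some
        messages were dropped; the kernel reports that count as
        "<function>: N callbacks suppressed".
        """
        suppressed = 0
        if self.begin is None:
            self.begin = now
        if now - self.begin > self.interval:
            suppressed = self.missed
            self.begin, self.printed, self.missed = now, 0, 0
        if self.printed < self.burst:
            self.printed += 1
            return True, suppressed
        self.missed += 1
        return False, suppressed


# 300000 faults within the first second: only the first 10 are printed.
rl = RateLimitModel()
results = [rl.allow(i / 300_000) for i in range(300_000)]
print(sum(ok for ok, _ in results))  # 10
print(rl.allow(6.0))                 # (True, 299990)
```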
>>>>>
>>>>> That's a stream match conflict fault, so you've somehow got two devices
>>>>> using the same stream ID attached to different domains, and at least one
>>>>> of them is trying to do DMA.
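To make the fault type concrete, here is a simplified Python model of SMMUv2 stream matching. It assumes the architectural behaviour in which each valid Stream Match Register (SMR) compares an incoming stream ID against its ID field and ignores any bits set in its MASK field. If two devices sharing one stream ID were attached to different domains, the driver would end up with two valid SMRs matching the same stream, and any transaction on it would raise a stream match conflict. The names `SMR`, `smr_matches` and `classify`, and the stream ID 0x300, are illustrative only. The unmatched case (bypass or unidentified stream fault, depending on sCR0) is not modelled.

```python
from dataclasses import dataclass

SID_MASK = (1 << 15) - 1  # assumption: 15-bit stream IDs, as in SMMUv2 SMR.ID


@dataclass(frozen=True)
class SMR:
    """Simplified Stream Match Register (illustrative, not a driver type)."""
    valid: bool
    id: int    # stream ID to compare against
    mask: int  # bits set here are "don't care" during the comparison


def smr_matches(smr: SMR, sid: int) -> bool:
    # Match when sid and smr.id agree on every bit not covered by smr.mask.
    return smr.valid and ((sid ^ smr.id) & ~smr.mask & SID_MASK) == 0


def classify(smrs: list, sid: int) -> str:
    """Report what a transaction with stream ID `sid` would hit."""
    hits = [i for i, smr in enumerate(smrs) if smr_matches(smr, sid)]
    if len(hits) > 1:
        return "SMCF"       # stream match conflict fault (a global fault)
    if not hits:
        return "unmatched"  # bypass or USF depending on sCR0; not modelled
    return f"S2CR[{hits[0]}]"  # the matching index selects a Stream-to-Context Register


# Hypothetical: two devices sharing stream ID 0x300, attached to different
# domains, so each attach programs its own SMR for the same ID.
print(classify([SMR(True, 0x300, 0), SMR(True, 0x300, 0)], 0x300))  # SMCF
print(classify([SMR(True, 0x300, 0)], 0x300))                       # S2CR[0]
```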
>>>>>
>>>>>> But it works with the 4.5 kernel. I found that the commit below causes it:
>>>>>>
>>>>>> commit 9adb95949a343dac53b1cd81dc973b5f815c88d4
>>>>>> Author: Robin Murphy <robin.murphy at arm.com>
>>>>>> Date:   Tue Jan 26 18:06:36 2016 +0000
>>>>>>
>>>>>>      iommu/arm-smmu: Support DMA-API domains
>>>>>>
>>>>>>      With DMA mapping ops provided by the iommu-dma code, only a minimal
>>>>>>      contribution from the IOMMU driver is needed to create a suitable
>>>>>>      DMA-API domain for them to use. Implement this for the ARM SMMUs.
>>>>>>
>>>>>>      Signed-off-by: Robin Murphy <robin.murphy at arm.com>
>>>>>>      Signed-off-by: Will Deacon <will.deacon at arm.com>
>>>>>>
>>>>>> Any idea?
>>>>>
>>>>> My first guess would be the same thing as [1] - does that patch help?
>>>>
>>>> No, it doesn't stop the fault.
>>>>
>>>>>
>>>>> Beyond that, what does your DT look like? The one in mainline has a
>>>>> token mmu-masters property which isn't even valid, so nothing ever gets
>>>>
>>>> Mine has mmu-masters property too, but removing it doesn't solve the
>>>> problem.
>>>
>>> OK, now things really stop making sense. Without the mmu-masters
>>> property the SMMU driver will do nothing but probe the SMMU device
>>> itself. Therefore I can only assume the bootloader magic for the
>>> Freescale vendor kernel must be rewriting your DT (our board always just
>>> says "fdt_fixup_smmu: WARNING: no SMMU node found" despite the mainline
>>> DT containing the SMMU, so I'm not sure exactly what it's looking for).
>>> Can you see what it's done via /sys/firmware/fdt (or
>>> /sys/firmware/devicetree/base/ if you can face hunting down phandles
>>> manually)? As a further sanity check, what do you see in
>>> /sys/kernel/iommu_groups/*/devices/ and do they differ between the two
>>> kernels?
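For the comparison suggested above, a small script can dump the group layout so that the output from the two kernels can be diffed. This is a minimal Python sketch. It only assumes the standard /sys/kernel/iommu_groups/<N>/devices/ layout, and the function name is hypothetical.

```python
from pathlib import Path


def list_iommu_groups(root: str = "/sys/kernel/iommu_groups") -> dict:
    """Map each IOMMU group number to the sorted names of its member devices."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        return groups  # no IOMMU groups registered (or sysfs not mounted)
    for group_dir in base.iterdir():
        devices = group_dir / "devices"
        if group_dir.name.isdigit() and devices.is_dir():
            # Entries under devices/ are named after the member devices.
            groups[int(group_dir.name)] = sorted(p.name for p in devices.iterdir())
    return dict(sorted(groups.items()))


if __name__ == "__main__":
    for group, devs in list_iommu_groups().items():
        print(f"group {group}: {' '.join(devs)}")
```

Running it once under each kernel and diffing the two outputs shows whether devices moved between groups.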
>>
>> With the mmu-masters property, the fsl-mc is added to group 2; see the
>> dmesg log below:
>>
>> iommu: Adding device 3600000.pcie to group 0
>> iommu: Adding device 3700000.pcie to group 1
>> iommu: Adding device 80c000000.fsl-mc to group 2
>>
>> fsl-mc isn't there if the mmu-masters property is removed.
>>
>> But it looks like there are multiple devices in iommu_groups/3:
>>
>> root at ls2085a_rdb-4:~# cat /sys/kernel/iommu_groups/3/devices/0000\:0
>> 0000:00:00.0/ 0000:01:00.0/ 0000:01:00.1/
>>
>> It is group 2 if the mmu-masters property is removed.
>>
>> I have one Intel e1000 NIC on my PCIe bus; both 0000:01:00.0 and
>> 0000:01:00.1 are functions of the NIC.
>>
>> Is this behavior expected?
>
> Yup - since the root complex doesn't support ACS, the IOMMU API puts all
> the devices behind it (in this case the NIC and the bridge itself) in
> the same group, because otherwise it might be possible for two devices
> assigned to different guests to DMA directly to each other without going
> through the IOMMU.
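The grouping rule described here can be sketched as follows. This is a deliberately simplified Python model. It assumes only that a device is isolated when every bridge above it enforces ACS, and that otherwise it joins the group of the topmost upstream bridge lacking ACS. The real kernel logic also handles multifunction devices, DMA aliases and quirks, none of which are modelled. The topology is a hypothetical reconstruction of the one reported above, and the function name is illustrative.

```python
from typing import Dict, List, Optional, Set


def iommu_groups(parent: Dict[str, Optional[str]], acs: Set[str]) -> List[Set[str]]:
    """Group devices given each device's upstream bridge (None at the top)."""

    def group_owner(dev: str) -> str:
        owner = dev
        node = parent[dev]
        while node is not None:
            if node not in acs:
                owner = node  # a non-ACS bridge above us: no isolation below it
            node = parent[node]
        return owner

    groups: Dict[str, Set[str]] = {}
    for dev in parent:
        groups.setdefault(group_owner(dev), set()).add(dev)
    return list(groups.values())


# Hypothetical topology: a root port without ACS, with a two-function NIC below it.
topology = {
    "0000:00:00.0": None,
    "0000:01:00.0": "0000:00:00.0",
    "0000:01:00.1": "0000:00:00.0",
}
print([sorted(g) for g in iommu_groups(topology, acs=set())])
# [['0000:00:00.0', '0000:01:00.0', '0000:01:00.1']]
```

In this simplified model, marking the root port as ACS-capable would put each device in its own group. In reality, the two NIC functions could still be grouped together if the NIC itself lacks ACS.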
>
>>> Secondly, the stream match conflict can only occur if the SMRs are
>>> actually programmed. Since with Will's fix for the conflicts Eric saw we
>>> should attach to the default domain without touching the initial bypass
>>> entries in the SMRs, I'm at a loss to see how you could still get into
>>> this state with that patch applied.
>>
>> I mixed up the patches; with Will's fix applied, the issue has gone away.
>
> Phew, that's a relief! Would you be happy to give a Tested-by on that
> patch?

Sure, I've just added my Tested-by to that patch. Thanks for your help.

Regards,
Yang

>
> Thanks,
> Robin.
>
>>
>> Thanks,
>> Yang
>>
>>>
>>> As the transactions provoking the fault are apparently instruction
>>> fetches on a 00xx stream ID, which I've not seen before, my first guess
>>> would be it's something to do with the management complex (which I can't
>>> get to work with the staging driver due to firmware incompatibility),
>>> but then looking at the general lack of connection to the DMA API within
>>> that driver, maybe not?
>>>
>>> Robin.
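The register values in the original report can be decoded mechanically. The sketch below assumes the following bit positions, taken from the SMMUv2 architecture specification:

- In GFSR, bit 2 is SMCF and bit 31 is MULTI.
- In GFSYNR0, bit 1 is WNR and bit 3 is IND (instruction fetch).
- The low 16 bits of GFSYNR1 hold the stream ID.

Check them against your copy of the specification before relying on them. The function names are hypothetical, and how the raw stream ID relates to the LS2085A's ICIDs is not modelled.

```python
# Assumed SMMUv2 sGFSR bit positions.
GFSR_FLAGS = {
    0: "ICF",     # invalid context fault
    1: "USF",     # unidentified stream fault
    2: "SMCF",    # stream match conflict fault
    3: "UCBF",    # unimplemented context bank fault
    4: "UCIF",    # unimplemented context interrupt fault
    5: "CAF",     # configuration access fault
    6: "EF",      # external fault
    7: "PF",      # permission fault
    31: "MULTI",  # another fault occurred while one was already recorded
}


def decode_gfsr(gfsr: int) -> list:
    """Names of the fault flags set in a GFSR value, lowest bit first."""
    return [name for bit, name in sorted(GFSR_FLAGS.items()) if gfsr >> bit & 1]


def decode_gfsynr(gfsynr0: int, gfsynr1: int) -> dict:
    """Pick out the syndrome fields discussed in the thread."""
    return {
        "write": bool(gfsynr0 >> 1 & 1),        # WNR: 1 = write, 0 = read
        "instruction": bool(gfsynr0 >> 3 & 1),  # IND: 1 = instruction fetch
        "stream_id": gfsynr1 & 0xFFFF,          # raw stream ID field
    }


print(decode_gfsr(0x80000004))  # ['SMCF', 'MULTI']
info = decode_gfsynr(0x00000008, 0x00000300)
print(info["instruction"], hex(info["stream_id"]))  # True 0x300
```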
>>>
>>>>
>>>> Thanks,
>>>> Yang
>>>>
>>>>
>>>>> attached to the SMMU - indeed I've happily booted -rc3 on an LS2085A
>>>>> earlier this week - so there's clearly something going on there.
>>>>>
>>>>> More generally, I'd note that the mmu-masters binding will never fully
>>>>> work on this board - you can get the platform devices to cooperate by
>>>>> programming the assorted ICID registers to ensure they present unique
>>>>> stream IDs, but PCI devices cannot work at all because there's no way to
>>>>> make the stream IDs coming out of the root complex be equal to the PCI
>>>>> RID in the way it relies on. In that sense, any regression here is quite
>>>>> likely just a shift from "subtly not working" to "loudly and obnoxiously
>>>>> not working". Conversely, those reasons have also proved it a really
>>>>> useful platform for implementing and testing the iommu-map binding[2]
>>>>> (with an awful hack in the PCI driver to program the lookup table
>>>>> suitably) :D
>>>>>
>>>>> Robin.
>>>>>
>>>>> [1]:http://thread.gmane.org/gmane.linux.kernel.iommu/12810
>>>>> [2]:http://thread.gmane.org/gmane.linux.kernel.iommu/12454
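The RID-to-stream-ID translation described by the iommu-map binding can be sketched as a table lookup. This minimal Python model assumes the semantics of the binding referenced above:

- The Requester ID is first ANDed with iommu-map-mask, if present.
- It is then matched against (rid-base, iommu, iommu-base, length) entries.
- A match yields iommu-base + (RID - rid-base).

The phandle is dropped because there is a single SMMU here. The table values and helper names are hypothetical, not the LS2085A's real lookup table.

```python
from typing import NamedTuple, Optional, Sequence


class IommuMapEntry(NamedTuple):
    rid_base: int  # first Requester ID covered by this entry
    sid_base: int  # stream ID ("iommu-base") that rid_base maps to
    length: int    # number of consecutive RIDs covered


def pci_rid(bus: int, dev: int, fn: int) -> int:
    """PCI Requester ID: bus[15:8], device[7:3], function[2:0]."""
    return (bus << 8) | (dev << 3) | fn


def map_rid(entries: Sequence[IommuMapEntry], rid: int, mask: int = 0xFFFF) -> Optional[int]:
    """Translate a RID to a stream ID; None if no entry covers it."""
    rid &= mask  # iommu-map-mask is applied before the lookup
    for e in entries:
        if e.rid_base <= rid < e.rid_base + e.length:
            return e.sid_base + (rid - e.rid_base)
    return None


# Hypothetical table: root port at RID 0x0000, two-function NIC on bus 1.
table = [IommuMapEntry(0x0000, 0x20, 1), IommuMapEntry(0x0100, 0x21, 2)]
print(hex(map_rid(table, pci_rid(1, 0, 1))))  # 0x22
print(map_rid(table, pci_rid(2, 0, 0)))       # None
```

Because the stream ID comes from a table rather than being required to equal the RID, the root complex's lookup table can be programmed to produce whatever stream IDs the SMMU configuration expects.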
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Yang
>>>>>>
>>>>>
>>>>
>>>
>>
>




More information about the linux-arm-kernel mailing list