SMMU problem found on LS2085A with 4.6-rc3

Robin Murphy robin.murphy at arm.com
Fri Apr 15 10:44:19 PDT 2016


On 15/04/16 18:19, Shi, Yang wrote:
> On 4/15/2016 5:30 AM, Robin Murphy wrote:
>> On 15/04/16 00:07, Shi, Yang wrote:
>>> Hi Robin,
>>>
>>> On 4/14/2016 5:04 AM, Robin Murphy wrote:
>>>> Hi Yang,
>>>>
>>>> On 13/04/16 20:31, Shi, Yang wrote:
>>>>> Hi Will & Robin,
>>>>>
>>>>> I just ran some quick tests on my LS2085A board, which has 8 Cortex-A57
>>>>> cores, with the 4.6-rc3 kernel, but I found a regression in the SMMU driver.
>>>>>
>>>>> SMMU driver reports:
>>>>>
>>>>> arm_smmu_global_fault: 297974 callbacks suppressed
>>>>> arm_smmu_global_fault: 298561 callbacks suppressed
>>>>> arm-smmu 5000000.iommu:         GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>> arm-smmu 5000000.iommu:         GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>> arm-smmu 5000000.iommu:         GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>> arm-smmu 5000000.iommu:         GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>> arm-smmu 5000000.iommu:         GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>> arm-smmu 5000000.iommu:         GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>> arm-smmu 5000000.iommu:         GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>> arm-smmu 5000000.iommu:         GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>> arm-smmu 5000000.iommu:         GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>> arm-smmu 5000000.iommu:         GFSR 0x80000004, GFSYNR0 0x00000008, GFSYNR1 0x00000300, GFSYNR2 0x00000000
>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>> arm-smmu 5000000.iommu: Unexpected global fault, this could be serious
>>>>
>>>> That's a stream match conflict fault, so you've somehow got two devices
>>>> using the same stream ID attached to different domains, and at least one
>>>> of them is trying to do DMA.
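A minimal sketch of how the GFSR value in the log above breaks down, assuming the sGFSR bit layout from the ARM SMMUv2 architecture specification (ICF bit 0, USF bit 1, SMCF bit 2, MULTI bit 31; the other fault bits are omitted). The macro and function names are hypothetical, not the arm-smmu driver's own definitions, so check the bit positions against the spec before relying on them.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names; bit positions assumed from the SMMUv2 sGFSR layout.
 * Only a subset of the fault bits is listed here. */
#define GFSR_ICF   (1u << 0)  /* invalid context fault */
#define GFSR_USF   (1u << 1)  /* unidentified stream fault */
#define GFSR_SMCF  (1u << 2)  /* stream match conflict fault */
#define GFSR_MULTI (1u << 31) /* more than one fault was recorded */

/* Print the name of each listed bit set in gfsr; return how many were set. */
static int decode_gfsr(uint32_t gfsr)
{
    static const struct {
        uint32_t bit;
        const char *name;
    } bits[] = {
        { GFSR_ICF, "ICF (invalid context)" },
        { GFSR_USF, "USF (unidentified stream)" },
        { GFSR_SMCF, "SMCF (stream match conflict)" },
        { GFSR_MULTI, "MULTI (multiple faults)" },
    };
    int found = 0;

    for (size_t i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
        if (gfsr & bits[i].bit) {
            printf("%s\n", bits[i].name);
            found++;
        }
    }
    return found;
}
```

Calling `decode_gfsr(0x80000004)` prints the SMCF and MULTI lines and returns 2, which matches the "stream match conflict" reading.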
>>>>
>>>>> But it works fine with the 4.5 kernel. I found that the commit below causes it:
>>>>>
>>>>> commit 9adb95949a343dac53b1cd81dc973b5f815c88d4
>>>>> Author: Robin Murphy <robin.murphy at arm.com>
>>>>> Date:   Tue Jan 26 18:06:36 2016 +0000
>>>>>
>>>>>      iommu/arm-smmu: Support DMA-API domains
>>>>>
>>>>>      With DMA mapping ops provided by the iommu-dma code, only a minimal
>>>>>      contribution from the IOMMU driver is needed to create a suitable
>>>>>      DMA-API domain for them to use. Implement this for the ARM SMMUs.
>>>>>
>>>>>      Signed-off-by: Robin Murphy <robin.murphy at arm.com>
>>>>>      Signed-off-by: Will Deacon <will.deacon at arm.com>
>>>>>
>>>>> Any idea?
>>>>
>>>> My first guess would be the same thing as [1] - does that patch help?
>>>
>>> No, it doesn't stop the fault.
>>>
>>>>
>>>> Beyond that, what does your DT look like? The one in mainline has just one
>>>> token mmu-masters property, which isn't even valid, so nothing ever gets
>>>
>>> Mine has an mmu-masters property too, but removing it doesn't solve the
>>> problem.
>>
>> OK, now things really stop making sense. Without the mmu-masters
>> property the SMMU driver will do nothing but probe the SMMU device
>> itself. Therefore I can only assume the bootloader magic for the
>> Freescale vendor kernel must be rewriting your DT (our board always just
>> says "fdt_fixup_smmu: WARNING: no SMMU node found" despite the mainline
>> DT containing the SMMU, so I'm not sure exactly what it's looking for).
>> Can you see what it's done via /sys/firmware/fdt (or
>> /sys/firmware/devicetree/base/ if you can face hunting down phandles
>> manually)? As a further sanity check, what do you see in
>> /sys/kernel/iommu_groups/*/devices/ and do they differ between the two
>> kernels?
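A minimal sketch of the sysfs check suggested above, for comparing group membership between the two kernels: it walks /sys/kernel/iommu_groups/*/devices/ with the POSIX opendir()/readdir() calls and prints each device entry. The function names are hypothetical; an `ls` over the same glob gives the same information.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print and count the entries of one devices/
 * directory, e.g. "/sys/kernel/iommu_groups/3/devices".
 * Returns -1 if the directory cannot be opened. */
static int list_group_devices(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *ent;
    int count = 0;

    if (!dir) {
        perror(path);
        return -1;
    }
    while ((ent = readdir(dir)) != NULL) {
        /* Skip the "." and ".." entries every directory contains. */
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        printf("%s/%s\n", path, ent->d_name);
        count++;
    }
    closedir(dir);
    return count;
}

/* Hypothetical helper: visit <root>/<group>/devices for every group under
 * root (normally "/sys/kernel/iommu_groups"). Returns the number of groups
 * listed, or -1 if root itself cannot be opened. */
static int list_all_groups(const char *root)
{
    DIR *dir = opendir(root);
    struct dirent *ent;
    char buf[4096];
    int groups = 0;

    if (!dir)
        return -1;
    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')
            continue;
        snprintf(buf, sizeof(buf), "%s/%s/devices", root, ent->d_name);
        if (list_group_devices(buf) >= 0)
            groups++;
    }
    closedir(dir);
    return groups;
}
```

Running `list_all_groups("/sys/kernel/iommu_groups")` under each kernel and diffing the two outputs shows whether group membership changed.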
>
> With the mmu-masters property, the fsl-mc device is added to group 2;
> see the dmesg log below:
>
> iommu: Adding device 3600000.pcie to group 0
> iommu: Adding device 3700000.pcie to group 1
> iommu: Adding device 80c000000.fsl-mc to group 2
>
> fsl-mc won't be there if the mmu-masters property is removed.
>
> But it looks like there are multiple devices in IOMMU group 3:
>
> root@ls2085a_rdb-4:~# cat /sys/kernel/iommu_groups/3/devices/0000\:0
> 0000:00:00.0/ 0000:01:00.0/ 0000:01:00.1/
>
> It is group 2 if the mmu-masters property is removed.
>
> I have one Intel e1000 NIC on my PCIe bus; both 0000:01:00.0 and
> 0000:01:00.1 belong to the NIC.
>
> Is this behavior expected?

Yup - since the root complex doesn't support ACS, the IOMMU API puts all 
the devices behind it (in this case the NIC and the bridge itself) in 
the same group, because otherwise it might be possible for two devices 
assigned to different guests to DMA directly to each other without going 
through the IOMMU.
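To make the grouping rule concrete, here is a toy C model, not the kernel's real PCI grouping logic in drivers/iommu/iommu.c: each device joins the group of the highest upstream device it can reach through bridges that lack ACS. It deliberately ignores multifunction ACS checks, DMA alias quirks and the rest of the real algorithm; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy node; not the kernel's struct pci_dev. */
struct toy_pci_dev {
    const char *name;                 /* e.g. "0000:01:00.0" */
    const struct toy_pci_dev *parent; /* upstream bridge, NULL at the top */
    bool acs;                         /* does this bridge provide ACS isolation? */
};

/* Return the device that "owns" dev's group: keep walking upstream while
 * the bridge above lacks ACS, since devices below such a bridge could
 * DMA to each other without going through the IOMMU. */
static const struct toy_pci_dev *group_root(const struct toy_pci_dev *dev)
{
    while (dev->parent && !dev->parent->acs)
        dev = dev->parent;
    return dev;
}
```

With the topology from this thread (a root port 0000:00:00.0 without ACS and the two NIC functions 0000:01:00.0 and 0000:01:00.1 below it), all three devices resolve to the root port, matching the three entries in group 3.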

>> Secondly, the stream match conflict can only occur if the SMRs are
>> actually programmed. Since, with Will's fix for the conflicts Eric saw, we
>> should attach to the default domain without touching the initial bypass
>> entries in the SMRs, I'm at a loss to see how you could still get into
>> this state with that patch applied.
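A simplified sketch of SMMUv2 stream matching, showing why a conflict requires programmed SMRs: a stream ID matches a valid SMR when every bit not covered by that SMR's mask agrees with its ID field, and a transaction matching more than one valid SMR is what the hardware reports as a stream match conflict (SMCF in GFSR). The struct is a hypothetical model rather than the real register layout, and it assumes 16-bit IDs and masks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one Stream Match Register (SMR). */
struct toy_smr {
    bool valid;
    uint16_t id;   /* stream ID to compare against */
    uint16_t mask; /* bits set here are "don't care" */
};

/* Count the valid SMRs that match sid. 0 means the stream is unmatched,
 * 1 is the normal case, and more than 1 is a stream match conflict. */
static int count_smr_matches(const struct toy_smr *smrs, size_t n, uint16_t sid)
{
    int matches = 0;

    for (size_t i = 0; i < n; i++) {
        if (smrs[i].valid && ((sid ^ smrs[i].id) & ~smrs[i].mask) == 0)
            matches++;
    }
    return matches;
}
```

An exact entry for ID 0x100 plus a masked entry covering 0x100-0x1ff both match stream 0x100, giving a count of 2; leaving SMRs invalid (as the initial bypass setup does) can never produce that.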
>
> I mixed up the patches; with Will's fix applied, the issue has gone away.

Phew, that's a relief! Would you be happy to give a Tested-by on that patch?

Thanks,
Robin.

>
> Thanks,
> Yang
>
>>
>> As the transactions provoking the fault are apparently instruction
>> fetches on a 00xx stream ID, which I've not seen before, my first guess
>> would be it's something to do with the management complex (which I can't
>> get to work with the staging driver due to firmware incompatibility),
>> but then looking at the general lack of connection to the DMA API within
>> that driver, maybe not?
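The reading of GFSYNR0 = 0x00000008 as an instruction fetch can be sketched as a small decoder, assuming the SMMUv2 sGFSYNR0 layout (NESTED bit 0, WNR bit 1, PNU bit 2, IND bit 3). The names are hypothetical and the bit positions should be checked against the architecture specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit positions assumed from the SMMUv2 sGFSYNR0 layout. */
#define GFSYNR0_NESTED (1u << 0) /* fault occurred during a nested translation */
#define GFSYNR0_WNR    (1u << 1) /* set for writes, clear for reads */
#define GFSYNR0_PNU    (1u << 2) /* set for privileged accesses */
#define GFSYNR0_IND    (1u << 3) /* set for instruction fetches, clear for data */

struct gfsynr0_info {
    bool nested;
    bool write;
    bool privileged;
    bool instruction;
};

/* Split a raw GFSYNR0 value into the flags listed above. */
static struct gfsynr0_info decode_gfsynr0(uint32_t v)
{
    struct gfsynr0_info info = {
        .nested = (v & GFSYNR0_NESTED) != 0,
        .write = (v & GFSYNR0_WNR) != 0,
        .privileged = (v & GFSYNR0_PNU) != 0,
        .instruction = (v & GFSYNR0_IND) != 0,
    };
    return info;
}
```

For the logged value 0x8 only `instruction` is set: an unprivileged, non-nested read that was an instruction fetch.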
>>
>> Robin.
>>
>>>
>>> Thanks,
>>> Yang
>>>
>>>
>>>> attached to the SMMU - indeed I've happily booted -rc3 on an LS2085A
>>>> earlier this week - so there's clearly something going on there.
>>>>
>>>> More generally, I'd note that the mmu-masters binding will never fully
>>>> work on this board - you can get the platform devices to cooperate by
>>>> programming the assorted ICID registers to ensure they present unique
>>>> stream IDs, but PCI devices cannot work at all because there's no way to
>>>> make the stream IDs coming out of the root complex equal the PCI RIDs,
>>>> which that binding relies on. In that sense, any regression here is quite
>>>> likely just a shift from "subtly not working" to "loudly and obnoxiously
>>>> not working". Conversely, those reasons have also made it a really
>>>> useful platform for implementing and testing the iommu-map binding[2]
>>>> (with an awful hack in the PCI driver to program the lookup table
>>>> suitably) :D
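The iommu-map binding mentioned above describes the RID-to-stream-ID translation that mmu-masters cannot express. A minimal sketch of that lookup, assuming the semantics given in the kernel's pci-iommu DT binding document: the RID is first ANDed with iommu-map-mask, and an entry `<rid-base &iommu iommu-base length>` maps any masked RID r in [rid-base, rid-base + length) to iommu-base + (r - rid-base). The struct and function names, and the table values in the example, are hypothetical, not LS2085A's real configuration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one iommu-map entry <rid-base &iommu iommu-base length>;
 * the IOMMU phandle is left out of this sketch. */
struct iommu_map_entry {
    uint32_t rid_base;
    uint32_t iommu_base;
    uint32_t length;
};

/* PCI requester ID: bus in bits [15:8], device in [7:3], function in [2:0]. */
static uint32_t pci_rid(uint8_t bus, uint8_t dev, uint8_t fn)
{
    return ((uint32_t)bus << 8) | ((uint32_t)(dev & 0x1f) << 3) | (fn & 0x7);
}

/* Apply iommu-map-mask, then translate through the first entry covering
 * the masked RID. Returns false if no entry covers it. */
static bool iommu_map_translate(const struct iommu_map_entry *map, size_t n,
                                uint32_t mask, uint32_t rid, uint32_t *sid)
{
    uint32_t masked = rid & mask;

    for (size_t i = 0; i < n; i++) {
        if (masked >= map[i].rid_base &&
            masked - map[i].rid_base < map[i].length) {
            *sid = map[i].iommu_base + (masked - map[i].rid_base);
            return true;
        }
    }
    return false;
}
```

With a single hypothetical entry `{ 0x100, 0x20, 0x100 }`, the NIC functions at 01:00.0 and 01:00.1 (RIDs 0x100 and 0x101) translate to stream IDs 0x20 and 0x21. Taking the first matching entry is a choice made for this sketch.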
>>>>
>>>> Robin.
>>>>
>>>> [1]:http://thread.gmane.org/gmane.linux.kernel.iommu/12810
>>>> [2]:http://thread.gmane.org/gmane.linux.kernel.iommu/12454
>>>>
>>>>>
>>>>> Thanks,
>>>>> Yang
>>>>>
>>>>
>>>
>>
>




More information about the linux-arm-kernel mailing list