[PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling

Robin Murphy robin.murphy at arm.com
Tue Apr 4 07:28:26 PDT 2017


On 04/04/17 12:50, Jayachandran C wrote:
> On Mon, Apr 03, 2017 at 04:07:53PM +0100, Robin Murphy wrote:
>> On 03/04/17 14:15, Jayachandran C wrote:
>>> On the Cavium ThunderX2 arm64 SoCs (formerly called Broadcom Vulcan), the PCI
>>> topology is slightly unusual. For a multi-node system, it looks like:
>>>
>>> [node level PCI bridges - one per node]
>>>     [SoC PCI devices with MSI-X but no IOMMU]
>>>     [PCI-PCIe "glue" bridges - up to 14, one per real port below]
>>>         [PCIe real root ports associated with IOMMU and GICv3 ITS]
>>>             [External PCI devices connected to PCIe links]
>>
>> Since it's not entirely obvious, what does the actual DT - or IORT if
>> you must ;) - topology for this look like? I can't help thinking that
>> either it's inaccurate, or that this is going to expose a shortcoming in
>> pci_dma_configure() which breaks things - unless I'm missing something,
>> isn't find_pci_root_bus() going to go all the way up to the top-level
>> glue bridge and pick up the wrong firmware node (if any) for the
>> appropriate DMA properties?
> 
> I will try to describe the ACPI interface:
> 
> There is just one ECAM area, a single bus range and one set of memory
> windows for the whole system - so there is just one entry in DSDT for
> the PCI controller. This entry also corresponds to the PCI RC node in
> IORT. DMA is coherent and supports 64 bits system-wide, the attributes
> (in DSDT and IORT) reflect this.
> 
> lspci on the system looks like this:
> -[0000:00]-+-00.0-[01-1e]--+-04.0  14e4:9026
>            |               +-04.1  14e4:9026
>            |               +-05.0  14e4:9027
>            |               +-05.1  14e4:9027
>            |               +-0a.0-[02-03]----00.0-[03]--
>            |               +-0a.1-[04-05]----00.0-[05]--
>            |           [...etc...]
>            |               +-0b.0-[12-14]----00.0-[13-14]--+-00.0  8086:1583
>            |               |                               \-00.1  8086:1583
>            |           [...etc...]
>            |               \-0b.5-[1d-1e]----00.0-[1e]--
>            \-00.1-[1f-3b]--+-04.0  14e4:9026
>                            +-04.1  14e4:9026
>                            +-05.0  14e4:9027
>                            +-05.1  14e4:9027
>                            +-0a.0-[20-21]----00.0-[21]--
>                        [...etc...]
> 
> The devices here are:
>  - 00:00.0 and 00:00.1 are the node (socket) level bridges
>  - 01:[45].x and 1f:[45].x are SoC PCI devices like SATA and USB
>  - 01:[ab].x and 1f:[ab].x are the PCI-PCIe "reverse"/glue bridges
>  - 02:00.0 etc. are the "real" PCIe ports connected to external PCIe cards.
> Each node has a GIC ITS, and each group of 4 PCIe ports has an SMMU.
> 
> The IORT is built by the firmware based on its PCI enumeration. The IORT
> will have multiple entries under the PCI RC node:
>  - one entry per node to map the SoC devices directly to ITS for MSI-X,
>    since the SoC devices are not attached to any SMMU.
>  - an entry per "real" PCIe port to map RIDs under it to the corresponding
>    SMMU.
> The SMMU nodes will have an entry to map its RID ranges to the node ITS.
> 
> The IORT spec supports this configuration, and the corresponding code is
> already upstream, so the only sticking point right now is
> pci_for_each_dma_alias().

Thanks, that helps a lot. The "single global ECAM space" idea was
eluding me, but in that context it all makes much more sense - I'm
assuming the two quirked device IDs correspond to the 00:00.[01] devices
and the [02-1e]:00.0 ones.

So (at the risk of Jon mooing at me), I guess the DT description would
be a single node looking something like:

pcie {
	reg = [global ECAM space for segment 0000];

	msi-map = <0x0100 &its0 0x0100 0x1e00>,
		  <0x1f00 &its1 0x1f00 0x1d00>;
	iommu-map = <0x0200 &smmu0 0x0200 0x1d00>,
		    <0x2000 &smmu0 0x2000 0x1c00>;
};

(note to self: which incidentally also means of_pci_map_rid() probably
wants fixing to not treat gaps in the map as an error)

With only one node like that, rather than having the whole first 3
levels of bridges described, the "stop at the appropriate node in the
callback" approach does become even more impractical in all cases. So,
for $TITLE, based on the above understanding:

Reviewed-by: Robin Murphy <robin.murphy at arm.com>

Cheers,
Robin.

> 
> JC.
> 



