[PATCH v5 2/2] PCI: quirks: Fix ThunderX2 dma alias handling

Bjorn Helgaas bhelgaas at google.com
Fri Apr 21 11:48:15 EDT 2017


On Mon, Apr 17, 2017 at 12:47 PM, Jayachandran C
<jnair at caviumnetworks.com> wrote:
> On Fri, Apr 14, 2017 at 09:00:06PM -0500, Bjorn Helgaas wrote:
>> On Fri, Apr 14, 2017 at 4:06 PM, Jayachandran C
>> <jnair at caviumnetworks.com> wrote:
>> > On Thu, Apr 13, 2017 at 07:19:11PM -0500, Bjorn Helgaas wrote:
>> >> I tentatively applied both patches to pci/host-thunder for v4.12.
>> >>
>> >> However, I am concerned about the topology here:
>> >>
>> >> On Thu, Apr 13, 2017 at 08:30:45PM +0000, Jayachandran C wrote:
>> >> > On Cavium ThunderX2 arm64 SoCs (called Broadcom Vulcan earlier), the
>> >> > PCI topology is slightly unusual.  For a multi-node system, it looks
>> >> > like:
>> >> >
>> >> >     00:00.0 [PCI] bridge to [bus 01-1e]
>> >> >     01:0a.0 [PCI-PCIe bridge, type 8] bridge to [bus 02-04]
>> >> >     02:00.0 [PCIe root port, type 4] bridge to [bus 03-04] (XLATE_ROOT)
>> >> >     03:00.0 PCIe Endpoint
>> >>
>> >> A root port normally has a single PCIe link leading downstream.
>> >> According to this, 02:00.0 is a root port that has the usual
>> >> downstream link leading to 03:00.0, but it also has an upstream link
>> >> to 01:0a.0.
>> >
>> > The PCI topology is a bit broken due to the way that the PCIe IP block
>> > was integrated into SoC PCI bridges and devices. The current mechanism
>> > of adding a PCI-PCIe bridge to glue these together is not ideal.
>>
>> Yeah, that's definitely broken.
>>
>> >> Maybe this example is omitting details that are not relevant to DMA
>> >> aliases?  The PCIe capability only contains one set of link-related
>> >> registers, so I don't know how we could manage a single device that
>> >> has two links.
>> >
>> > The root port is standard and has just one link to the EP (or whatever
>> > is on the external PCIe slot). The fallout of the current hw design is
>> > that the PCI-PCIe bridge has a link that does not follow standard and
>> > does not have a counterpart (as you noted).
>> >
>> >> A device with two links would break things like ASPM.  In
>> >> set_pcie_port_type(), for example, we have this comment:
>> >>
>> >>    * A Root Port or a PCI-to-PCIe bridge is always the upstream end
>> >>    * of a Link.  No PCIe component has two Links.  Two Links are
>> >>    * connected by a Switch that has a Port on each Link and internal
>> >>    * logic to connect the two Ports.
>> >>
>> >> The topology above breaks these assumptions, which will make
>> >> pdev->has_secondary_link incorrect, which means ASPM won't work
>> >> correctly.
>> >
>> > Given the current hardware, the pcieport driver seems to work reasonably
>> > for the root port at 02:00.0, with AER support. I will take a look at the
>> > ASPM part.
>>
>> I don't think pcieport itself cares much about links.  ASPM does, but
>> it looks like set_pcie_port_type() actually is smart enough to know
>> that PCI-to-PCIe bridges and Root Ports always have links on their
>> secondary sides.  So has_secondary_link probably does get set
>> correctly.
>>
>> But I think you'll hit the VIA "strange chipset" thing in
>> pcie_aspm_init_link_state(), which will probably prevent us from doing
>> ASPM on the link from 02:00.0 to 03:00.0.
>>
>> Could you collect "lspci -vv" output from this system?  I'd like to
>> archive that as background for this IOMMU issue and the ASPM tweaks I
>> suspect we'll have to do.  I *wish* we had more information about that
>> VIA thing, because I suspect we could get rid of it if we had more
>> details.
>
> The full logs are slightly large, so I have kept them at:
> https://github.com/jchandra-cavm/thunderx2/blob/master/logs/
> The lspci -vv output is lspci-vv.txt and lspci -tvn output is lspci-tvn.txt
>
> The output is from 2 socket system, the cards are not on the first slot
> like the example above, so the bus and device numbers are different.

Can somebody with this system collect the "lspci -xxxx" output as well?

I'm making some lspci changes to handle the PCI-to-PCIe bridge
correctly, and I can use the "lspci -xxxx" output to create an lspci
test case.



More information about the linux-arm-kernel mailing list