[PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling

Bjorn Helgaas helgaas at kernel.org
Wed Apr 12 12:11:38 PDT 2017


On Wed, Apr 12, 2017 at 06:10:34PM +0000, Jayachandran C wrote:
> On Wed, Apr 12, 2017 at 11:21:18AM -0500, Bjorn Helgaas wrote:
> > On Tue, Apr 11, 2017 at 03:27:02PM +0000, Jayachandran C wrote:
> > > On Tue, Apr 11, 2017 at 08:41:25AM -0500, Bjorn Helgaas wrote:
> > > > [+cc Joerg]
> > > > 
> > > > On Tue, Apr 11, 2017 at 07:10:48AM +0000, Jayachandran C wrote:
> > > > > On Mon, Apr 10, 2017 at 08:28:47PM -0500, Bjorn Helgaas wrote:
> > > > > > Hi Jayachandran,
> > > > > > 
> > > > > > On Mon, Apr 03, 2017 at 01:15:04PM +0000, Jayachandran C wrote:
> > > > > > > The Cavium ThunderX2 arm64 SoCs (called Broadcom Vulcan earlier), the PCI
> > > > > > > topology is slightly unusual. For a multi-node system, it looks like:
> > > > > > > 
> > > > > > > [node level PCI bridges - one per node]
> > > > > > >     [SoC PCI devices with MSI-X but no IOMMU]
> > > > > > >     [PCI-PCIe "glue" bridges - upto 14, one per real port below]
> > > > > > >         [PCIe real root ports associated with IOMMU and GICv3 ITS]
> > > > > > >             [External PCI devices connected to PCIe links]
> > > > > > > 
> > > > > > > The top two levels of bridges should have introduced aliases since they
> > > > > > > are PCI and PCI/PCIe bridges, but in the case of ThunderX2 they do not.
> > > > > > > In the case of external PCIe devices, the "real" root ports are connected
> > > > > > > to the SMMU and the GIC ITS, so PCI-PCIe bridge does not introduce an
> > > > > > > alias. The SoC PCI devices are directly connected to the GIC ITS, so the
> > > > > > > node level bridges do not introduce an alias either.
> > > > > > > 
> > > > > > > To handle this quirk, we mark the real PCIe root ports and node level
> > > > > > > PCI bridges with the flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT.  With this,
> > > > > > > pci_for_each_dma_alias() works correctly for external PCIe devices and
> > > > > > > SoC PCI devices.
> > > > > > > 
> > > > > > > For the current revision of Cavium ThunderX2, the VendorID and Device ID
> > > > > > > are from Broadcom Vulcan (14e4:90XX).
> > > > > > 
> > > > > > Can you supply some text here about why we want to apply this patch?
> > > > > > E.g., does it avoid making unnecessary IOMMU mappings, improve
> > > > > > performance, avoid a crash, etc?
> > > > > 
> > > > > If this is for the commit message, I hope the following is ok:
> > > > > 
> > > > > "With this change, both MSI-X and IO virtualization work correctly on
> > > > > Cavium ThunderX2. The GIC ITS driver gets the correct device ID to
> > > > > configure MSI-X, the SMMUv3 driver gets the correct Stream IDs for PCI
> > > > > devices, and the IOMMU groups are setup correctly."
> > > > 
> > > > This doesn't get at what the actual problem is.  I'm hoping for
> > > > something like "without this change, we set up an IOMMU mapping for
> > > > requestor ID X, but device DMA uses requestor ID Y because ...., which
> > > > results in an IOMMU fault"
> > > 
> > > Ok. I hope this would be better:
> > > 
> > > "Without this change, the last alias seen while traversing the PCI
> > > hierarchy will be used as the RID to generate the device ID for ITS
> > > and stream ID for SMMU. This in turn causes the MSI-X generated by the
> > > device to fail since the ITS expects to have translation tables based
> > > on the actual PCIe RID and not the (irrelevant) alias. Similarly, the
> > > device DMA also fails when SMMU is enabled due to incorrect value in
> > > SMMU translation tables"
> > 
> > This description is true, but I don't think it addresses the real
> > problem.  I think the real problem is that your IOMMU code doesn't
> > handle aliases correctly, and by ignoring these invalid aliases, we
> > happen to map an alias that works for the builtin devices.  But that's
> > only because we got lucky (those devices use a single RID and they're
> > not behind bridges that optionally take ownership).
> > 
> > It would make sense to me if we fixed the IOMMU code to map *all* the
> > aliases, which should be enough to make your devices work.  If we then
> > wanted to apply a patch like this on top, it would be simply an
> > optimization that avoids unnecessary IOMMU mappings.
> 
> The issue that the IOMMU code does not handle valid aliases is
> unrelated to what I am trying to fix. The quirk is to make sure
> that invalid aliases are not seen on ThunderX2 while doing
> pci_for_each_dma_alias().
> 
> The DMA and MSI-X requests leave the PCI/PCIe hierarchy at the point
> where the SMMU (or ITS) is attached, i.e. at the bridge marked with
> PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT. The quirk ensures that we don't look
> for aliases above that point.
> 
> The toplevel bridge is marked PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT since
> the on-chip devices are directly connected to the ITS (they do not
> use SMMU).
> 
> The top two levels of bridges are not real PCI bridges but just
> PCI bridge-like things that were added to tie the whole hierarchy
> together for configuration and enumeration. They do not handle
> PCI/PCIe transactions in the traditional sense.
> 
> I think my problem description is still not correct, maybe:
> "The SMMU (and ITS) expects to device tables to use the RID seen
> at the bridge they are associated with. Currently the
> pci_for_each_dma_alias() code traverses beyond this point and
> generates incorrect aliases due to the PCI and PCI/PCIe bridges
> above. This causes MSI-X interrupts and device DMA to fail since
> the SMMU and ITS tables to be setup with incorrect IDs"

I haven't tried to figure out the MSI-X piece of this, but let me try
to come up with a concrete DMA example.  Assume this topology:

  00:00.0 bridge to [bus 01-1e]
  01:0a.0 bridge to [bus 04-05]
  04:00.0 [14e4:9000 or 9084] bridge to [bus 05] (XLATE_ROOT)
  05:00.0 endpoint

Assume 05:00.0 generates a DMA request.  Assume the top two bridges
are such that pci_for_each_dma_alias() includes them as well, so it
iterates through 05:00.0, 01:0a.0, and 00:00.0.

When the driver for 05:00.0 makes a DMA mapping, the current code
apparently makes an IOMMU mapping for requester ID 00:00.0 because
that's the last alias.  Obviously this doesn't work because the IOMMU
at 04:00.0 will see a requester ID of 05:00.0, not 00:00.0.

With this quirk, we'll omit 01:0a.0 and 00:00.0, so we'll make an
IOMMU mapping for requester ID 05:00.0, which will work fine.  I think
it would *also* work fine if we made IOMMU mappings for 05:00.0,
01:0a.0, and 00:0.0.  The last two are unnecessary, but probably not
harmful.

Now assume 05:00 is a multi-function device that has a DMA alias
quirk, e.g., see quirk_dma_func0_alias().  It has another function:

  05:00.3 endpoint

DMA from 05:00.3 may use a requester ID of either 05:00.3 or 05:00.0.
The driver makes a DMA mapping, pci_for_each_dma_alias() iterates
through 05:00.3, 05:00.0, 01:0a.0, and 00:00.0, and we again map only
00:00.0, which again doesn't work.

With this quirk, we create a single mapping for 05:00.0.  That will
work sometimes, but the device may also generate DMA with a requester
ID of 05:00.3, and that won't work.

If your on-chip device is, e.g., 01:04.0, pci_for_each_dma_alias()
probably iterates through 01:04.0, 00:00.0.  Today we make an IOMMU
mapping for 00:00.0, which doesn't work.  With this quirk, we'll
ignore 00:00.0 and make a mapping for 01:04.0, which does work.  But I
think if you made the IOMMU code add mappings for each of the aliases,
i.e., for both 01:04.0 and 00:00.0, that device *would* work even
without this quirk.

Bjorn



More information about the linux-arm-kernel mailing list