[PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches

Tue May 8 15:21:15 PDT 2018

On 05/08/2018 06:03 PM, Alex Williamson wrote:
> On Tue, 8 May 2018 21:42:27 +0000
> "Stephen  Bates" <sbates at raithlin.com> wrote:
> 
>> Hi Alex
>>
>>> But it would be a much easier proposal to disable ACS when the
>>> IOMMU is not enabled, ACS has no real purpose in that case.
>>
>> I guess one issue I have with this is that it disables IOMMU groups
>> for all Root Ports and not just the one(s) we wish to do p2pdma on.
> 
> But as I understand this series, we're not really targeting specific
> sets of devices either.  It's more of a shotgun approach that we
> disable ACS on downstream switch ports and hope that we get the right
> set of devices, but with the indecisiveness that we might later
> white-list select root ports to further increase the blast radius.
> 
>>> The IOMMU and P2P are already not exclusive, we can bounce off
>>> the IOMMU or make use of ATS as we've previously discussed.  We were
>>> previously talking about a build time config option that you
>>> didn't expect distros to use, so I don't think intervention for the
>>> user to disable the IOMMU if it's enabled by default is a serious
>>> concern either.
>>
>> ATS definitely makes things more interesting for the cases where the
>> EPs support it. However I don't really have a handle on how common
>> ATS support is going to be in the kinds of devices we have been
>> focused on (NVMe SSDs and RDMA NICs mostly).
>>
>>> What you're trying to do is enable direct peer-to-peer for
>>> endpoints which do not support ATS when the IOMMU is enabled, which
>>> is not something that necessarily makes sense to me.
>>
>> As above the advantage of leaving the IOMMU on is that it allows for
>> both p2pdma PCI domains and IOMMU groupings PCI domains in the same
>> system. It is just that these domains will be separate to each other.
> 
> That argument makes sense if we had the ability to select specific sets
> of devices, but that's not the case here, right?  With the shotgun
> approach, we're clearly favoring one at the expense of the other and
> it's not clear why we don't simply force the needle all the way in that
> direction such that the results are at least predictable.
> 
>>> So that leaves avoiding bounce buffers as the remaining IOMMU
>>> feature
>>
>> I agree with you here that the devices we will want to use for p2p
>> will probably not require a bounce buffer and will support 64 bit DMA
>> addressing.
>>
>>> I'm still not seeing why it's terribly undesirable to require
>>> devices to support ATS if they want to do direct P2P with an IOMMU
>>> enabled.
>>
>> I think the one reason is for the use-case above. Allowing IOMMU
>> groupings on one domain and p2pdma on another domain....
> 
> If IOMMU grouping implies device assignment (because nobody else uses
> it to the same extent as device assignment) then the build-time option
> falls to pieces, we need a single kernel that can do both.  I think we
> need to get more clever about allowing the user to specify exactly at
> which points in the topology they want to disable isolation.  Thanks,
> 
> Alex

+1/ack

RDMA VFs lend themselves to NVMEoF w/device-assignment.... need a way to
put NVME 'resources' into an assignable/manageable object for 'IOMMU-grouping',
which is really a 'DMA security domain' and less an 'IOMMU grouping domain'.
