[PATCH 3/6] vfio: remove the unused mdev iommu hook

Tian, Kevin kevin.tian at intel.com
Wed Jun 30 02:08:19 PDT 2021


> From: Joerg Roedel <joro at 8bytes.org>
> Sent: Monday, May 17, 2021 11:35 PM
> 
> On Mon, May 17, 2021 at 10:35:00AM -0300, Jason Gunthorpe wrote:
> > Well, I'm sorry, but there is a huge other thread talking about the
> > IOASID design in great detail and why this is all needed. Jumping into
> > this thread without context and basically rejecting all the
> > conclusions that were reached over the last several weeks is really
> > not helpful - especially since your objection is not technical.
> >
> > I think you should wait for Intel to put together the /dev/ioasid uAPI
> > proposal and the example use cases it should address then you can give
> > feedback there, with proper context.
> 
> Yes, I think the next step is that someone who read the whole thread
> writes up the conclusions and a rough /dev/ioasid API proposal, also
> mentioning the use-cases it addresses. Based on that we can discuss the
> implications this needs to have for IOMMU-API and code.
> 
> From the use-cases I know the mdev concept is just fine. But if there is
> a more generic one we can talk about it.
> 

Although /dev/iommu v2 proposal is still in progress, I think there are 
enough background gathered in v1 to resume this discussion now.

In a nutshell /dev/iommu requires two sets of services from the iommu
layer:

-   for an kernel-managed I/O page table via map/unmap;
-   for an user-managed I/O page table via bind/invalidate and nested on 
    a kernel-managed parent I/O page table;

Each I/O page table could be attached by multiple devices. /dev/iommu
maintains device specific routing information (RID, or RID+PASID) for
where to install the I/O page table in the IOMMU for each attached device.

Kernel-managed page table is represented by iommu domain. Existing 
IOMMU-API allows /dev/iommu to attach a RID device to iommu domain. 
A new interface is required, e.g. iommu_attach_device_pasid(domain, dev, 
pasid), to cover (RID+PASID) attaching. Once attaching succeeds, no change
to following map/unmap which are domain-wide thus applied to both RID 
and RID+PASID. In case of dev_iotlb invalidation is required, the iommu 
driver is responsible for handling it for every attached RID or RID+PASID
if ats is enabled.

to take one example, the parent (RID1) has three work queues. WQ1 is 
for parent's own DMA-API usage, with WQ2 (PASID-x) assigned to VM1 
and WQ3 (PASID-y) assigned to VM2. VM2 is also assigned with another 
device (RID2). In this case there are three kernel-managed I/O page 
tables (IOVA in kernel, GPA for VM1 and GPA for VM2), thus RID1 is 
attached to three domains:

RID1 --- domain1 (default, IOVA)
     |      |
     |      |-- [RID1]
     |
     |-- domain2 (vm1, GPA)
     |      |
     |      |-- [RID1, PASID-x]
     |
     |-- domain3 (vm2, GPA)
     |      |
     |      |-- [RID1, PASID-y]
     |      |
     |      |-- [RID2]

The iommu layer should maintain above attaching status per device and per
iommu domain. There is no mdev/subdev concept in the iommu layer. It's
just about RID or PASID.

User-manage I/O page table might be represented by a new object which 
describes:

    - routing information (RID or RID+PASID)
    - pointer to iommu_domain of the parent I/O page table (inherit the
      domain ID in iotlb due to nesting)
    - address of the I/O page table

There might be chance to share the structure with native SVA which also
has page table managed outside of iommu subsystem. But we can leave 
it and figure out until coding.

And a new set of IOMMU-API:

    - iommu_{un}bind_pgtable(domain, dev, addr);
    - iommu_{un}bind_pgtable_pasid(domain, dev, addr, pasid);
    - iommu_cache_invalidate(domain, dev, invalid_info);
    - and APIs for registering fault handler and completing faults;
(here 'domain' is the one representing the parent I/O page table)

Because nesting essentially creates a new reference to the parent I/O 
page table, iommu_bind_pgtable_pasid() implicitly calls __iommu_attach_
device_pasid() to setup the connection between the parent domain and 
the new [RID,PASID]. It's not necessary to do so for iommu_bind_pgtable() 
since the RID is already attached when the parent I/O page table is created.

In consequence the example topology is updated as below, with guest
SVA enabled in both vm1 and vm2:

RID1 --- domain1 (default, IOVA)
     |      |
     |      |-- [RID1]
     |
     |-- domain2 (vm1, GPA)
     |      |
     |      |-- [RID1, PASID-x]
     |      |-- [RID1, PASID-a] // nested for vm1 process1
     |      |-- [RID1, PASID-b] // nested for vm1 process2
     |
     |-- domain3 (vm2, GPA)
     |      |
     |      |-- [RID1, PASID-y]
     |      |-- [RID1, PASID-c] // nested for vm2 process1
     |      |
     |      |-- [RID2]
     |      |-- [RID2, PASID-a] // nested for vm2 process2

Thoughts?

Thanks
Kevin



More information about the linux-arm-kernel mailing list