[PATCH 3/6] vfio: remove the unused mdev iommu hook

Tian, Kevin kevin.tian at intel.com
Wed Jul 21 23:02:15 PDT 2021


A gentle ping...

> From: Tian, Kevin
> Sent: Wednesday, June 30, 2021 5:08 PM
> 
> > From: Joerg Roedel <joro at 8bytes.org>
> > Sent: Monday, May 17, 2021 11:35 PM
> >
> > On Mon, May 17, 2021 at 10:35:00AM -0300, Jason Gunthorpe wrote:
> > > Well, I'm sorry, but there is a huge other thread talking about the
> > > IOASID design in great detail and why this is all needed. Jumping into
> > > this thread without context and basically rejecting all the
> > > conclusions that were reached over the last several weeks is really
> > > not helpful - especially since your objection is not technical.
> > >
> > > I think you should wait for Intel to put together the /dev/ioasid uAPI
> > > proposal and the example use cases it should address then you can give
> > > feedback there, with proper context.
> >
> > Yes, I think the next step is that someone who read the whole thread
> > writes up the conclusions and a rough /dev/ioasid API proposal, also
> > mentioning the use-cases it addresses. Based on that we can discuss the
> > implications this needs to have for IOMMU-API and code.
> >
> > From the use-cases I know the mdev concept is just fine. But if there is
> > a more generic one we can talk about it.
> >
> 
> Although the /dev/iommu v2 proposal is still in progress, I think enough
> background was gathered in v1 to resume this discussion now.
> 
> In a nutshell, /dev/iommu requires two sets of services from the iommu
> layer:
> 
> -   for a kernel-managed I/O page table, via map/unmap;
> -   for a user-managed I/O page table, via bind/invalidate, nested on
>     a kernel-managed parent I/O page table (see the sketch below).
> 
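> As a purely illustrative sketch of the contrast (every ioctl name
> below is a hypothetical placeholder, not a proposed uAPI):
> 
>     /* kernel-managed I/O page table: userspace drives map/unmap */
>     ioctl(iommu_fd, IOMMU_MAP_DMA, &map_info);
>     ioctl(iommu_fd, IOMMU_UNMAP_DMA, &unmap_info);
> 
>     /* user-managed I/O page table: userspace binds its own page
>      * table, nested on a kernel-managed parent, and forwards
>      * invalidations to keep the IOMMU coherent with it */
>     ioctl(iommu_fd, IOMMU_BIND_PGTABLE, &bind_info);
>     ioctl(iommu_fd, IOMMU_CACHE_INVALIDATE, &inv_info);
> 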
> Each I/O page table can be attached to multiple devices. /dev/iommu
> maintains device-specific routing information (RID, or RID+PASID)
> describing where to install the I/O page table in the IOMMU for each
> attached device.
> 
> A kernel-managed page table is represented by an iommu domain. The
> existing IOMMU-API allows /dev/iommu to attach a RID device to an iommu
> domain. A new interface is required, e.g. iommu_attach_device_pasid(domain,
> dev, pasid), to cover RID+PASID attaching. Once attaching succeeds, no
> change is needed to the following map/unmap calls, which are domain-wide
> and thus apply to both RID and RID+PASID. In case dev_iotlb invalidation
> is required, the iommu driver is responsible for handling it for every
> attached RID or RID+PASID if ATS is enabled.
> 
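> A minimal sketch of how that interface could look (only the signature
> comes from the proposal above; the per-driver callback name in the
> body is a made-up assumption):
> 
>     int iommu_attach_device_pasid(struct iommu_domain *domain,
>                                   struct device *dev, ioasid_t pasid)
>     {
>         /* 'attach_dev_pasid' is a hypothetical iommu_ops hook */
>         if (!domain->ops->attach_dev_pasid)
>             return -ENODEV;
> 
>         return domain->ops->attach_dev_pasid(domain, dev, pasid);
>     }
> 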
> To take one example, the parent device (RID1) has three work queues.
> WQ1 is for the parent's own DMA-API usage, while WQ2 (PASID-x) is
> assigned to VM1 and WQ3 (PASID-y) is assigned to VM2. VM2 is also
> assigned another device (RID2). In this case there are three
> kernel-managed I/O page tables (IOVA for the kernel, GPA for VM1 and
> GPA for VM2), so RID1 is attached to three domains:
> 
> RID1 --- domain1 (default, IOVA)
>      |      |
>      |      |-- [RID1]
>      |
>      |-- domain2 (vm1, GPA)
>      |      |
>      |      |-- [RID1, PASID-x]
>      |
>      |-- domain3 (vm2, GPA)
>      |      |
>      |      |-- [RID1, PASID-y]
>      |      |
>      |      |-- [RID2]
> 
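> In pseudo-C the attach sequence producing this topology would look
> roughly like below (variable names are made up; domain1 is RID1's
> default DMA-API domain, set up as today):
> 
>     iommu_attach_device(domain1, rid1_dev);                /* [RID1] */
>     iommu_attach_device_pasid(domain2, rid1_dev, pasid_x); /* VM1 GPA */
>     iommu_attach_device_pasid(domain3, rid1_dev, pasid_y); /* VM2 GPA */
>     iommu_attach_device(domain3, rid2_dev);                /* [RID2] */
> 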
> The iommu layer should maintain the above attaching status per device
> and per iommu domain. There is no mdev/subdev concept in the iommu
> layer; it's just about RID or PASID.
> 
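> One possible shape for that bookkeeping, as a sketch (struct and field
> names are invented):
> 
>     /* one entry per attach, linked on a per-domain list, so a domain
>      * knows every RID / RID+PASID it is installed for (e.g. to fan
>      * out dev_iotlb invalidations when ATS is enabled) */
>     struct iommu_attach_entry {
>         struct list_head node;   /* on the domain's attach list */
>         struct device   *dev;    /* the RID */
>         ioasid_t         pasid;  /* INVALID_IOASID for RID-only */
>     };
> 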
> A user-managed I/O page table might be represented by a new object
> which describes:
> 
>     - routing information (RID or RID+PASID)
>     - pointer to the iommu_domain of the parent I/O page table (to
>       inherit the domain ID in the IOTLB due to nesting)
>     - address of the I/O page table
> 
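> Expressed as a sketch struct mirroring the three fields above (the
> name and layout are placeholders):
> 
>     struct iommu_user_pgtable {
>         struct device       *dev;     /* routing: RID ... */
>         ioasid_t             pasid;   /* ... or RID+PASID */
>         struct iommu_domain *parent;  /* parent I/O page table; its
>                                        * domain ID is inherited in
>                                        * the iotlb due to nesting */
>         u64                  pgtable_addr; /* user page table address */
>     };
> 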
> There might be a chance to share this structure with native SVA, which
> also has its page table managed outside of the iommu subsystem. But we
> can leave that aside and figure it out when coding.
> 
> And a new set of IOMMU-API:
> 
>     - iommu_{un}bind_pgtable(domain, dev, addr);
>     - iommu_{un}bind_pgtable_pasid(domain, dev, addr, pasid);
>     - iommu_cache_invalidate(domain, dev, invalid_info);
>     - and APIs for registering fault handlers and completing faults;
> (here 'domain' is the one representing the parent I/O page table)
> 
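> Spelled out as prototypes this is just the shape implied above, not
> final signatures (in particular the invalidation info type and the
> fault-reporting entry points are left rough):
> 
>     int  iommu_bind_pgtable(struct iommu_domain *domain,
>                             struct device *dev, u64 addr);
>     void iommu_unbind_pgtable(struct iommu_domain *domain,
>                               struct device *dev, u64 addr);
>     int  iommu_bind_pgtable_pasid(struct iommu_domain *domain,
>                                   struct device *dev, u64 addr,
>                                   ioasid_t pasid);
>     void iommu_unbind_pgtable_pasid(struct iommu_domain *domain,
>                                     struct device *dev, u64 addr,
>                                     ioasid_t pasid);
>     int  iommu_cache_invalidate(struct iommu_domain *domain,
>                                 struct device *dev, void *invalid_info);
>     /* plus registration/completion hooks for I/O page faults */
> 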
> Because nesting essentially creates a new reference to the parent I/O
> page table, iommu_bind_pgtable_pasid() implicitly calls
> __iommu_attach_device_pasid() to set up the connection between the
> parent domain and the new [RID, PASID]. It's not necessary to do so for
> iommu_bind_pgtable(), since the RID is already attached when the parent
> I/O page table is created.
> 
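> Sketching that relationship (the driver callback at the end is a
> hypothetical name; the implicit attach is the only point here):
> 
>     int iommu_bind_pgtable_pasid(struct iommu_domain *domain,
>                                  struct device *dev, u64 addr,
>                                  ioasid_t pasid)
>     {
>         int ret;
> 
>         /* nested bind references the parent domain, so first create
>          * the [RID, PASID] connection under it */
>         ret = __iommu_attach_device_pasid(domain, dev, pasid);
>         if (ret)
>             return ret;
> 
>         return domain->ops->bind_pgtable_pasid(domain, dev, addr, pasid);
>     }
> 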
> As a consequence, the example topology is updated as below, with guest
> SVA enabled in both VM1 and VM2:
> 
> RID1 --- domain1 (default, IOVA)
>      |      |
>      |      |-- [RID1]
>      |
>      |-- domain2 (vm1, GPA)
>      |      |
>      |      |-- [RID1, PASID-x]
>      |      |-- [RID1, PASID-a] // nested for vm1 process1
>      |      |-- [RID1, PASID-b] // nested for vm1 process2
>      |
>      |-- domain3 (vm2, GPA)
>      |      |
>      |      |-- [RID1, PASID-y]
>      |      |-- [RID1, PASID-c] // nested for vm2 process1
>      |      |
>      |      |-- [RID2]
>      |      |-- [RID2, PASID-a] // nested for vm2 process2
> 
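> The nested entries above would then come from bind calls along these
> lines (the pgd_* variables stand in for the guest-provided page table
> addresses):
> 
>     iommu_bind_pgtable_pasid(domain2, rid1_dev, pgd_vm1_p1, pasid_a);
>     iommu_bind_pgtable_pasid(domain2, rid1_dev, pgd_vm1_p2, pasid_b);
>     iommu_bind_pgtable_pasid(domain3, rid1_dev, pgd_vm2_p1, pasid_c);
>     iommu_bind_pgtable_pasid(domain3, rid2_dev, pgd_vm2_p2, pasid_a);
> 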
> Thoughts?
> 
> Thanks
> Kevin


