[PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API
christian.koenig at amd.com
Sat Sep 8 00:29:13 PDT 2018
On 07.09.2018 at 23:25, Jacob Pan wrote:
> On Fri, 7 Sep 2018 20:02:54 +0200
> Christian König <christian.koenig at amd.com> wrote:
>>> iommu-sva expects everywhere that the device has an iommu_domain,
>>> it's the first thing we check on entry. Bypassing all of this would
>>> call idr_alloc() directly, and wouldn't have any code in common
>>> with the current iommu-sva. So it seems like you need a layer on
>>> top of iommu-sva calling idr_alloc() when an IOMMU isn't present,
>>> but I don't think it should be in drivers/iommu/
>> In this case I question if the PASID handling should be under
>> drivers/iommu at all.
>> See, I can have a mix of VM contexts which are bound to processes (a
>> few) and VM contexts which are standalone and don't care about a
>> process address space. But for each VM context I need a distinct
>> PASID for the hardware to work.
>> I can live with using a simple ida to allocate them if the IOMMU is
>> completely disabled, but when the IOMMU is enabled I certainly need a
>> way to reserve a PASID without an associated process.
> VT-d would also have such a requirement. There is a virtual command
> register to allocate and free PASIDs for VM use. When that PASID
> allocation request gets propagated to the host IOMMU driver, we need to
> allocate a PASID w/o mm.
> If the PASID allocation is done via VFIO, can we have FD to track PASID
> life cycle instead of mm_exit()? i.e. all FDs get closed before
> mm_exit, I assume?
Yes, exactly. I just need a PASID which is never used by the OS for a
process and we can easily give that back when the last FD reference is
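
To make the FD-tracked idea concrete, here is a rough userspace sketch of
reserving a PASID without an mm and releasing it when the last reference
goes away. Everything in it is hypothetical (pasid_reserve(), pasid_get(),
pasid_put(), the table size, keeping PASID 0 unused); it is not the
iommu-sva API and ignores locking entirely.

```c
#include <assert.h>
#include <stdbool.h>

#define PASID_MAX     16      /* tiny illustrative PASID space (assumption) */
#define PASID_INVALID (-1)

/* Hypothetical per-PASID bookkeeping, not the iommu-sva structures. */
struct pasid_slot {
	bool in_use;
	bool has_mm;        /* true: bound to a process, false: standalone VM context */
	unsigned int refs;  /* open FDs currently holding this PASID */
};

static struct pasid_slot pasid_table[PASID_MAX];

/* Reserve the lowest free PASID; PASID 0 is left unused by assumption. */
static int pasid_reserve(bool has_mm)
{
	for (int i = 1; i < PASID_MAX; i++) {
		if (!pasid_table[i].in_use) {
			pasid_table[i] = (struct pasid_slot){
				.in_use = true, .has_mm = has_mm, .refs = 1,
			};
			return i;
		}
	}
	return PASID_INVALID;
}

/* Another FD starts referencing the PASID. */
static void pasid_get(int pasid)
{
	assert(pasid > 0 && pasid < PASID_MAX && pasid_table[pasid].in_use);
	pasid_table[pasid].refs++;
}

/* An FD is closed; returns true when the PASID was given back. */
static bool pasid_put(int pasid)
{
	assert(pasid > 0 && pasid < PASID_MAX && pasid_table[pasid].in_use);
	if (--pasid_table[pasid].refs > 0)
		return false;
	pasid_table[pasid].in_use = false;
	return true;
}
```

The allocator never looks at an mm, so the same path can serve both
process-bound and standalone contexts; only the has_mm flag differs.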
>>>> 3. Even after destruction of a process address space we need some
>>>> grace period before a PASID is reused because it can be that the
>>>> specific PASID is still in some hardware queues etc...
>>>> At bare minimum all device drivers using process binding
>>>> need to explicitly note to the core when they are done with a
>>> Right, much of the horribleness in iommu-sva deals with this:
>>> The process dies, iommu-sva is notified and calls the mm_exit()
>>> function passed by the device driver to iommu_sva_device_init(). In
>>> mm_exit() the device driver needs to clear any reference to the
>>> PASID in hardware and in its own structures. When the device driver
>>> returns from mm_exit(), it effectively tells the core that it has
>>> finished using the PASID, and iommu-sva can reuse the PASID for
>>> another process. mm_exit() is allowed to block, so the device
>>> driver has time to clean up and flush the queues.
>>> If the device driver finishes using the PASID before the process
>>> exits, it just calls unbind().
>> That's exactly what Michal Hocko is probably not going to like at all.
>> Can we have a different approach where each driver is informed by the
>> mm_exit(), but needs to explicitly call unbind() before a PASID is
>> During that teardown transition it would be ideal if that PASID only
>> points to a dummy root page directory with only invalid entries.
> I guess this can be vendor specific. In VT-d I plan to mark PASID
> entry not present and disable fault reporting while draining remaining
Sounds good to me.
The point is that, at least in the case where the process was killed by
the OOM killer, we should not block in mm_exit().
Instead, operations issued by the process to a device driver which uses
SVA need to be terminated as soon as possible to make sure that the OOM
killer can advance.
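
A minimal sketch of the lifecycle I have in mind, written as a plain
userspace state machine. All names (struct pasid_ctx, pasid_bind(),
pasid_mm_exit(), pasid_unbind()) are made up for illustration and are not
existing kernel interfaces. The assumptions are: mm_exit only notifies and
never waits, the PASID is switched to a dummy page directory while
draining, and it becomes reusable only after every driver has explicitly
called unbind.

```c
#include <assert.h>
#include <stdbool.h>

enum pasid_state { PASID_FREE, PASID_ACTIVE, PASID_DRAINING };

/* Hypothetical per-PASID context, not an iommu-sva structure. */
struct pasid_ctx {
	enum pasid_state state;
	unsigned int bound_drivers;  /* drivers which have not called unbind yet */
	bool points_to_dummy_pgd;    /* true while translating via an all-invalid table */
};

/* A driver binds to the PASID; refused once teardown has started. */
static bool pasid_bind(struct pasid_ctx *c)
{
	if (c->state == PASID_DRAINING)
		return false;
	c->state = PASID_ACTIVE;
	c->bound_drivers++;
	return true;
}

/*
 * Process exit notification. It does not block: it only redirects the
 * PASID to the dummy directory and leaves the cleanup to the drivers.
 */
static void pasid_mm_exit(struct pasid_ctx *c)
{
	if (c->state != PASID_ACTIVE)
		return;
	if (c->bound_drivers == 0) {
		c->state = PASID_FREE;
		return;
	}
	c->state = PASID_DRAINING;
	c->points_to_dummy_pgd = true;
}

/* A driver is done with the PASID; returns true once it may be reused. */
static bool pasid_unbind(struct pasid_ctx *c)
{
	assert(c->bound_drivers > 0);
	if (--c->bound_drivers > 0)
		return false;
	c->state = PASID_FREE;
	c->points_to_dummy_pgd = false;
	return true;
}
```

With this split the exit path stays short even when a driver needs a long
time to flush its queues, at the cost of keeping the PASID reserved until
the last unbind arrives.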