[PATCH v2 0/2] iommu: Allow passing custom allocators to pgtable drivers

Jason Gunthorpe jgg at nvidia.com
Tue Nov 14 08:27:48 PST 2023


On Mon, Nov 13, 2023 at 10:11:03AM +0100, Boris Brezillon wrote:

> > The IOVA allocation would pin down
> > all the radix tree memory so that any map in the preallocated
> > IOVA range cannot fail.
> 
> Question is, when would you do the IOVA range allocation? So far, I was
> assuming that every BIND request was a combination of:
> 
> 1/ Pre-allocate enough resources for this specific map/unmap to always
> succeed
> 
> 2/ Execute this BIND operation when time comes
> 
> IIUC, you're suggesting doing things differently:
> 
> 1/ Reserve/pre-allocate the IOVA range for your higher-level
> entity/object (through an explicit ioctl, I guess)
> 
> 2/ BIND requests just map/unmap stuff in this pre-allocated/reserved
> IOVA range. All page tables have been allocated during #1, so there's
> no allocation happening here.
> 
> 3/ When your higher level object is destroyed, release the IOVA range,
> which, as a result, unmaps everything in that range, and frees up the
> IOMMU page tables (and any other resources attached to this IOVA range).

I don't really know anything about vulkan so I can't comment too
well, but it seems to me what you outline makes sense. Also, couldn't
you make #1 allocate the IOVA as part of the preallocation?
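The reserve-then-map flow being discussed could be sketched roughly as
below. This is purely illustrative: the names, the single-level table,
and the userspace allocation are my own simplifications, not any real
kernel API.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SHIFT 12

/*
 * Hypothetical, deliberately simplified single-level "page table" to
 * illustrate the three-step flow: reserve pre-allocates every PTE slot
 * for a range, so map/unmap inside that range never allocate or free.
 */
struct iova_range {
	unsigned long start, end;	/* page-aligned IOVA range */
	unsigned long *ptes;		/* pre-allocated PTE backing store */
};

/* Step 1: reserve the range, allocating all table memory up front. */
static struct iova_range *iova_reserve(unsigned long start, unsigned long end)
{
	struct iova_range *r = malloc(sizeof(*r));
	size_t npte = (end - start) >> PAGE_SHIFT;

	if (!r)
		return NULL;
	r->start = start;
	r->end = end;
	r->ptes = calloc(npte, sizeof(*r->ptes));
	if (!r->ptes) {
		free(r);
		return NULL;
	}
	return r;
}

/* Step 2: map/unmap within the range cannot fail - no allocation. */
static void iova_map(struct iova_range *r, unsigned long iova, unsigned long pa)
{
	assert(iova >= r->start && iova < r->end);
	r->ptes[(iova - r->start) >> PAGE_SHIFT] = pa | 1; /* valid bit */
}

/* Unmap is non-freeing: the slot stays allocated for future maps. */
static void iova_unmap(struct iova_range *r, unsigned long iova)
{
	r->ptes[(iova - r->start) >> PAGE_SHIFT] = 0;
}

/* Step 3: releasing the range frees all the table memory at once. */
static void iova_release(struct iova_range *r)
{
	free(r->ptes);
	free(r);
}
```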
 
> > > > Now you can be guaranteed that future map in that VA range will be
> > > > fully non-allocating, and future unmap will be fully non-freeing.  
> > > 
> > > You mean fully non-freeing if there are no known remaining users to
> > > come, right?  
> > 
> > unmap of allocated IOVA would be non-freeing. Free would happen on
> > allocate
> 
> Does that mean resources stay around until someone else tries to
> allocate an IOVA range overlapping this previously existing IOVA
> range? With your IOVA range solution, I'd expect resources to be
> released when the IOVA range is released/destroyed.

Sorry, I mistyped; I meant deallocate. Yes, when the IOVA is
deallocated the pinned-down radix leaves could become freed. That
would be a logical time to do the freeing.

> > My experience with GPU land is these hacky temporary things become
> > permanent and then a total pain for everyone else :( By the time
> > someone comes to fix it you will be gone and nobody will be willing to
> > help do changes to the GPU driver.
> 
> Hm, that doesn't match my recent experience with DRM drivers, where
> internal DRM APIs get changed pretty regularly, and reviewed by DRM
> driver maintainers in a timely manner...

If the DRM maintainers push it then it happens :) Ask Robin about
his iommu_present() removal

> Anyway, given you already thought it through, can I ask you to provide
> a preliminary implementation for this IOVA range mechanism so I can
> play with it and adjust panthor accordingly. And if you don't have the
> time, can you at least give me extra details about the implementation
> you had in mind, so I don't have to guess and come back with something
> that's not matching what you had in mind.

Oh, I don't know if I can manage patches in any reasonable time frame,
though I think it is pretty straightforward really:

 - Patch to introduce some 'struct iopte_page' (see struct slab)
 - Adjust io pagetable implementations to consume it
 - Do RCU freeing of iopte_page
 - Add a reserve/unreserve io page table ops
 - Implement reserve/unreserve in arm by manipulating a new refcount
   in iopte_page. Rely on RCU to protect the derefs
 - Modify iommufd to call reserve/unreserve around area attachment
   to have an in-tree user.
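A very rough sketch of how the iopte_page refcounting in the last two
steps could behave. Every name here is hypothetical; a real
implementation would overlay struct page (like struct slab does), use
refcount_t, and defer the free behind an RCU grace period rather than
freeing immediately.

```c
#include <stdlib.h>

/*
 * Hypothetical per-table-page metadata, one per radix-tree page,
 * mirroring how 'struct slab' overlays 'struct page'.
 */
struct iopte_page {
	int refcount;	/* reservations pinning this table page */
	void *table;	/* the PTE array this metadata describes */
};

/* Reserve pins the page so unmaps in the covered range never free it. */
static void iopte_reserve(struct iopte_page *p)
{
	p->refcount++;
}

/*
 * Unreserve drops the pin; the page is freed (after an RCU grace
 * period, in a real implementation) once no reservation remains.
 * Returns 1 if this call freed the table, 0 if it is still pinned.
 */
static int iopte_unreserve(struct iopte_page *p)
{
	if (--p->refcount == 0) {
		free(p->table);
		p->table = NULL;
		return 1;
	}
	return 0;
}
```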

Some of this is a bit interesting. For instance, reserving will
ideally want to invoke the batch allocator for efficiency, which means
computing the number of radix levels required to fully populate the
current empty level - that should be general code somehow
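The level computation amounts to counting, per level, how many table
pages the range spans; summing across levels gives the batch
allocation count. A sketch assuming a 4K granule with 512 8-byte
entries per table (an ARM64-style layout); the helper name and macro
are illustrative, not existing kernel code.

```c
#include <assert.h>

#define PAGE_SHIFT 12
/* Each level adds 9 bits of translation (512 entries per table).
 * A table at level l (l = 1 is the leaf level) maps
 * 1 << (PAGE_SHIFT + 9 * l) bytes: 2MB at level 1, 1GB at level 2... */
#define LEVEL_SHIFT(l) (PAGE_SHIFT + 9 * (l))

/*
 * Number of table pages needed at a given level to fully populate an
 * empty [iova, iova + size) range: count the distinct table-sized
 * chunks the range touches, including partial ones at each end.
 */
static unsigned long tables_for_level(unsigned long iova, unsigned long size,
				      int level)
{
	unsigned long span = 1UL << LEVEL_SHIFT(level);
	unsigned long first = iova / span;
	unsigned long last = (iova + size - 1) / span;

	return last - first + 1;
}
```

A range that straddles a table boundary needs one extra table at that
level, which is why the computation works on chunk indices rather than
dividing the size directly.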

Jason
