[PATCH v7 03/15] iommupt: Add the basic structure of the iommu implementation

Jason Gunthorpe jgg at nvidia.com
Mon Oct 27 05:58:45 PDT 2025


On Sat, Oct 25, 2025 at 11:24:25AM -0400, Pasha Tatashin wrote:
> On Thu, Oct 23, 2025 at 2:21 PM Jason Gunthorpe <jgg at nvidia.com> wrote:
> >
> > The existing IOMMU page table implementations duplicate all of the working
> > algorithms for each format. By using the generic page table API a single C
> > version of the IOMMU algorithms can be created and re-used for all of the
> > different formats used in the drivers. The implementation will provide a
> > single C version of the iommu domain operations: iova_to_phys, map, unmap,
> > and read_and_clear_dirty.
> >
> > Further, adding new algorithms and techniques becomes easy to do across
> > the entire fleet of drivers and formats.
> 
> It is an enabler for cross-arch page_table_check for IOMMU. There is
> also a long-standing issue where PT pages are not freed on unmap,
> leading to substantial overhead on some configurations, especially
> where IOVA is cycled through for security purposes (as it was done in
> our environment). Having a single, solid fix for this issue that
> affects all arches is very much desirable.

Yes, I have a simple, low-cost plan to fix the PMD/etc table
non-freeing problem, at least for iommufd.

In iommufd there is an interval tree of the IOVA used in the
iommu_domain. When a range of IOVA is removed from the interval tree
it can be unmapped normally. iommufd can then compute the empty span,
i.e. from the end of the prior populated range to the start of the
next populated range, and do a cleaning operation on the iommu_domain
over that range.
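A rough sketch of the empty-span computation, with a sorted array of
populated ranges standing in for the real interval tree (struct range,
compute_empty_span and max_iova are illustrative names here, not the
actual iommufd API):

```c
#include <stddef.h>

/* Illustrative stand-in for an interval tree node's [start, last] */
struct range {
	unsigned long start;
	unsigned long last;	/* inclusive */
};

/*
 * After the populated range at index @removed is dropped, the empty
 * span runs from just past the end of the prior populated range to
 * just before the start of the next one (or to the address space
 * edges when there is no neighbour).
 */
static void compute_empty_span(const struct range *ranges, size_t n,
			       size_t removed, unsigned long max_iova,
			       unsigned long *span_start,
			       unsigned long *span_last)
{
	*span_start = removed > 0 ? ranges[removed - 1].last + 1 : 0;
	*span_last = removed + 1 < n ? ranges[removed + 1].start - 1
				     : max_iova;
}
```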

Cleaning will free any table levels that are fully contained in the
empty span. Cleaning will run under the same 'range-locked' rules as
map/unmap/iova_to_phys.

This cleaning algorithm is already used as part of map, it just needs
to be exposed as an independent op.
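The safety condition for the cleaning pass reduces to a containment
check: a table page may only be freed when the whole IOVA chunk it
covers lies inside the empty span, so no populated mapping can still
reach it. A minimal sketch (table_fully_in_span is an illustrative
helper name, not the iommupt code):

```c
#include <stdbool.h>

/*
 * A table page at a given level covers a contiguous IOVA chunk of
 * @table_size bytes starting at @table_start. It is freeable by the
 * cleaning pass only when that entire chunk is inside the empty span
 * [span_start, span_last] (both inclusive).
 */
static bool table_fully_in_span(unsigned long table_start,
				unsigned long table_size,
				unsigned long span_start,
				unsigned long span_last)
{
	unsigned long table_last = table_start + table_size - 1;

	return table_start >= span_start && table_last <= span_last;
}
```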

Thanks,
Jason
