[RFC PATCH v5 00/15] Optimizing iommu_[map/unmap] performance

Lu Baolu baolu.lu at linux.intel.com
Thu Jun 10 20:10:14 PDT 2021


Hi Isaac,

Any update on this series? The iommu core part looks good to me, and I
also have some patches for the Intel IOMMU implementation of [un]map_pages.
I'm just wondering when the iommu core could get this optimization.

Best regards,
baolu

On 4/9/21 1:13 AM, Isaac J. Manjarres wrote:
> When unmapping a buffer from an IOMMU domain, the IOMMU framework unmaps
> the buffer at a granule of the largest page size that is supported by
> the IOMMU hardware and fits within the buffer. For every block that
> is unmapped, the IOMMU framework calls into the IOMMU driver, which calls
> into the io-pgtable framework to walk the page tables, find the entry
> that corresponds to the IOVA, and unmap that entry.
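The page-size selection described above can be sketched in user-space C. This is a simplified illustration, not the kernel's iommu_pgsize(): GENMASK_UL, fls_idx, ffs_idx, pick_pgsize and legacy_unmap_blocks are hypothetical names, and only the IOVA's alignment is considered (a map path would also have to take the physical address into account).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's GENMASK():
 * an unsigned long with bits h..l (inclusive) set. */
#define GENMASK_UL(h, l) \
	(((~0UL) >> (8 * sizeof(unsigned long) - 1 - (h))) & ((~0UL) << (l)))

/* Index of the most significant set bit; x must be non-zero. */
static unsigned int fls_idx(unsigned long x)
{
	unsigned int i = 0;

	while (x >>= 1)
		i++;
	return i;
}

/* Index of the least significant set bit; x must be non-zero. */
static unsigned int ffs_idx(unsigned long x)
{
	unsigned int i = 0;

	while (!(x & 1UL)) {
		x >>= 1;
		i++;
	}
	return i;
}

/* Largest page size in pgsize_bitmap that fits in size and to which
 * iova is aligned. */
static size_t pick_pgsize(unsigned long pgsize_bitmap, unsigned long iova,
			  size_t size)
{
	unsigned long pgsizes;

	assert(size != 0);
	/* Keep only page sizes no larger than the top bit of size. */
	pgsizes = pgsize_bitmap & GENMASK_UL(fls_idx(size), 0);
	/* Keep only page sizes that iova is aligned to. */
	if (iova)
		pgsizes &= GENMASK_UL(ffs_idx(iova), 0);
	assert(pgsizes != 0);
	return 1UL << fls_idx(pgsizes);
}

/* Legacy-style walk: one block (and one driver call) per iteration.
 * Returns the number of blocks the range is split into. */
static size_t legacy_unmap_blocks(unsigned long pgsize_bitmap,
				  unsigned long iova, size_t size)
{
	size_t blocks = 0;

	while (size) {
		size_t pgsize = pick_pgsize(pgsize_bitmap, iova, size);

		/* A driver ->unmap(iova, pgsize) call would happen here. */
		iova += pgsize;
		size -= pgsize;
		blocks++;
	}
	return blocks;
}
```

With 4 KB, 2 MB and 1 GB pages, a 4 MB range at IOVA 0 is split into two 2 MB blocks, each needing its own trip through the driver.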
> 
> This can be suboptimal in scenarios where a buffer or a piece of a
> buffer can be split into several contiguous page blocks of the same size.
> For example, consider an IOMMU that supports 4 KB, 2 MB, and 1 GB page
> blocks, and a 4 MB buffer being unmapped at IOVA 0. The current call
> flow results in 4 indirect calls and 2 page table walks to unmap 2
> entries that are adjacent in the page tables, when both entries could
> have been unmapped in one shot by clearing both page table entries in
> the same call.
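To make the arithmetic in this example explicit, here is a toy model of the call path. It assumes, as the paragraph does, that each unmap goes through one indirect call into the driver and one into io-pgtable, and that each io-pgtable call performs one page table walk. All names are hypothetical; this is not kernel code.

```c
#include <stddef.h>

/* Counters for the toy model of the unmap path. */
struct call_stats {
	unsigned int indirect_calls;
	unsigned int walks;
	unsigned int ptes_cleared;
};

/* Toy io-pgtable op: one indirect call, one walk, pgcount PTEs cleared. */
static size_t toy_pgtable_unmap_pages(struct call_stats *st, size_t pgsize,
				      size_t pgcount)
{
	st->indirect_calls++;
	st->walks++;
	st->ptes_cleared += pgcount;
	return pgsize * pgcount;
}

/* Toy driver callback: one indirect call that forwards to io-pgtable. */
static size_t toy_driver_unmap_pages(struct call_stats *st, size_t pgsize,
				     size_t pgcount)
{
	st->indirect_calls++;
	return toy_pgtable_unmap_pages(st, pgsize, pgcount);
}

/* Current flow: one page per trip through driver and io-pgtable. */
static void unmap_one_at_a_time(struct call_stats *st, size_t pgsize,
				size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		toy_driver_unmap_pages(st, pgsize, 1);
}

/* Batched flow: all npages of the same size in a single trip. */
static void unmap_batched(struct call_stats *st, size_t pgsize, size_t npages)
{
	toy_driver_unmap_pages(st, pgsize, npages);
}
```

For two 2 MB entries, the one-at-a-time flow counts 4 indirect calls and 2 walks, while the batched flow counts 2 indirect calls and 1 walk; both clear the same 2 entries.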
> 
> The same optimization is applicable to mapping buffers as well, so
> these patches add a pair of callbacks, unmap_pages() and map_pages(),
> to the io-pgtable code and IOMMU drivers. They unmap or map an IOVA
> range that consists of a number of pages of the same page size
> supported by the IOMMU hardware, allowing multiple page table entries
> to be manipulated in the same set of indirect calls. These are added as
> new callbacks to give other IOMMU drivers and io-pgtable formats time
> to switch over to them, so that the transition to this approach can be
> done piecemeal.
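A toy version of the pgsize/pgcount interface, in which a single call writes or clears several adjacent entries of a flat, single-level table. The table layout and the names toy_table, toy_map_pages and toy_unmap_pages are assumptions for illustration; the argument lists are only loosely modelled on the callbacks described here and are not their actual signatures.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_PTES   16      /* hypothetical: entries in the toy table */
#define TOY_PTE_VALID 0x1ULL  /* hypothetical: "entry is valid" bit */

/* Hypothetical single-level table: entry i maps IOVA i * pgsize. */
struct toy_table {
	size_t pgsize;
	uint64_t pte[TOY_NR_PTES];
};

/* Check that [iova, iova + pgcount * pgsize) lies inside the table. */
static int toy_range_ok(const struct toy_table *t, unsigned long iova,
			size_t pgsize, size_t pgcount)
{
	return pgsize != 0 && pgsize == t->pgsize && iova % pgsize == 0 &&
	       iova / pgsize + pgcount <= TOY_NR_PTES;
}

/* Write pgcount consecutive PTEs in one call; report bytes mapped. */
static int toy_map_pages(struct toy_table *t, unsigned long iova,
			 uint64_t paddr, size_t pgsize, size_t pgcount,
			 size_t *mapped)
{
	*mapped = 0;
	if (!toy_range_ok(t, iova, pgsize, pgcount))
		return -EINVAL;
	for (size_t i = 0; i < pgcount; i++)
		t->pte[iova / pgsize + i] = (paddr + i * pgsize) | TOY_PTE_VALID;
	*mapped = pgsize * pgcount;
	return 0;
}

/* Clear pgcount consecutive PTEs in one call; return bytes unmapped. */
static size_t toy_unmap_pages(struct toy_table *t, unsigned long iova,
			      size_t pgsize, size_t pgcount)
{
	if (!toy_range_ok(t, iova, pgsize, pgcount))
		return 0;
	for (size_t i = 0; i < pgcount; i++)
		t->pte[iova / pgsize + i] = 0;
	return pgsize * pgcount;
}
```

Mapping two 2 MB pages at IOVA 0 fills entries 0 and 1 in one call, and a single unmap call clears both again.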
> 
> Changes since V4:
> 
> * Fixed the type of addr_merge from phys_addr_t to unsigned long so
>    that GENMASK() can be used.
> * Hooked up arm_v7s_[unmap/map]_pages to the io-pgtable ops.
> * Introduced a macro for calculating the number of page table entries
>    for the ARM LPAE io-pgtable format.
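The V4 changelog mentions a macro for the number of page table entries in the ARM LPAE format. The hypothetical sketch below shows why such a count is useful for multi-entry operations: a run that starts partway through a table can only touch the entries left in that table, which is one way a call may end up handling fewer pages than requested. TOY_PTES_PER_TABLE, toy_iopte and toy_clamp_entries are invented names, not the macro from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit PTE type, in the spirit of LPAE descriptors. */
typedef uint64_t toy_iopte;

/* Hypothetical: number of PTEs that fit in one table of this granule. */
#define TOY_PTES_PER_TABLE(granule) ((granule) / sizeof(toy_iopte))

/* Clamp a request for pgcount entries starting at index idx so that the
 * run does not go past the end of the current table. */
static size_t toy_clamp_entries(size_t granule, size_t idx, size_t pgcount)
{
	size_t max_entries;

	assert(idx < TOY_PTES_PER_TABLE(granule));
	max_entries = TOY_PTES_PER_TABLE(granule) - idx;
	return pgcount < max_entries ? pgcount : max_entries;
}
```

With a 4 KB granule and 8-byte entries a table holds 512 PTEs, so a request for 4 entries starting at index 510 is clamped to 2.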
> 
> Changes since V3:
> 
> * Removed usage of ULL variants of bitops from Will's patches, as
>    they were not needed.
> * Instead of always unmapping/mapping exactly pgcount pages,
>    unmap_pages() and map_pages() now unmap/map at most pgcount pages,
>    so a single call may handle only part of the requested range. This
>    was done to simplify the handling in the io-pgtable layer.
> * Extended the existing PTE manipulation methods in io-pgtable-arm
>    to handle multiple entries, per Robin's suggestion, eliminating
>    the need to add functions to clear multiple PTEs.
> * Implemented a naive form of [map/unmap]_pages() for ARM v7s io-pgtable
>    format.
> * arm_[v7s/lpae]_[map/unmap] will call
>    arm_[v7s/lpae]_[map_pages/unmap_pages] with an argument of 1 page.
> * The arm_smmu_[map/unmap] functions have been removed, since they
>    have been replaced by arm_smmu_[map/unmap]_pages.
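With unmap_pages() and map_pages() handling at most pgcount pages, and the single-page entry points expressed as calls with one page, a caller has to loop until the whole range is covered. A minimal sketch of the unmap side, with invented names (unmap_pages_fn, toy_unmap_pages, toy_unmap, unmap_range) and a toy rule that a call stops at a 4-page boundary; the actual iommu core and io-pgtable code differ.

```c
#include <stddef.h>

/* Hypothetical callback type: unmaps at most pgcount pages of size pgsize
 * starting at iova, and returns the number of bytes actually unmapped. */
typedef size_t (*unmap_pages_fn)(unsigned long iova, size_t pgsize,
				 size_t pgcount);

/* Toy rule: one call never crosses a 4-page boundary, mimicking a format
 * that stops at the end of the current table. */
#define TOY_PAGES_PER_TABLE 4

static size_t toy_unmap_pages(unsigned long iova, size_t pgsize,
			      size_t pgcount)
{
	size_t idx = (iova / pgsize) % TOY_PAGES_PER_TABLE;
	size_t left = TOY_PAGES_PER_TABLE - idx;

	return (pgcount < left ? pgcount : left) * pgsize;
}

/* Single-page entry point expressed as a multi-page call with pgcount 1. */
static size_t toy_unmap(unsigned long iova, size_t pgsize)
{
	return toy_unmap_pages(iova, pgsize, 1);
}

/* Caller: repeat until all pgcount pages are unmapped or progress stops.
 * Returns total bytes unmapped; *calls counts callback invocations. */
static size_t unmap_range(unmap_pages_fn fn, unsigned long iova,
			  size_t pgsize, size_t pgcount, unsigned int *calls)
{
	size_t total = 0;

	*calls = 0;
	while (pgcount) {
		size_t done = fn(iova, pgsize, pgcount);

		(*calls)++;
		if (done < pgsize)
			break;	/* no whole page unmapped: give up */
		total += done;
		iova += done;
		pgcount -= done / pgsize;
	}
	return total;
}
```

Unmapping 6 pages of 4 KB starting at IOVA 0x2000 takes two calls here: the first stops at the boundary after 2 pages, the second handles the remaining 4.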
> 
> Changes since V2:
> 
> * Added a check in __iommu_map() for the existence of either the
>    map or the map_pages callback, as per Lu's suggestion.
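The V2 check can be illustrated with a cut-down ops table. In this hypothetical sketch (struct toy_ops and the toy_* functions are invented and are not the kernel's struct iommu_ops), mapping fails with -ENODEV when neither callback exists, uses map_pages when it is present, and otherwise falls back to one map call per page.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down ops table with the two optional callbacks. */
struct toy_ops {
	int (*map)(unsigned long iova, size_t size);
	int (*map_pages)(unsigned long iova, size_t pgsize, size_t pgcount,
			 size_t *mapped);
};

static unsigned int map_calls, map_pages_calls;

/* Toy single-page callback: only counts invocations. */
static int toy_map_one(unsigned long iova, size_t size)
{
	(void)iova;
	(void)size;
	map_calls++;
	return 0;
}

/* Toy multi-page callback: counts invocations and maps everything. */
static int toy_map_many(unsigned long iova, size_t pgsize, size_t pgcount,
			size_t *mapped)
{
	(void)iova;
	map_pages_calls++;
	*mapped = pgsize * pgcount;
	return 0;
}

static int toy_map(const struct toy_ops *ops, unsigned long iova,
		   size_t pgsize, size_t pgcount)
{
	/* Neither callback: the driver cannot map anything. */
	if (!ops->map && !ops->map_pages)
		return -ENODEV;

	if (ops->map_pages) {
		size_t mapped = 0;

		/* A real caller would also handle partial progress here. */
		return ops->map_pages(iova, pgsize, pgcount, &mapped);
	}

	/* Fallback: one single-page call per page. */
	for (size_t i = 0; i < pgcount; i++) {
		int ret = ops->map(iova + i * pgsize, pgsize);

		if (ret)
			return ret;
	}
	return 0;
}
```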
> 
> Changes since V1:
> 
> * Implemented the map_pages() callbacks
> * Integrated Will's patches, which address several concerns about how
>    iommu_pgsize() partitioned a buffer, into this series (I made a minor
>    change to the patch that converts iommu_pgsize() to use bitmaps, by
>    using the ULL variants of the bitops)
> 
> Isaac J. Manjarres (12):
>    iommu/io-pgtable: Introduce unmap_pages() as a page table op
>    iommu: Add an unmap_pages() op for IOMMU drivers
>    iommu/io-pgtable: Introduce map_pages() as a page table op
>    iommu: Add a map_pages() op for IOMMU drivers
>    iommu: Add support for the map_pages() callback
>    iommu/io-pgtable-arm: Prepare PTE methods for handling multiple
>      entries
>    iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()
>    iommu/io-pgtable-arm: Implement arm_lpae_map_pages()
>    iommu/io-pgtable-arm-v7s: Implement arm_v7s_unmap_pages()
>    iommu/io-pgtable-arm-v7s: Implement arm_v7s_map_pages()
>    iommu/arm-smmu: Implement the unmap_pages() IOMMU driver callback
>    iommu/arm-smmu: Implement the map_pages() IOMMU driver callback
> 
> Will Deacon (3):
>    iommu: Use bitmap to calculate page size in iommu_pgsize()
>    iommu: Split 'addr_merge' argument to iommu_pgsize() into separate
>      parts
>    iommu: Hook up '->unmap_pages' driver callback
> 
>   drivers/iommu/arm/arm-smmu/arm-smmu.c |  18 +--
>   drivers/iommu/io-pgtable-arm-v7s.c    |  50 ++++++-
>   drivers/iommu/io-pgtable-arm.c        | 189 +++++++++++++++++---------
>   drivers/iommu/iommu.c                 | 130 +++++++++++++-----
>   include/linux/io-pgtable.h            |   8 ++
>   include/linux/iommu.h                 |   9 ++
>   6 files changed, 289 insertions(+), 115 deletions(-)
> 


