[RFC PATCH v5 00/15] Optimizing iommu_[map/unmap] performance
Lu Baolu
baolu.lu at linux.intel.com
Thu Jun 10 20:10:14 PDT 2021
Hi Isaac,

Any update on this series? The iommu core part looks good to me, and I
also have some patches for the Intel IOMMU implementation of
[un]map_pages. Just wondering when the iommu core could have this
optimization.

Best regards,
baolu

On 4/9/21 1:13 AM, Isaac J. Manjarres wrote:
> When unmapping a buffer from an IOMMU domain, the IOMMU framework unmaps
> the buffer at a granule of the largest page size that is supported by
> the IOMMU hardware and fits within the buffer. For every block that
> is unmapped, the IOMMU framework will call into the IOMMU driver, and
> then the io-pgtable framework to walk the page tables to find the entry
> that corresponds to the IOVA, and then unmaps the entry.
>
> This can be suboptimal in scenarios where a buffer or a piece of a
> buffer can be split into several contiguous page blocks of the same size.
> For example, consider an IOMMU that supports 4 KB page blocks, 2 MB page
> blocks, and 1 GB page blocks, and a buffer that is 4 MB in size is being
> unmapped at IOVA 0. The current call-flow will result in 4 indirect calls,
> and 2 page table walks, to unmap 2 entries that are next to each other in
> the page-tables, when both entries could have been unmapped in one shot
> by clearing both page table entries in the same call.
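To make the arithmetic in the example above concrete, here is a minimal
user-space sketch; it is not code from this series. The names
pick_pgsize(), run_length(), unmap_per_block() and unmap_batched() are
hypothetical. It assumes each call into the driver costs two indirect
calls (the IOMMU driver op plus the io-pgtable op) and one page table
walk, as the description counts them, and that a run of same-sized pages
sits in a single table.

```c
#include <stddef.h>
#include <stdint.h>

/* Page sizes from the example: 4 KB, 2 MB and 1 GB, one bit per size. */
#define SZ_4K (1ULL << 12)
#define SZ_2M (1ULL << 21)
#define SZ_1G (1ULL << 30)
#define PGSIZE_BITMAP (SZ_4K | SZ_2M | SZ_1G)

struct unmap_cost {
	unsigned int indirect_calls;	/* IOMMU driver op + io-pgtable op */
	unsigned int table_walks;
};

/* Largest supported page size that fits in 'size' and to which 'iova' is aligned. */
static uint64_t pick_pgsize(uint64_t bitmap, uint64_t iova, uint64_t size)
{
	for (int bit = 63; bit >= 0; bit--) {
		uint64_t pgsize = 1ULL << bit;

		if (!(bitmap & pgsize))
			continue;
		if (pgsize <= size && (iova & (pgsize - 1)) == 0)
			return pgsize;
	}
	return 0;	/* no supported page size fits */
}

/* Number of consecutive pages of 'pgsize' starting at 'iova'. */
static uint64_t run_length(uint64_t iova, uint64_t size, uint64_t pgsize)
{
	uint64_t count = 0;

	while (size && pick_pgsize(PGSIZE_BITMAP, iova, size) == pgsize) {
		count++;
		iova += pgsize;
		size -= pgsize;
	}
	return count;
}

/* Flow described above: one driver call per block. */
static struct unmap_cost unmap_per_block(uint64_t iova, uint64_t size)
{
	struct unmap_cost cost = { 0, 0 };

	while (size) {
		uint64_t pgsize = pick_pgsize(PGSIZE_BITMAP, iova, size);

		if (!pgsize)
			break;
		cost.indirect_calls += 2;
		cost.table_walks += 1;
		iova += pgsize;
		size -= pgsize;
	}
	return cost;
}

/* Batched flow: one driver call per run of same-sized pages. */
static struct unmap_cost unmap_batched(uint64_t iova, uint64_t size)
{
	struct unmap_cost cost = { 0, 0 };

	while (size) {
		uint64_t pgsize = pick_pgsize(PGSIZE_BITMAP, iova, size);
		uint64_t pgcount;

		if (!pgsize)
			break;
		pgcount = run_length(iova, size, pgsize);
		cost.indirect_calls += 2;
		cost.table_walks += 1;
		iova += pgsize * pgcount;
		size -= pgsize * pgcount;
	}
	return cost;
}
```

Under these assumptions, a 4 MB buffer at IOVA 0 costs 4 indirect calls
and 2 walks per block, versus 2 indirect calls and 1 walk when batched.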
>
> The same optimization applies to mapping buffers as well, so these
> patches add a pair of callbacks, unmap_pages and map_pages, to the
> io-pgtable code and IOMMU drivers. Each callback unmaps or maps an IOVA
> range that consists of a number of pages of the same page size
> supported by the IOMMU hardware, which allows multiple page table
> entries to be manipulated in the same set of indirect calls. They are
> introduced as new callbacks to give other IOMMU drivers/io-pgtable
> formats time to switch over, so that the transition to this approach
> can be done piecemeal.
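One way to picture the piecemeal transition is a caller that prefers the
new callback and falls back to the old one. The sketch below is only an
illustration under that assumption: struct sketch_ops,
sketch_unmap_range() and the two example drivers are hypothetical,
simplified stand-ins, not the kernel's iommu_ops or the code in these
patches.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a driver's ops table. */
struct sketch_ops {
	/* Old callback: unmap one block of 'size' bytes; returns bytes unmapped. */
	size_t (*unmap)(unsigned long iova, size_t size);
	/* New callback: unmap up to 'pgcount' pages of 'pgsize' bytes each. */
	size_t (*unmap_pages)(unsigned long iova, size_t pgsize, size_t pgcount);
};

/* Prefer unmap_pages when the driver provides it; otherwise loop over unmap. */
static size_t sketch_unmap_range(const struct sketch_ops *ops,
				 unsigned long iova, size_t pgsize,
				 size_t pgcount)
{
	size_t done = 0;

	if (ops->unmap_pages)
		return ops->unmap_pages(iova, pgsize, pgcount);
	if (!ops->unmap)
		return 0;

	for (size_t i = 0; i < pgcount; i++) {
		size_t n = ops->unmap(iova + done, pgsize);

		done += n;
		if (n != pgsize)
			break;	/* stop at the first block that fails */
	}
	return done;
}

/* Two toy drivers that only count how often they are called. */
static unsigned int legacy_calls, batched_calls;

static size_t legacy_unmap(unsigned long iova, size_t size)
{
	(void)iova;
	legacy_calls++;
	return size;
}

static size_t batched_unmap_pages(unsigned long iova, size_t pgsize,
				  size_t pgcount)
{
	(void)iova;
	batched_calls++;
	return pgsize * pgcount;
}
```

A driver that has not been converted keeps working through the loop,
while a converted driver handles the whole run in one call.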
>
> Changes since V4:
>
> * Fixed type for addr_merge from phys_addr_t to unsigned long so
>   that GENMASK() can be used.
> * Hooked up arm_v7s_[unmap/map]_pages to the io-pgtable ops.
> * Introduced a macro for calculating the number of page table entries
>   for the ARM LPAE io-pgtable format.
>
> Changes since V3:
>
> * Removed usage of ULL variants of bitops from Will's patches, as
>   they were not needed.
> * Instead of unmapping/mapping exactly pgcount pages, unmap_pages()
>   and map_pages() will unmap or map at most pgcount pages, so a call
>   may handle only part of the requested pages. This was done to
>   simplify the handling in the io-pgtable layer.
> * Extended the existing PTE manipulation methods in io-pgtable-arm
>   to handle multiple entries, per Robin's suggestion, eliminating
>   the need to add functions to clear multiple PTEs.
> * Implemented a naive form of [map/unmap]_pages() for ARM v7s io-pgtable
>   format.
> * arm_[v7s/lpae]_[map/unmap] will call
>   arm_[v7s/lpae]_[map_pages/unmap_pages] with an argument of 1 page.
> * The arm_smmu_[map/unmap] functions have been removed, since they
>   have been replaced by arm_smmu_[map/unmap]_pages.
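The "at most pgcount pages" behaviour means the caller has to look at
how much was actually handled and call again for the rest. A minimal
sketch of that loop follows. All names here are hypothetical, and the
limit of 512 entries per call is an assumption chosen for illustration,
not a value taken from the patches.

```c
#include <stddef.h>

/* Assumed per-call limit, for illustration only. */
#define SKETCH_MAX_ENTRIES 512

/* Toy page-table op: handles at most SKETCH_MAX_ENTRIES pages per call. */
static size_t sketch_unmap_pages(unsigned long iova, size_t pgsize,
				 size_t pgcount)
{
	size_t n = pgcount < SKETCH_MAX_ENTRIES ? pgcount : SKETCH_MAX_ENTRIES;

	(void)iova;
	return n * pgsize;	/* bytes actually unmapped */
}

/* The single-block op becomes a one-page call into the multi-page op. */
static size_t sketch_unmap(unsigned long iova, size_t size)
{
	return sketch_unmap_pages(iova, size, 1);
}

/* Caller side: keep calling until every page has been handled. */
static size_t sketch_unmap_all(unsigned long iova, size_t pgsize,
			       size_t pgcount, unsigned int *calls)
{
	size_t total = 0;

	while (pgcount) {
		size_t bytes = sketch_unmap_pages(iova, pgsize, pgcount);

		(*calls)++;
		if (!bytes)
			break;	/* nothing unmapped: give up */
		iova += bytes;
		total += bytes;
		pgcount -= bytes / pgsize;
	}
	return total;
}
```

With these assumptions, unmapping 1000 pages of 4 KB takes two calls:
the first handles 512 pages and the second the remaining 488.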
>
> Changes since V2:
>
> * Added a check in __iommu_map() to check for the existence
>   of either the map or map_pages callback as per Lu's suggestion.
>
> Changes since V1:
>
> * Implemented the map_pages() callbacks
> * Integrated Will's patches into this series which
>   address several concerns about how iommu_pgsize() partitioned a
>   buffer (I made a minor change to the patch which changes
>   iommu_pgsize() to use bitmaps by using the ULL variants of
>   the bitops)
>
> Isaac J. Manjarres (12):
>   iommu/io-pgtable: Introduce unmap_pages() as a page table op
>   iommu: Add an unmap_pages() op for IOMMU drivers
>   iommu/io-pgtable: Introduce map_pages() as a page table op
>   iommu: Add a map_pages() op for IOMMU drivers
>   iommu: Add support for the map_pages() callback
>   iommu/io-pgtable-arm: Prepare PTE methods for handling multiple
>     entries
>   iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()
>   iommu/io-pgtable-arm: Implement arm_lpae_map_pages()
>   iommu/io-pgtable-arm-v7s: Implement arm_v7s_unmap_pages()
>   iommu/io-pgtable-arm-v7s: Implement arm_v7s_map_pages()
>   iommu/arm-smmu: Implement the unmap_pages() IOMMU driver callback
>   iommu/arm-smmu: Implement the map_pages() IOMMU driver callback
>
> Will Deacon (3):
>   iommu: Use bitmap to calculate page size in iommu_pgsize()
>   iommu: Split 'addr_merge' argument to iommu_pgsize() into separate
>     parts
>   iommu: Hook up '->unmap_pages' driver callback
>
>  drivers/iommu/arm/arm-smmu/arm-smmu.c |  18 +--
>  drivers/iommu/io-pgtable-arm-v7s.c    |  50 ++++++-
>  drivers/iommu/io-pgtable-arm.c        | 189 +++++++++++++++++---------
>  drivers/iommu/iommu.c                 | 130 +++++++++++++-----
>  include/linux/io-pgtable.h            |   8 ++
>  include/linux/iommu.h                 |   9 ++
>  6 files changed, 289 insertions(+), 115 deletions(-)
>