[RFC PATCH 0/5] Optimization for unmapping iommu mapped buffers

chenxiang (M) chenxiang66 at hisilicon.com
Thu Apr 1 04:28:51 BST 2021


Hi Isaac,


On 2021/3/31 11:00, Isaac J. Manjarres wrote:
> When unmapping a buffer from an IOMMU domain, the IOMMU framework unmaps
> the buffer at a granule of the largest page size that is supported by
> the IOMMU hardware and fits within the buffer. For every block that
> is unmapped, the IOMMU framework calls into the IOMMU driver and then
> into the io-pgtable framework, which walks the page tables to find the
> entry that corresponds to the IOVA and unmaps it.
>
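The per-block loop being described here is roughly the following (a
simplified sketch of __iommu_unmap() in drivers/iommu/iommu.c; sanity
checks and tracing omitted):

        const struct iommu_ops *ops = domain->ops;
        size_t unmapped = 0;

        while (unmapped < size) {
                /* Pick the largest supported page size that fits the
                 * remaining range. */
                size_t pgsize = iommu_pgsize(domain, iova, size - unmapped);

                /* One indirect call into the IOMMU driver, which calls
                 * into io-pgtable for one page table walk, per block. */
                size_t unmapped_page = ops->unmap(domain, iova, pgsize,
                                                  iotlb_gather);
                if (!unmapped_page)
                        break;

                iova += unmapped_page;
                unmapped += unmapped_page;
        }
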
> This can be suboptimal in scenarios where a buffer or a piece of a
> buffer can be split into several contiguous page blocks of the same size.
> For example, consider an IOMMU that supports 4 KB page blocks, 2 MB page
> blocks, and 1 GB page blocks, and a buffer that is 4 MB in size is being
> unmapped at IOVA 0. The current call-flow will result in 4 indirect calls,
> and 2 page table walks, to unmap 2 entries that are next to each other in
> the page-tables, when both entries could have been unmapped in one shot
> by clearing both page table entries in the same call.
>
> These patches add an unmap_pages callback to the io-pgtable
> code and IOMMU drivers which unmaps an IOVA range that consists of a
> number of pages of the same page size that is supported by the IOMMU
> hardware, and allows for clearing multiple entries in the same set of
> indirect calls. The reason for introducing unmap_pages is to give
> other IOMMU drivers/io-pgtable formats time to change to using the new
> unmap_pages callback, so that the transition to using this approach can be
> done piecemeal.
>
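For reference, the shape of the two new callbacks, as can be inferred
from the series and from the ops->unmap_pages() call in the SMMUv3 hunk
below (both take a (pgsize, pgcount) pair so that pgcount same-sized
entries can be cleared per indirect call):

        /* New member of struct io_pgtable_ops (include/linux/io-pgtable.h) */
        size_t (*unmap_pages)(struct io_pgtable_ops *ops, unsigned long iova,
                              size_t pgsize, size_t pgcount,
                              struct iommu_iotlb_gather *gather);

        /* New member of struct iommu_ops (include/linux/iommu.h); matches
         * arm_smmu_unmap_pages() below */
        size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova,
                              size_t pgsize, size_t pgcount,
                              struct iommu_iotlb_gather *iotlb_gather);
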
> The same optimization is applicable for mapping buffers; however, the
> error handling in the io-pgtable layer couldn't be handled cleanly, as we
> would need to invoke iommu_unmap to unmap the parts of the buffer that
> were mapped, and then do any TLB maintenance. However, that seemed like a
> layering violation.
>
> Any feedback is very much appreciated.

I applied this patchset, implemented the unmap_pages callback for
SMMUv3 on my Kunpeng ARM64 platform, and measured map/unmap latency
with the dma_map_benchmark tool (./dma_map_benchmark -g xxx). It
improves unmap latency (us) considerably. Maybe you can add the
SMMUv3 implementation as well. The test results are as follows:

                     latency of map/unmap (us)
                     before opt      after opt
g=1    (4K size)     0.1/0.7         0.1/0.7
g=2    (8K size)     0.2/1.4         0.2/0.8
g=4    (16K size)    0.3/2.7         0.3/0.9
g=8    (32K size)    0.5/5.4         0.5/1.2
g=16   (64K size)    1/10.7          1/1.8
g=32   (128K size)   1.8/21.4        1.8/2.8
g=64   (256K size)   3.6/42.9        3.6/5.1
g=128  (512K size)   7/85.6          7/8.6
g=256  (1M size)     13.9/171.1      13.9/15.5
g=512  (2M size)     0.2/0.7         0.2/0.9
g=1024 (4M size)     0.3/1.5         0.3/1.1
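(The g=512 and g=1024 rows drop back down presumably because at those
sizes the buffer is 2 MB-aligned and gets mapped with 2 MB block
entries, leaving only one or two page table entries to unmap in the
first place.)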


The change to SMMUv3 used for the test is as follows:
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8594b4a..e0268b1 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2292,6 +2292,20 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
        return ops->unmap(ops, iova, size, gather);
 }
 
+static size_t arm_smmu_unmap_pages(struct iommu_domain *domain, unsigned long iova,
+                                   size_t pgsize, size_t pgcount,
+                                   struct iommu_iotlb_gather *gather)
+{
+       struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+       struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+
+       if (!ops)
+               return 0;
+
+       return ops->unmap_pages(ops, iova, pgsize, pgcount, gather);
+}
+
+
 static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
 {
        struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
@@ -2613,6 +2627,7 @@ static struct iommu_ops arm_smmu_ops = {
        .attach_dev             = arm_smmu_attach_dev,
        .map                    = arm_smmu_map,
        .unmap                  = arm_smmu_unmap,
+       .unmap_pages            = arm_smmu_unmap_pages,
        .flush_iotlb_all        = arm_smmu_flush_iotlb_all,
        .iotlb_sync             = arm_smmu_iotlb_sync,
        .iova_to_phys           = arm_smmu_iova_to_phys,
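With this wired up, the 4 MB example from the cover letter should turn
into a single arm_smmu_unmap_pages() call with pgsize = 2 MB and
pgcount = 2, that is, one indirect call chain and one page table walk
instead of four calls and two walks.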

>
> Thanks,
> Isaac
>
> Isaac J. Manjarres (5):
>    iommu/io-pgtable: Introduce unmap_pages() as a page table op
>    iommu: Add an unmap_pages() op for IOMMU drivers
>    iommu: Add support for the unmap_pages IOMMU callback
>    iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()
>    iommu/arm-smmu: Implement the unmap_pages IOMMU driver callback
>
>   drivers/iommu/arm/arm-smmu/arm-smmu.c |  19 +++++
>   drivers/iommu/io-pgtable-arm.c        | 114 +++++++++++++++++++++-----
>   drivers/iommu/iommu.c                 |  44 ++++++++--
>   include/linux/io-pgtable.h            |   4 +
>   include/linux/iommu.h                 |   4 +
>   5 files changed, 159 insertions(+), 26 deletions(-)
>




