[RFC PATCH 0/5] Optimization for unmapping iommu mapped buffers
chenxiang (M)
chenxiang66 at hisilicon.com
Thu Apr 1 04:28:51 BST 2021
Hi Isaac,
On 2021/3/31 11:00, Isaac J. Manjarres wrote:
> When unmapping a buffer from an IOMMU domain, the IOMMU framework unmaps
> the buffer at a granule of the largest page size that is supported by
> the IOMMU hardware and fits within the buffer. For every block that
> is unmapped, the IOMMU framework will call into the IOMMU driver, and
> then the io-pgtable framework to walk the page tables to find the entry
> that corresponds to the IOVA, and then unmaps the entry.
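For reference, the existing flow is the per-block loop in __iommu_unmap() in
drivers/iommu/iommu.c; simplified from my reading of the current code, it
looks roughly like this:

        size_t unmapped = 0;

        while (unmapped < size) {
                /* pick the largest supported page size that still fits */
                size_t pgsize = iommu_pgsize(domain, iova, size - unmapped);
                size_t unmapped_page;

                /*
                 * One indirect call into the IOMMU driver, which in turn
                 * calls into io-pgtable and walks the page tables again
                 * for this IOVA.
                 */
                unmapped_page = ops->unmap(domain, iova, pgsize, iotlb_gather);
                if (!unmapped_page)
                        break;

                iova += unmapped_page;
                unmapped += unmapped_page;
        }

so every unmapped block costs one indirect call into the driver plus another
page table walk.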
>
> This can be suboptimal in scenarios where a buffer or a piece of a
> buffer can be split into several contiguous page blocks of the same size.
> For example, consider an IOMMU that supports 4 KB page blocks, 2 MB page
> blocks, and 1 GB page blocks, and a buffer that is 4 MB in size is being
> unmapped at IOVA 0. The current call-flow will result in 4 indirect calls,
> and 2 page table walks, to unmap 2 entries that are next to each other in
> the page-tables, when both entries could have been unmapped in one shot
> by clearing both page table entries in the same call.
>
> These patches add an unmap_pages callback to the io-pgtable code and
> IOMMU drivers. It unmaps an IOVA range consisting of a number of pages
> of the same page size that is supported by the IOMMU hardware, which
> allows multiple entries to be cleared within the same set of indirect
> calls. The reason for introducing unmap_pages as a separate callback is
> to give other IOMMU drivers/io-pgtable formats time to switch over to
> it, so that the transition to this approach can be done piecemeal.
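Judging from how the new callback is used in the SMMUv3 change further down
in this mail, it takes a page size plus a page count, so the prototypes
should look roughly like the following (the exact definitions are in patches
1 and 2):

        /* io-pgtable op, include/linux/io-pgtable.h */
        size_t (*unmap_pages)(struct io_pgtable_ops *ops, unsigned long iova,
                              size_t pgsize, size_t pgcount,
                              struct iommu_iotlb_gather *gather);

        /* IOMMU driver op, include/linux/iommu.h */
        size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova,
                              size_t pgsize, size_t pgcount,
                              struct iommu_iotlb_gather *gather);

Presumably both return the number of bytes actually unmapped, as the existing
unmap op does.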
>
> The same optimization is applicable to mapping buffers; however, the
> error handling in the io-pgtable layer couldn't be done cleanly, as we
> would need to invoke iommu_unmap to unmap the parts of the buffer that
> had already been mapped, and then do any TLB maintenance, which seemed
> like a layering violation.
>
> Any feedback is very much appreciated.
I applied this patchset and implemented unmap_pages for SMMUv3 on my
Kunpeng ARM64 platform, then tested the map/unmap latency with the
dma_map_benchmark tool (./dma_map_benchmark -g xxx). It improves the
unmap latency (us) considerably. Maybe you can add the SMMUv3
implementation as well. The test results are as follows:
                     latency of map/unmap (us)
                     before opt       after opt
g=1    (4K size)     0.1/0.7          0.1/0.7
g=2    (8K size)     0.2/1.4          0.2/0.8
g=4    (16K size)    0.3/2.7          0.3/0.9
g=8    (32K size)    0.5/5.4          0.5/1.2
g=16   (64K size)    1/10.7           1/1.8
g=32   (128K size)   1.8/21.4         1.8/2.8
g=64   (256K size)   3.6/42.9         3.6/5.1
g=128  (512K size)   7/85.6           7/8.6
g=256  (1M size)     13.9/171.1       13.9/15.5
g=512  (2M size)     0.2/0.7          0.2/0.9
g=1024 (4M size)     0.3/1.5          0.3/1.1
The change to smmuv3 used for the test is as follows:
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8594b4a..e0268b1 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2292,6 +2292,20 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
         return ops->unmap(ops, iova, size, gather);
 }
 
+static size_t arm_smmu_unmap_pages(struct iommu_domain *domain, unsigned long iova,
+                                   size_t pgsize, size_t pgcount,
+                                   struct iommu_iotlb_gather *gather)
+{
+        struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+        struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+
+        if (!ops)
+                return 0;
+
+        return ops->unmap_pages(ops, iova, pgsize, pgcount, gather);
+}
+
+
 static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
 {
         struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
@@ -2613,6 +2627,7 @@ static struct iommu_ops arm_smmu_ops = {
         .attach_dev = arm_smmu_attach_dev,
         .map = arm_smmu_map,
         .unmap = arm_smmu_unmap,
+        .unmap_pages = arm_smmu_unmap_pages,
         .flush_iotlb_all = arm_smmu_flush_iotlb_all,
         .iotlb_sync = arm_smmu_iotlb_sync,
         .iova_to_phys = arm_smmu_iova_to_phys,
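With this hooked up, a contiguous run of same-size blocks can be cleared with
a single call. For the 4 MB example from the cover letter (2 MB granule), the
core code would end up doing something like the following instead of two
separate unmap calls (illustrative only; the real batching logic is in
patch 3):

        /* 4 MB at IOVA 0 with a 2 MB granule: both entries cleared in one call */
        unmapped = domain->ops->unmap_pages(domain, iova, SZ_2M, 2,
                                            &iotlb_gather);

The iommu_iotlb_gather argument is simply passed through, so the existing
deferred TLB invalidation path is unchanged.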
>
> Thanks,
> Isaac
>
> Isaac J. Manjarres (5):
> iommu/io-pgtable: Introduce unmap_pages() as a page table op
> iommu: Add an unmap_pages() op for IOMMU drivers
> iommu: Add support for the unmap_pages IOMMU callback
> iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()
> iommu/arm-smmu: Implement the unmap_pages IOMMU driver callback
>
> drivers/iommu/arm/arm-smmu/arm-smmu.c | 19 +++++
> drivers/iommu/io-pgtable-arm.c | 114 +++++++++++++++++++++-----
> drivers/iommu/iommu.c | 44 ++++++++--
> include/linux/io-pgtable.h | 4 +
> include/linux/iommu.h | 4 +
> 5 files changed, 159 insertions(+), 26 deletions(-)
>