[PATCH V3 0/2] iommu/arm-smmu-v3: Add support for ECMDQ register mode
Jason Gunthorpe
jgg at nvidia.com
Thu May 2 09:25:30 PDT 2024
On Thu, Apr 25, 2024 at 07:41:50AM -0700, Tanmay Jagdale wrote:
> Resending the patches by Zhen Lei <thunder.leizhen at huawei.com> that add
> support for SMMU ECMDQ feature.
>
> Tested this feature on a Marvell SoC by implementing a smmu-test driver.
> This test driver spawns a thread per CPU and each thread keeps sending
> map, table-walk and unmap requests for a fixed duration.
So this is not just measuring invalidation performance but basically
the DMA API performance of map/unmap operations? What is "batch
size"?
Does this HW support range invalidation? How many invalidation
commands does each test cycle generate?
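To make the command-count question concrete: the number of TLBI commands per unmap depends heavily on range invalidation. Below is a simplified model, loosely based on the range loop in the arm-smmu-v3 driver's TLB invalidation path. It ignores CMD_SYNC, batching, TTL hints and errata handling, and `count_tlbi_cmds` is a hypothetical helper, not a driver function.

```python
# Simplified model (assumption): only leaf-page TLBI commands are counted;
# CMD_SYNC, batching, TTL hints and errata handling are ignored.
RANGE_NUM_MAX = 31  # mask applied to the chunk count of one range command


def count_tlbi_cmds(num_pages: int, range_inv: bool) -> int:
    """Return how many TLBI commands invalidate num_pages contiguous pages."""
    if not range_inv:
        return num_pages  # without range invalidation: one command per page
    cmds = 0
    while num_pages:
        # Lowest set bit gives the power-of-two chunk size (like __ffs()).
        scale = (num_pages & -num_pages).bit_length() - 1
        # Up to RANGE_NUM_MAX chunks of 2**scale pages per command.
        num = (num_pages >> scale) & RANGE_NUM_MAX
        num_pages -= num << scale
        cmds += 1
    return cmds
```

For example, `count_tlbi_cmds(512, False)` is 512 while `count_tlbi_cmds(512, True)` is 1. Whether the test HW has range invalidation therefore changes how much command-queue traffic each cycle produces.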
>                       Total          Average Requests   Difference
>                       Requests       Per CPU            wrt ECMDQ
> -----------------------------------------------------------------
> ECMDQ                 239286381      2991079
> CMDQ Batch Size 1     228232187      2852902            -4.62%
> CMDQ Batch Size 32    233465784      2918322            -2.43%
> CMDQ Batch Size 64    231679588      2895994            -3.18%
> CMDQ Batch Size 128   233189030      2914862            -2.55%
> CMDQ Batch Size 256   230965773      2887072            -3.48%
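A quick consistency check on the table: each average matches integer division of its total by 80. This suggests an 80-CPU system, but that is an inference, not something stated in the posting. The Difference column is (CMDQ total - ECMDQ total) / ECMDQ total. The helper names below are illustrative.

```python
# Totals from the table above; the CPU count is inferred, not reported.
ECMDQ_TOTAL = 239286381
CMDQ_TOTALS = {1: 228232187, 32: 233465784, 64: 231679588,
               128: 233189030, 256: 230965773}
NUM_CPUS = 80  # assumption: 239286381 // 2991079 == 80


def per_cpu(total: int, cpus: int = NUM_CPUS) -> int:
    """Average requests per CPU, truncated as in the table."""
    return total // cpus


def diff_pct(total: int, baseline: int = ECMDQ_TOTAL) -> float:
    """Relative difference of total vs. the ECMDQ baseline, in percent."""
    return (total - baseline) / baseline * 100.0


for batch, total in CMDQ_TOTALS.items():
    print(f"batch {batch:3d}: {per_cpu(total)} per CPU, {diff_pct(total):+.2f}%")
```

The printed values reproduce the Per CPU and Difference columns. This indicates the percentages are throughput ratios over the whole run rather than per-operation measurements.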
If this is really 5% for a typical DMA API map/unmap cycle then that
seems interesting to me.
If it is 5% for just the invalidation command then it is harder to
say.
I'd suggest presenting your results in terms of the latency of a DMA
API map/unmap cycle, and showing how the results scale as you add more
threads. Do even 2 threads start to show a 4-5% gain?
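The existing throughput numbers can be converted into latency if each thread issues its requests back-to-back in a closed loop. That is an assumption about the test driver. Under it, the mean time per request is the run duration divided by the per-CPU request count, so the duration cancels out of any relative comparison. The run duration is not given in the posting, and the helper names here are hypothetical.

```python
def mean_latency_ns(duration_ns: float, requests_per_cpu: int) -> float:
    """Mean time per request for one closed-loop thread (assumed back-to-back)."""
    return duration_ns / requests_per_cpu


def latency_increase_pct(baseline_reqs: int, other_reqs: int) -> float:
    """How much longer each request takes in `other` vs `baseline`, in percent.

    For a fixed duration T: (T / other) / (T / baseline) = baseline / other.
    """
    return (baseline_reqs / other_reqs - 1.0) * 100.0


# Per-CPU averages from the table: ECMDQ vs CMDQ batch size 1.
print(f"{latency_increase_pct(2991079, 2852902):.2f}% longer per request")
```

A 4.62% throughput drop corresponds to a slightly larger (about 4.84%) increase in mean per-request latency. Repeating the measurement at 1, 2, 4, ... threads would give the scaling curve, assuming the closed-loop model holds.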
Also, I'm wondering how ATS would interact; I think we have a
head-of-line blocking issue with ATS. Allowing ATS to progress
concurrently with unrelated parallel invalidations may also be
interesting, especially for multi-SVA workloads.
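A toy FIFO model can illustrate the head-of-line concern. A slow ATC invalidation, which needs a response from the PCIe device, delays every command queued behind it on a shared queue. It does not delay commands on an independent queue, for example an ECMDQ dedicated to one CPU. The service times are made-up illustrative values. The model also treats each command as strictly serialized within its queue, which simplifies how the SMMU actually consumes commands.

```python
from typing import Dict, List


def completion_times(service_us: List[float], queue_of: List[int]) -> List[float]:
    """Completion time of each command, in submission order.

    Each queue drains its own commands in FIFO order; queues run in parallel.
    """
    busy_until: Dict[int, float] = {}
    done = []
    for cost, q in zip(service_us, queue_of):
        busy_until[q] = busy_until.get(q, 0.0) + cost
        done.append(busy_until[q])
    return done


# Illustrative costs: one slow ATC invalidation, then two unrelated TLBIs.
costs = [50.0, 1.0, 1.0]
shared = completion_times(costs, [0, 0, 0])     # single shared CMDQ
per_queue = completion_times(costs, [0, 1, 2])  # one queue per CPU
print(shared, per_queue)
```

In the shared case, the two cheap invalidations complete only after the slow one. With separate queues they finish independently, which is the effect a multi-SVA benchmark could try to measure.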
Jason
More information about the linux-arm-kernel mailing list