[bug report] iommu_dma_unmap_sg() is very slow when running IO from a remote NUMA node

Ming Lei <ming.lei@redhat.com>
Fri Jul 9 07:21:39 PDT 2021


On Fri, Jul 09, 2021 at 11:16:14AM +0100, Russell King (Oracle) wrote:
> On Fri, Jul 09, 2021 at 04:38:09PM +0800, Ming Lei wrote:
> > I observed that NVMe performance is very bad when running fio on one
> > CPU (aarch64) in a remote NUMA node, compared with running on the NVMe PCI NUMA node.
> 
> Have you checked the effect of running a memory-heavy process using
> memory from node 1 while being executed by CPUs in node 0?

1) aarch64
[root@ampere-mtjade-04 ~]# taskset -c 0 numactl -m 0 perf bench mem memcpy -s 4GB -f default
# Running 'mem/memcpy' benchmark:
# function 'default' (Default memcpy() provided by glibc)
# Copying 4GB bytes ...

      11.511752 GB/sec
[root@ampere-mtjade-04 ~]# taskset -c 0 numactl -m 1 perf bench mem memcpy -s 4GB -f default
# Running 'mem/memcpy' benchmark:
# function 'default' (Default memcpy() provided by glibc)
# Copying 4GB bytes ...

       3.084333 GB/sec
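
(For completeness, the node placement of the test CPU and of the NVMe controller can be double-checked with something like the commands below; the nvme0 device name is only an assumption for illustration:)

    # list nodes with their CPUs and memory sizes
    numactl --hardware
    # CPU-to-node mapping
    lscpu | grep -i numa
    # home node of the NVMe controller's PCI device (device name assumed)
    cat /sys/class/nvme/nvme0/device/numa_node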


2) x86_64[1]
[root@hp-dl380g10-01 mingl]# taskset -c 0 numactl -m 0 perf bench mem memcpy -s 4GB -f default
# Running 'mem/memcpy' benchmark:
# function 'default' (Default memcpy() provided by glibc)
# Copying 4GB bytes ...

       4.193927 GB/sec
[root@hp-dl380g10-01 mingl]# taskset -c 0 numactl -m 1 perf bench mem memcpy -s 4GB -f default
# Running 'mem/memcpy' benchmark:
# function 'default' (Default memcpy() provided by glibc)
# Copying 4GB bytes ...

       3.553392 GB/sec


So remote-node memcpy bandwidth drops to roughly 27% of local on this aarch64 machine, but only to about 85% of local on the x86_64 one.

[1] On this x86_64 machine, IOPS can reach 680K in the same fio NVMe test.
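
(The fio job was roughly of the following form; the device name and the exact parameters here are assumptions for illustration, not the original command line:)

    # 4k random reads against the raw device, queue depth 64,
    # pinned to CPU 0 so the submitting CPU's node can be varied
    taskset -c 0 fio --name=remote-node-test --filename=/dev/nvme0n1 \
        --direct=1 --rw=randread --bs=4k --iodepth=64 --numjobs=1 \
        --ioengine=libaio --runtime=30 --time_based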



Thanks,
Ming



