[bug report] iommu_dma_unmap_sg() is very slow when running IO from remote numa node
Ming Lei
ming.lei@redhat.com
Fri Jul 9 01:38:09 PDT 2021
Hello,
I observed that NVMe performance is much worse when running fio on a
CPU (aarch64) in the NUMA node remote from the NVMe PCI device than on
a CPU in the device's local node. Please see the test results[1]: 327K
vs. 34.9K IOPS.
A latency trace shows that one big difference is in iommu_dma_unmap_sg():
1111 nsecs vs. 25437 nsecs on average.
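For reference, a minimal sketch of one way to collect comparable per-call
latency numbers is below. This is only an illustration with bpftrace, not
necessarily the exact tooling used here, and it assumes iommu_dma_unmap_sg
is not inlined and is visible in /proc/kallsyms:

bpftrace -e '
// record entry timestamp per thread
kprobe:iommu_dma_unmap_sg { @start[tid] = nsecs; }
// on return, fold the per-call latency (nsecs) into a histogram and average
kretprobe:iommu_dma_unmap_sg /@start[tid]/ {
        @lat_ns = hist(nsecs - @start[tid]);
        @avg_ns = avg(nsecs - @start[tid]);
        delete(@start[tid]);
}'

Running this while each of the fio cases below is active should show the
same local vs. remote gap in the histogram and average.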
[1] fio test & results
1) fio test result:
- run fio on a CPU local to the NVMe device (NUMA node 0)
taskset -c 0 ~/git/tools/test/nvme/io_uring 10 1 /dev/nvme1n1 4k
+ fio --bs=4k --ioengine=io_uring --fixedbufs --registerfiles --hipri --iodepth=64 --iodepth_batch_submit=16 --iodepth_batch_complete_min=16 --filename=/dev/nvme1n1 --direct=1 --runtime=10 --numjobs=1 --rw=randread --name=test --group_reporting
IOPS: 327K
avg latency of iommu_dma_unmap_sg(): 1111 nsecs
- run fio on a remote CPU (NUMA node 1)
taskset -c 80 ~/git/tools/test/nvme/io_uring 10 1 /dev/nvme1n1 4k
+ fio --bs=4k --ioengine=io_uring --fixedbufs --registerfiles --hipri --iodepth=64 --iodepth_batch_submit=16 --iodepth_batch_complete_min=16 --filename=/dev/nvme1n1 --direct=1 --runtime=10 --numjobs=1 --rw=randread --name=test --group_reporting
IOPS: 34.9K
avg latency of iommu_dma_unmap_sg(): 25437 nsecs
2) system info
[root@ampere-mtjade-04 ~]# lscpu | grep NUMA
NUMA node(s): 2
NUMA node0 CPU(s): 0-79
NUMA node1 CPU(s): 80-159
[root@ampere-mtjade-04 ~]# lspci | grep NVMe
0003:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
[root@ampere-mtjade-04 ~]# cat /sys/block/nvme1n1/device/device/numa_node
0
Thanks,
Ming