Read speed for a PCIe NVMe SSD is ridiculously slow on a multi-socket machine.
Damien Le Moal
damien.lemoal at opensource.wdc.com
Fri Mar 24 01:43:42 PDT 2023
On 3/24/23 15:56, Alexander Shumakovitch wrote:
> [ please copy me on your replies since I'm not subscribed to this list ]
>
> Hello all,
>
> I have an oldish quad socket server (Stratos S400-X44E by Quanta, 512GB RAM,
> 4 x Xeon E5-4620) that I'm trying to upgrade with an NVMe Samsung 970 EVO
> Plus SSD, connected via an adapter card to a PCIe slot, which is wired to
> CPU #0 directly and supports PCIe 3.0 speeds. For some reason, the reading
> speed from this SSD differs by a factor of 10 (ten!), depending on which
> physical CPU hdparm or dd is run on:
>
> # hdparm -t /dev/nvme0n1
It is very unusual to use hdparm, a tool designed mainly for ATA devices, to
benchmark an NVMe device. At the very least, if you really want to measure the
drive performance, you should add the --direct option (see man hdparm).
A better way to test, though, is to use fio with the io_uring or libaio IO
engine, doing multi-job, high-QD --direct=1 IOs. That will give you the maximum
performance of your device. Then remove the --direct=1 option to do buffered
IOs, which will expose potential issues with your system's memory bandwidth.
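A minimal sketch of such an fio run might look like the following. The job
count, block size and queue depth here are illustrative assumptions, not
prescriptive values, and --filename assumes the device from this thread:

```shell
# Direct (un-cached) sequential reads: 4 jobs, 128KB blocks, QD 32,
# io_uring engine. This approximates the device's maximum read throughput.
fio --name=seqread --filename=/dev/nvme0n1 --rw=read \
    --ioengine=io_uring --direct=1 --bs=128k --iodepth=32 \
    --numjobs=4 --runtime=30 --time_based --group_reporting

# Re-run without --direct=1 to go through the page cache (buffered IOs),
# which also exercises the system's memory bandwidth.
```

To reproduce the per-socket behavior seen with hdparm, the same command can be
pinned to a CPU set with a taskset prefix, e.g. `taskset -c 8-15 fio ...`.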
>
> /dev/nvme0n1:
> Timing buffered disk reads: 510 MB in 3.01 seconds = 169.28 MB/sec
>
> # taskset -c 0-7 hdparm -t /dev/nvme0n1
>
> /dev/nvme0n1:
> Timing buffered disk reads: 5252 MB in 3.00 seconds = 1750.28 MB/sec
>
> # taskset -c 8-15 hdparm -t /dev/nvme0n1
>
> /dev/nvme0n1:
> Timing buffered disk reads: 496 MB in 3.01 seconds = 164.83 MB/sec
>
> # taskset -c 24-31 hdparm -t /dev/nvme0n1
>
> /dev/nvme0n1:
> Timing buffered disk reads: 520 MB in 3.01 seconds = 172.65 MB/sec
>
> Even more mysteriously, the writing speeds are consistent across all the
> CPUs at about 800MB/sec (see the output of dd attached). Please note that
> I'm not worrying about the fine tuning of the performance at this point,
> and in particular I'm perfectly fine with 1/2 of the theoretical reading
> speed. I just want to understand where 90% of the bandwidth gets lost.
> No error of any kind appears in the syslog.
>
> I don't think this is NUMA related since the QPI interconnect runs as
> specced at 4GB/sec, when measured by Intel's Memory Latency Checker, more
> than enough for NVMe to run at full speed. Also, the CUDA benchmark test
> runs at expected speeds across the QPI.
>
> Just in case, I'm attaching the output of lstopo to this message. Please
> note that this computer has a BIOS bug that doesn't let the kernel populate
> the values of numa_node in /sys/devices/pci0000:* automatically, so I have
> to do this myself after each boot.
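(For reference, a sketch of that manual workaround, assuming the controller's
PCI address 0000:03:00.0 from the list-subsys output below and that the slot is
attached to NUMA node 0 — both assumptions to be checked against lspci and the
actual topology. On kernels where the PCI sysfs numa_node attribute is
writable, root can override the missing firmware value:

```shell
# Override the unset NUMA node for the NVMe controller after boot.
# Address (0000:03:00.0) and node (0) are assumptions for this system.
echo 0 > /sys/bus/pci/devices/0000:03:00.0/numa_node
cat /sys/bus/pci/devices/0000:03:00.0/numa_node
```

Note that writing this attribute taints the kernel, as it is meant only as a
workaround for broken firmware tables.)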
>
> I've tried removing all other PCI add-on cards, moving the SSD to another
> slot, changing the number of polling queues for the nvme driver, and even
> setting dm-multipath up. But none of these makes any material difference
> in reading speed.
>
> System info: Debian 11.6 (stable) running Linux 5.19.11 (config file attached)
> Output of "nvme list":
>
> Node SN Model Namespace Usage Format FW Rev
> ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
> /dev/nvme0n1 S58SNS0R705048H Samsung SSD 970 EVO Plus 500GB 1 0.00 B / 500.11 GB 512 B + 0 B 2B2QEXM7
>
> Output of "nvme list-subsys":
>
> nvme-subsys0 - NQN=nqn.2014.08.org.nvmexpress:144d144dS58SNS0R705048H Samsung SSD 970 EVO Plus 500GB
> \
> +- nvme0 pcie 0000:03:00.0 live
>
> I would be grateful if you could point me in the right direction. I'm
> attaching outputs of the following commands to this message: dmesg,
> "cat /proc/cpuinfo", "lspci -vvv", lstopo, and dd (both for reading from
> and writing to this SSD). Please let me know if you need any other info
> from me.
>
> Thank you,
>
> Alex Shumakovitch
--
Damien Le Moal
Western Digital Research
More information about the Linux-nvme mailing list