Read speed for a PCIe NVMe SSD is ridiculously slow on a multi-socket machine.

Alexander Shumakovitch shurik at jhu.edu
Thu Mar 23 23:56:03 PDT 2023


[ please copy me on your replies since I'm not subscribed to this list ]

Hello all,

I have an oldish quad-socket server (Stratos S400-X44E by Quanta, 512 GB RAM,
4 x Xeon E5-4620) that I'm trying to upgrade with an NVMe Samsung 970 EVO
Plus SSD, connected via an adapter card to a PCIe slot that is wired directly
to CPU #0 and supports PCIe 3.0 speeds. For some reason, the read speed from
this SSD differs by a factor of 10 (ten!) depending on which physical CPU
hdparm or dd runs on:
      
    # hdparm -t /dev/nvme0n1 
    
    /dev/nvme0n1:
     Timing buffered disk reads: 510 MB in  3.01 seconds = 169.28 MB/sec
    
    # taskset -c 0-7 hdparm -t /dev/nvme0n1 
    
    /dev/nvme0n1:
     Timing buffered disk reads: 5252 MB in  3.00 seconds = 1750.28 MB/sec
    
    # taskset -c 8-15 hdparm -t /dev/nvme0n1 
    
    /dev/nvme0n1:
     Timing buffered disk reads: 496 MB in  3.01 seconds = 164.83 MB/sec
    
    # taskset -c 24-31 hdparm -t /dev/nvme0n1 
    
    /dev/nvme0n1:
     Timing buffered disk reads: 520 MB in  3.01 seconds = 172.65 MB/sec
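
In case a page-cache-free measurement is more useful, I can also repeat the
test with fio in direct-I/O mode on each group of cores, along the lines
sketched below (the queue depth and block size are arbitrary choices on my
part, and --readonly keeps it safe on the raw device):

    # direct reads from the raw device, pinned to the cores of CPU #1:
    taskset -c 8-15 fio --name=cross-socket-read --filename=/dev/nvme0n1 \
        --rw=read --bs=1M --ioengine=libaio --iodepth=32 --direct=1 \
        --runtime=30 --time_based --readonly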

Even more mysteriously, the write speeds are consistent across all the CPUs
at about 800 MB/sec (see the attached dd output). Please note that I'm not
worried about fine-tuning performance at this point; in particular, I'd be
perfectly happy with half of the theoretical read speed. I just want to
understand where 90% of the bandwidth gets lost. No errors of any kind
appear in the syslog.
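
To reproduce those numbers, dd invocations along the following lines should
do; the block size, count, and scratch-file path here are illustrative, and
my exact invocations are in the attached dd_output.txt:

    # buffered read from the raw device, pinned to the cores of CPU #1:
    taskset -c 8-15 dd if=/dev/nvme0n1 of=/dev/null bs=1M count=4096
    # write to a scratch file on a filesystem on the SSD (path is made up):
    taskset -c 8-15 dd if=/dev/zero of=/mnt/nvme/ddtest bs=1M count=4096 \
        oflag=direct conv=fsync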

I don't think this is NUMA-related, since the QPI interconnect runs at its
specced 4 GB/sec when measured with Intel's Memory Latency Checker, which is
more than enough for the NVMe drive to run at full speed. The CUDA benchmark
test also runs at the expected speeds across QPI.
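
To try to separate CPU locality from memory locality, the plan is to repeat
the hdparm runs under numactl with the CPU and memory bindings split across
nodes, roughly like this (node numbers are just examples):

    # where the kernel thinks the SSD lives (unset on this box until the
    # manual fixup mentioned below):
    cat /sys/block/nvme0n1/device/numa_node
    # run on node 1 cores but allocate buffers on node 0, and vice versa:
    numactl --cpunodebind=1 --membind=0 hdparm -t /dev/nvme0n1
    numactl --cpunodebind=1 --membind=1 hdparm -t /dev/nvme0n1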

Just in case, I'm attaching the output of lstopo to this message. Please
note that this computer has a BIOS bug that prevents the kernel from
populating the numa_node values under /sys/devices/pci0000:* automatically,
so I have to set them myself after each boot.
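
For reference, that per-boot fixup is just a sysfs write of the form below;
the device address is the SSD itself (from "nvme list-subsys" further down),
and node 0 is where lstopo says the slot is wired:

    echo 0 > /sys/bus/pci/devices/0000:03:00.0/numa_node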

I've tried removing all other PCIe add-on cards, moving the SSD to another
slot, changing the number of polling queues for the nvme driver, and even
setting up dm-multipath, but none of these makes any material difference
to the read speed.
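
To be concrete, by "changing the number of polling queues" I mean the
poll_queues parameter of the nvme module, set along these lines (the value
and the file name are arbitrary; it only takes effect after reloading the
module or rebooting):

    echo "options nvme poll_queues=4" > /etc/modprobe.d/nvme-poll.conf
    # or, equivalently, on the kernel command line: nvme.poll_queues=4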

System info: Debian 11.6 (stable) running Linux 5.19.11 (config file attached)
Output of "nvme list":

    Node             SN                   Model                                    Namespace Usage                      Format           FW Rev  
    ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
    /dev/nvme0n1     S58SNS0R705048H      Samsung SSD 970 EVO Plus 500GB           1           0.00   B / 500.11  GB    512   B +  0 B   2B2QEXM7

Output of "nvme list-subsys"":

    nvme-subsys0 - NQN=nqn.2014.08.org.nvmexpress:144d144dS58SNS0R705048H     Samsung SSD 970 EVO Plus 500GB          
    \
     +- nvme0 pcie 0000:03:00.0 live 

I would be grateful if you could point me in the right direction. I'm
attaching the outputs of the following commands to this message: dmesg,
"cat /proc/cpuinfo", "lspci -vvv", lstopo, and dd (both reading from and
writing to this SSD). Please let me know if you need any other information
from me.
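
If digging through the full lspci dump is a pain, the negotiated link speed
and width for the SSD can be pulled out with something like the following,
which should show whether the link actually trained at PCIe 3.0 (8 GT/s) x4:

    lspci -vvv -s 03:00.0 | grep -E 'LnkCap:|LnkSta:'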

Thank you,

   Alex Shumakovitch
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: config-5.19.0-0.deb11.2-amd64
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20230324/77036959/attachment-0001.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cpu_info.txt
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20230324/77036959/attachment-0005.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: dd_output.txt
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20230324/77036959/attachment-0006.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: dmesg.txt
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20230324/77036959/attachment-0007.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lspci_output.txt
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20230324/77036959/attachment-0008.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lstopo_output.txt
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20230324/77036959/attachment-0009.txt>

