Read speed for a PCIe NVMe SSD is ridiculously slow on a multi-socket machine.

Damien Le Moal damien.lemoal at opensource.wdc.com
Fri Mar 24 18:56:36 PDT 2023


On 3/25/23 09:33, Alexander Shumakovitch wrote:
> Hi Damien,
> 
> Just to add to my previous message, I've run the same set of tests on a
> small SATA SSD boot drive (Kingston A400) attached to the same system, and
> it turned out to be more or less node and I/O mode agnostic, producing
> consistent reading speeds of about 450MB/sec in the direct I/O mode and
> about 480MB/sec in the cached I/O mode. In particular, the cached mode on
> a "wrong" NUMA node was significantly faster for this SATA SSD drive than
> for an NVMe one at about 170MB/sec (both drives are connected to CPU #0).

That is because the device itself is slower, so the page cache and NUMA
overhead does not really impact the results. Try an HDD and you will see that
it is almost impossible to measure any difference.

> So my question becomes: why is the NVMe driver susceptible to (very) slow
> cached reads, while the AHCI one is not? Are there some fundamental
> differences in how AHCI and NVMe block devices handle page cache?

Because the NVMe device latency is much lower, the overhead of the page cache
and NUMA is relatively much larger. In absolute terms, that overhead is the
same for any device, but relative to the device latency it is a small
percentage of the overall I/O latency for slow devices and a large percentage
for fast devices.
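
To put rough numbers on that, here is a small sketch of the arithmetic. All of
the latencies below are made-up round figures for illustration, not
measurements from your system or from any specific device, and the names
(FIXED_OVERHEAD_US, overhead_share) are just mine. It assumes the same fixed
per-I/O overhead for every device and shows how the share of that overhead
grows as the device gets faster.

```python
# Illustrative numbers only: none of these latencies were measured in this
# thread. They are assumptions chosen to show the trend.
FIXED_OVERHEAD_US = 20.0  # hypothetical page cache + cross-node cost per I/O

# Hypothetical per-I/O device latencies, in microseconds.
DEVICE_LATENCY_US = {
    "HDD": 5000.0,
    "SATA SSD": 100.0,
    "NVMe SSD": 10.0,
}


def overhead_share(device_us, overhead_us=FIXED_OVERHEAD_US):
    """Return the fraction of total I/O latency spent in the fixed overhead."""
    return overhead_us / (device_us + overhead_us)


for name, latency in DEVICE_LATENCY_US.items():
    print(f"{name:9s} overhead share: {overhead_share(latency):6.1%}")
```

With these assumed values the overhead is well under 1% of the total for the
HDD, but about two thirds of the total for the NVMe device, even though the
absolute overhead never changes.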


-- 
Damien Le Moal
Western Digital Research




More information about the Linux-nvme mailing list