NVMe scalability issue
Andrey Kuzmin
andrey.v.kuzmin at gmail.com
Tue Jun 2 12:11:08 PDT 2015
On Tue, Jun 2, 2015 at 10:09 PM, Jens Axboe <axboe at fb.com> wrote:
> On 06/02/2015 01:03 PM, Andrey Kuzmin wrote:
>>
>> On Tue, Jun 2, 2015 at 1:52 AM, Ming Lin <mlin at kernel.org> wrote:
>>>
>>> Hi list,
>>>
>>> I'm playing with 8 high-performance NVMe devices on a 4-socket server.
>>> Each device can do 730K 4k read IOPS.
>>>
>>> Kernel: 4.1-rc3
>>> An fio test shows it doesn't scale well with 4 or more devices.
>>> I wonder if there is any direction to improve it.
>>>
>>> devices  theory   actual
>>>          IOPS(K)  IOPS(K)
>>> -------  -------  -------
>>> 1        733      733
>>> 2        1466     1446.8
>>> 3        2199     2174.5
>>> 4        2932     2354.9
>>> 5        3665     3024.5
>>> 6        4398     3818.9
>>> 7        5131     4526.3
>>> 8        5864     4621.2
>>>
>>> And a graph here:
>>> http://minggr.net/pub/20150601/nvme-scalability.jpg
>>>
>>>
>>> With 8 devices, the CPUs are still 43% idle, so CPU is not the bottleneck.
>>>
>>> "top" data
>>>
>>> Tasks: 565 total, 30 running, 535 sleeping, 0 stopped, 0 zombie
>>> %Cpu(s): 17.5 us, 39.2 sy, 0.0 ni, 43.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
>>> KiB Mem: 52833033+total, 3103032 used, 52522732+free, 18472 buffers
>>> KiB Swap: 7999484 total, 0 used, 7999484 free. 1506732 cached Mem
>>>
>>> "perf top" data
>>>
>>> PerfTop: 124581 irqs/sec kernel:78.6% exact: 0.0% [4000Hz cycles], (all, 48 CPUs)
>>>
>>> -----------------------------------------------------------------------------------------
>>>
>>> 3.30% [kernel] [k] do_blockdev_direct_IO
>>> 2.99% fio [.] get_io_u
>>> 2.79% fio [.] axmap_isset
>>
>>
>> Just a thought as well, but the axmap_isset CPU usage is suspiciously
>> high, given a read-only workload where it's essentially a no-op.
>
>
> Read or write doesn't matter, it's still marked in the random map. Both of
> them will maintain that state.
>
I'm not sure keeping track of blocks already read was the intention in this
test, so it's worth rerunning with norandommap=1.
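For reference, a minimal fio job along these lines might look like the
sketch below. The device path, iodepth, numjobs, and runtime are
placeholders, not values taken from Ming's original test:

```ini
; Hypothetical 4k random-read job with the random map disabled.
; filename, iodepth, numjobs, and runtime are illustrative only.
[global]
ioengine=libaio
direct=1
rw=randread
bs=4k
iodepth=32
norandommap=1
time_based=1
runtime=60

[nvme0]
filename=/dev/nvme0n1
numjobs=4
```

With norandommap set, fio no longer tracks which blocks have been
touched, so axmap_isset should drop out of the profile entirely.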
Regards,
Andrey
> --
> Jens Axboe
>
More information about the Linux-nvme mailing list