NVMe scalability issue
Andrey Kuzmin
andrey.v.kuzmin at gmail.com
Tue Jun 2 12:03:54 PDT 2015
On Tue, Jun 2, 2015 at 1:52 AM, Ming Lin <mlin at kernel.org> wrote:
> Hi list,
>
> I'm playing with 8 high-performance NVMe devices on a 4-socket server.
> Each device can deliver 730K 4k read IOPS.
>
> Kernel: 4.1-rc3
> The fio test below shows it doesn't scale well with 4 or more devices.
> I wonder if there is any possible direction to improve it.
>
> devices  theory   actual
>          IOPS(K)  IOPS(K)
> -------  -------  -------
> 1        733      733
> 2        1466     1446.8
> 3        2199     2174.5
> 4        2932     2354.9
> 5        3665     3024.5
> 6        4398     3818.9
> 7        5131     4526.3
> 8        5864     4621.2
>
> And a graph here:
> http://minggr.net/pub/20150601/nvme-scalability.jpg
>
>
> With 8 devices, CPU is still 43% idle, so CPU is not the bottleneck.
>
> "top" data
>
> Tasks: 565 total, 30 running, 535 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 17.5 us, 39.2 sy, 0.0 ni, 43.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> KiB Mem: 52833033+total, 3103032 used, 52522732+free, 18472 buffers
> KiB Swap: 7999484 total, 0 used, 7999484 free. 1506732 cached Mem
>
> "perf top" data
>
> PerfTop: 124581 irqs/sec kernel:78.6% exact: 0.0% [4000Hz cycles], (all, 48 CPUs)
> -----------------------------------------------------------------------------------------
>
> 3.30% [kernel] [k] do_blockdev_direct_IO
> 2.99% fio [.] get_io_u
> 2.79% fio [.] axmap_isset
Just a thought as well, but axmap_isset CPU usage looks suspiciously high
for a read-only workload, where it should essentially be a no-op.
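If that symbol is indeed fio's random-map lookup, one quick way to take it
out of the picture might be to drop the map entirely, e.g. a variant of
your job file along these lines (untested, option names from memory):

[global]
rw=randread
bs=4k
direct=1
ioengine=libaio
iodepth=64
numjobs=4
time_based
runtime=60
group_reporting
# don't track which blocks have already been read
norandommap

[job0]
filename=/dev/nvme0n1

random_generator=lfsr should have a similar effect if you'd rather keep
full coverage of the device. If axmap_isset then disappears from the
profile but the IOPS curve doesn't move, the limit is elsewhere.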
Regards,
Andrey
> 2.40% [kernel] [k] irq_entries_start
> 1.91% [kernel] [k] _raw_spin_lock
> 1.77% [kernel] [k] nvme_process_cq
> 1.73% [kernel] [k] _raw_spin_lock_irqsave
> 1.71% fio [.] fio_gettime
> 1.33% [kernel] [k] blk_account_io_start
> 1.24% [kernel] [k] blk_account_io_done
> 1.23% [kernel] [k] kmem_cache_alloc
> 1.23% [kernel] [k] nvme_queue_rq
> 1.22% fio [.] io_u_queued_complete
> 1.14% [kernel] [k] native_read_tsc
> 1.11% [kernel] [k] kmem_cache_free
> 1.05% [kernel] [k] __acct_update_integrals
> 1.01% [kernel] [k] context_tracking_exit
> 0.94% [kernel] [k] _raw_spin_unlock_irqrestore
> 0.91% [kernel] [k] rcu_eqs_enter_common
> 0.86% [kernel] [k] cpuacct_account_field
> 0.84% fio [.] td_io_queue
>
> fio script
>
> [global]
> rw=randread
> bs=4k
> direct=1
> ioengine=libaio
> iodepth=64
> time_based
> runtime=60
> group_reporting
> numjobs=4
>
> [job0]
> filename=/dev/nvme0n1
>
> [job1]
> filename=/dev/nvme1n1
>
> [job2]
> filename=/dev/nvme2n1
>
> [job3]
> filename=/dev/nvme3n1
>
> [job4]
> filename=/dev/nvme4n1
>
> [job5]
> filename=/dev/nvme5n1
>
> [job6]
> filename=/dev/nvme6n1
>
> [job7]
> filename=/dev/nvme7n1
>
>
>
>
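One more thought, given the 4-socket box and the knee at 4 devices: it
may be worth ruling out NUMA placement. If your fio build has libnuma
support, you could try pinning each job to the node its controller hangs
off of, something like the below (keeping your [global] section as is;
the node numbers are only an assumption, check each device's numa_node
in sysfs, e.g. /sys/bus/pci/devices/<BDF>/numa_node):

[job0]
filename=/dev/nvme0n1
# assuming nvme0 is attached to node 0
numa_cpu_nodes=0
numa_mem_policy=bind:0

[job4]
filename=/dev/nvme4n1
# assuming nvme4 is attached to node 1
numa_cpu_nodes=1
numa_mem_policy=bind:1

and so on for the remaining jobs. If locally pinned jobs scale no better,
cross-socket traffic probably isn't what's eating the missing IOPS.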
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme