Low md raid1 performance unless forced to use the VFS layer (io-cmd-file).

Mark Ruijter MRuijter at onestopsystems.com
Thu Oct 17 09:20:46 PDT 2019


When I export an md raid1 /dev/mdX device using nvmet, the 4k random write performance is limited to 250K IOPS, even though running fio directly against the device on the target system reaches 850K IOPS.
A single md raid1 kernel thread consumes 100% of one CPU core, and this appears to be the bottleneck.
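
For reference, the namespace is set up through configfs in the usual way; a minimal sketch (port and host setup omitted):

mkdir -p /sys/kernel/config/nvmet/subsystems/clr1/namespaces/1
echo /dev/md1 > /sys/kernel/config/nvmet/subsystems/clr1/namespaces/1/device_path
echo 1 > /sys/kernel/config/nvmet/subsystems/clr1/namespaces/1/enable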

When I format the /dev/mdX device with XFS and export a file from that filesystem, the write performance increases by roughly 300%.
To verify that going through the VFS makes the difference, I wrote a patch (attached) that lets me forcefully select the io-cmd-file code.
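
The file-backed variant was set up along these lines; the mount point and file name are illustrative:

mkfs.xfs /dev/md1
mount /dev/md1 /mnt/md1
truncate -s 32G /mnt/md1/ns1.img
echo /mnt/md1/ns1.img > /sys/kernel/config/nvmet/subsystems/clr1/namespaces/1/device_path

Because device_path now points at a regular file, nvmet picks the io-cmd-file backend for the namespace instead of io-cmd-bdev.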

When the NVMe target is configured normally and all I/O is handled by io-cmd-bdev.c, the performance reported by fio is:

fio --name=nvme_tcp --rw=randwrite --bs=4k --filename=/dev/nvme0n1 --numjobs=32 --iodepth=128 --exitall --direct=1 --group_reporting --time_based --runtime=300 --size=32G --ioengine=libaio
nvme_tcp: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
fio-3.1
Starting 32 processes
^Cbs: 32 (f=32): [w(32)][2.0%][r=0KiB/s,w=638MiB/s][r=0,w=163k IOPS][eta 04m:54s]
fio: terminating on signal 2

nvme_tcp: (groupid=0, jobs=32): err= 0: pid=18081: Thu Oct 17 09:55:26 2019
  write: IOPS=174k, BW=679MiB/s (712MB/s)(4518MiB/6655msec)

A single busy kernel thread can be seen in top on the target system:
28039 root      20   0       0      0      0 R 100.00 0.000   0:09.52 md1_raid1

When I activate the attached patch, the performance goes up a lot. The patched namespace directory looks like this:
/sys/kernel/config/nvmet/subsystems/clr1/namespaces/1 # ls -l
total 0
-rw-r--r-- 1 root root 4096 Oct 17 10:18 ana_grpid
-rw-r--r-- 1 root root 4096 Oct 17 10:18 buffered_io
-rw-r--r-- 1 root root 4096 Oct 17 06:59 device_nguid
-rw-r--r-- 1 root root 4096 Oct 17 09:54 device_path
-rw-r--r-- 1 root root 4096 Oct 17 06:59 device_uuid
-rw-r--r-- 1 root root 4096 Oct 17 09:59 enable
-rw-r--r-- 1 root root 4096 Oct 17 09:59 use_vfs

Note the new attribute use_vfs, which is 0 by default.
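
The attached vfs.patch has the full change; in essence it adds a use_vfs bool to struct nvmet_ns, wired up much like the existing buffered_io attribute in drivers/nvme/target/configfs.c. A simplified sketch of the configfs part, not the literal patch:

static ssize_t nvmet_ns_use_vfs_show(struct config_item *item, char *page)
{
	return sprintf(page, "%d\n", to_nvmet_ns(item)->use_vfs);
}

static ssize_t nvmet_ns_use_vfs_store(struct config_item *item,
		const char *page, size_t count)
{
	struct nvmet_ns *ns = to_nvmet_ns(item);
	bool val;

	if (strtobool(page, &val))
		return -EINVAL;

	/* Like buffered_io, only allow changes while the ns is disabled. */
	mutex_lock(&ns->subsys->lock);
	if (ns->enabled) {
		pr_err("disable ns before setting use_vfs value.\n");
		mutex_unlock(&ns->subsys->lock);
		return -EINVAL;
	}
	ns->use_vfs = val;
	mutex_unlock(&ns->subsys->lock);
	return count;
}
CONFIGFS_ATTR(nvmet_ns_, use_vfs);

The generated nvmet_ns_attr_use_vfs also has to be added to nvmet_ns_attrs[]. When use_vfs is set, nvmet_ns_enable() skips nvmet_bdev_ns_enable() so that nvmet_file_ns_enable() handles the namespace, even though it is a block device.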

/sys/kernel/config/nvmet/subsystems/clr1/namespaces/1 # cat device_path 
/dev/md1
/sys/kernel/config/nvmet/subsystems/clr1/namespaces/1 # echo 0 > enable 
/sys/kernel/config/nvmet/subsystems/clr1/namespaces/1 # echo 1 > use_vfs 
/sys/kernel/config/nvmet/subsystems/clr1/namespaces/1 # echo 1 > enable

fio --name=nvme_tcp --rw=randwrite --bs=4k --filename=/dev/nvme0n1 --numjobs=32 --iodepth=128 --exitall --direct=1 --group_reporting --time_based --runtime=300 --size=32G --ioengine=libaio
nvme_tcp: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
fio-3.1
Starting 32 processes
^Cbs: 32 (f=32): [w(32)][8.3%][r=0KiB/s,w=2348MiB/s][r=0,w=601k IOPS][eta 04m:35s]
fio: terminating on signal 2

nvme_tcp: (groupid=0, jobs=32): err= 0: pid=18227: Thu Oct 17 10:00:51 2019
  write: IOPS=640k, BW=2500MiB/s (2622MB/s)(61.0GiB/25374msec)

The target system now shows many kernel threads, each with low to moderate load (32% to 70%).

Since the io-cmd-file code enforces direct I/O, the performance should be only slightly lower than with io-cmd-bdev.c, which uses the bio interface.
However, when writing to an md raid1 device the opposite is true.
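
For context, the direct I/O in the file backend comes from how the namespace is opened; roughly, from nvmet_file_ns_enable() in drivers/nvme/target/io-cmd-file.c (exact code differs between kernel versions):

	int flags = O_RDWR | O_LARGEFILE;

	if (!ns->buffered_io)
		flags |= O_DIRECT;

	ns->file = filp_open(ns->device_path, flags, 0);

So unless buffered_io is set on the namespace, the file-backed path bypasses the page cache entirely.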

I still have to find out whether the md raid1 kernel module or the nvmet driver is to blame.
However, since going through the VFS 'fixes' the issue, I suspect the problem lies in the nvmet block-I/O (io-cmd-bdev.c) implementation.

Has anyone seen this issue with md raid1 before? 
Or does anyone have ideas about this problem?

Thank you,
 
Mark Ruijter

-------------- next part --------------
A non-text attachment was scrubbed...
Name: vfs.patch
Type: application/octet-stream
Size: 3700 bytes
Desc: vfs.patch
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20191017/c7bd656d/attachment.obj>

