Enabling poll_queues on NVMe with kernel-5.x

Varad Gautam varadgautam at gmail.com
Thu Jul 23 04:59:21 EDT 2020


Hi,

Since commit a4668d9ba ("nvme: default to 0 poll queues") [1], the
nvme driver must be explicitly configured with poll_queues > 0 before
io_poll can be enabled on its block devices.

However, before poll queues were split out in commit 4b04cc6a8 ("nvme:
add separate poll queue map") [2], io_poll was enabled by default on
nvme block devices.

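As a result, nvme drives now see higher I/O latencies in the default
configuration (nvme.poll_queues=0, io_poll=0). The fio slat/clat/lat
figures below show this: for example, average read clat is 173.86 usec
with the defaults versus 165.01 usec with polling enabled.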
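
To see which configuration a given machine is running, the two settings
can be read back from sysfs. The following is only a minimal sketch: the
helper name read_poll_settings and its sysfs_root argument are made up
for illustration, and the device name nvme0n1 is taken from the fio disk
stats in this mail. It assumes the usual sysfs locations for the nvme
module parameter and the per-queue io_poll attribute.

```python
from pathlib import Path


def read_poll_settings(device="nvme0n1", sysfs_root="/sys"):
    """Return the nvme poll_queues parameter and the device's io_poll flag.

    Hypothetical helper: values are ints, or None if a file is missing
    or unreadable (for example when no nvme device is present).
    """
    root = Path(sysfs_root)
    paths = {
        # Module parameter of the nvme PCI driver.
        "poll_queues": root / "module" / "nvme" / "parameters" / "poll_queues",
        # Per-device block queue attribute.
        "io_poll": root / "block" / device / "queue" / "io_poll",
    }
    settings = {}
    for name, path in paths.items():
        try:
            settings[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings


if __name__ == "__main__":
    for name, value in read_poll_settings().items():
        shown = value if value is not None else "unavailable"
        print(f"{name} = {shown}")
```

On the default setup described above this should report poll_queues = 0
and io_poll = 0. For the second run below, the driver was configured
with nvme.poll_queues=32 and io_poll was set to 1.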

The commit message of [1] says:

> We need a better way of configuring this, and given that polling is
> (still) a bit niche, let's default to using 0 poll queues.

Are there any plans, or is any work needed, for nvme to provide
poll_queues > 0 by default?

kernel-5.4, nvme.poll_queues=0, io_poll=0
-----------------------------------------
bash-4.1$ sudo fio /tmp/fio-workload
fio-workload: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=libaio, iodepth=1
fio 1.55
Starting 1 process
Jobs: 1 (f=3), CR=2000/0 IOPS: [m] [97.3% done] [17006K/17219K /s] [1038 /1051  iops] [eta 00m:08s]
fio-workload: (groupid=0, jobs=1): err= 0: pid=51062
  read : io=4564.1MB, bw=16013KB/s, iops=1000 , runt=291918msec
    slat (usec): min=3 , max=75 , avg= 5.61, stdev= 2.33
    clat (usec): min=59 , max=2584 , avg=173.86, stdev=307.87
     lat (usec): min=78 , max=2590 , avg=180.07, stdev=307.88
    bw (KB/s) : min=13140, max=19040, per=100.21%, avg=16045.14, stdev=1015.87
  write: io=4565.4MB, bw=16014KB/s, iops=1000 , runt=291918msec
    slat (usec): min=4 , max=86 , avg= 6.33, stdev= 2.85
    clat (usec): min=3 , max=420 , avg=39.54, stdev= 5.31
     lat (usec): min=40 , max=427 , avg=46.49, stdev= 6.53
    bw (KB/s) : min=13090, max=19104, per=100.22%, avg=16049.96, stdev=1055.02
  cpu          : usr=0.76%, sys=2.24%, ctx=587979, majf=0, minf=183
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w/d: total=292153/292180/0, short=0/0/0
     lat (usec): 4=0.01%, 10=0.01%, 20=0.01%, 50=48.66%, 100=11.94%
     lat (usec): 250=35.31%, 500=2.28%, 750=0.24%, 1000=0.18%
     lat (msec): 2=0.72%, 4=0.67%

Run status group 0 (all jobs):
   READ: io=4564.1MB, aggrb=16012KB/s, minb=16397KB/s, maxb=16397KB/s, mint=291918msec, maxt=291918msec
  WRITE: io=4565.4MB, aggrb=16014KB/s, minb=16398KB/s, maxb=16398KB/s, mint=291918msec, maxt=291918msec

Disk stats (read/write):
  nvme0n1: ios=292073/292298, merge=0/135, ticks=50570/11593, in_queue=0, util=35.54%

kernel-5.4, nvme.poll_queues=32, io_poll=1
------------------------------------------
bash-4.1$ sudo fio /tmp/fio-workload
fio-workload: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=libaio, iodepth=1
fio 1.55
Starting 1 process
Jobs: 1 (f=3), CR=2000/0 IOPS: [m] [97.3% done] [16809K/16678K /s] [1026 /1018  iops] [eta 00m:08s]
fio-workload: (groupid=0, jobs=1): err= 0: pid=11017
  read : io=4565.5MB, bw=16009KB/s, iops=1000 , runt=291994msec
    slat (usec): min=3 , max=81 , avg= 5.41, stdev= 2.32
    clat (usec): min=60 , max=2593 , avg=165.01, stdev=309.91
     lat (usec): min=76 , max=2598 , avg=171.04, stdev=309.94
    bw (KB/s) : min=13076, max=19008, per=100.20%, avg=16041.43, stdev=926.95
  write: io=4565.2MB, bw=16010KB/s, iops=1000 , runt=291994msec
    slat (usec): min=3 , max=84 , avg= 6.04, stdev= 2.78
    clat (usec): min=2 , max=280 , avg=36.30, stdev= 4.14
     lat (usec): min=37 , max=286 , avg=42.96, stdev= 5.19
    bw (KB/s) : min=13085, max=19168, per=100.21%, avg=16042.78, stdev=978.02
  cpu          : usr=0.68%, sys=2.28%, ctx=587989, majf=0, minf=186
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w/d: total=292163/292170/0, short=0/0/0
     lat (usec): 4=0.01%, 10=0.01%, 20=0.01%, 50=49.39%, 100=21.80%
     lat (usec): 250=25.09%, 500=2.11%, 750=0.13%, 1000=0.12%
     lat (msec): 2=0.65%, 4=0.70%

Run status group 0 (all jobs):
   READ: io=4565.5MB, aggrb=16009KB/s, minb=16393KB/s, maxb=16393KB/s, mint=291994msec, maxt=291994msec
  WRITE: io=4565.2MB, aggrb=16009KB/s, minb=16393KB/s, maxb=16393KB/s, mint=291994msec, maxt=291994msec

Disk stats (read/write):
  nvme0n1: ios=292051/292222, merge=0/143, ticks=47967/10589, in_queue=0, util=34.31%
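
To make the difference between the two runs easier to read, the average
slat/clat/lat values can be put side by side. The sketch below only does
arithmetic on the "avg=" figures copied from the two fio reports above.
The names baseline, polled and pct_reduction are made up for
illustration and are not part of fio.

```python
# Average latencies in usec, copied from the "avg=" fields of the two
# fio reports in this mail.
baseline = {  # kernel-5.4, io_poll=0
    ("read", "slat"): 5.61,
    ("read", "clat"): 173.86,
    ("read", "lat"): 180.07,
    ("write", "slat"): 6.33,
    ("write", "clat"): 39.54,
    ("write", "lat"): 46.49,
}
polled = {  # kernel-5.4, nvme.poll_queues=32, io_poll=1
    ("read", "slat"): 5.41,
    ("read", "clat"): 165.01,
    ("read", "lat"): 171.04,
    ("write", "slat"): 6.04,
    ("write", "clat"): 36.30,
    ("write", "lat"): 42.96,
}


def pct_reduction(before, after):
    """Relative reduction from before to after, in percent."""
    return (before - after) / before * 100.0


reductions = {key: pct_reduction(baseline[key], polled[key]) for key in baseline}

for (op, metric), pct in reductions.items():
    before, after = baseline[(op, metric)], polled[(op, metric)]
    print(f"{op:5s} {metric:4s}: {before:7.2f} -> {after:7.2f} usec ({pct:.1f}% lower)")
```

On these numbers the average completion latency drops by roughly 5% for
reads and roughly 8% for writes with polling enabled. Both reports come
from a single run each, and the read clat stdev is about 308 usec in
both.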


[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=a4668d9ba
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=4b04cc6a8

Thanks,
Varad Gautam


