nvme: split bios issued in reverse order

Jonathan Nicklin jnicklin at blockbridge.com
Tue May 24 15:54:51 PDT 2022


Just to clear up confusion about the observed behavior, here's a trace
from a virtual NVMe device in QEMU (thanks for the idea, Keith).

# CPU count (>1 to ensure multiple NVMe queues)
4

# Kernel
Linux debian 5.16.0-0.bpo.4-amd64 #1 SMP PREEMPT

# Simulate the effect of an NVMe MDTS of 32K
echo 32 > /sys/block/nvme0n1/queue/max_sectors_kb
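
# For reference when reading the traces below: assuming a 512-byte LBA
# format (consistent with the slba values in the traces), a 32K split
# covers 64 LBAs, so each split command reports len=63 (the field is
# zero-based) and slba advances by 64. A quick illustrative sketch of
# that arithmetic (not kernel code):

#include <stdio.h>

/*
 * Illustrative only: print the (slba, len) pairs expected in the
 * nvme_setup_cmd traces for one 128K write split at a 32K limit,
 * assuming a 512-byte LBA format (len is zero-based in the trace).
 */
int main(void)
{
        const unsigned lba_size = 512;
        const unsigned io_bytes = 128 * 1024;   /* fio --bs=128K  */
        const unsigned max_bytes = 32 * 1024;   /* max_sectors_kb */
        unsigned long long offset = 0;          /* first write    */

        for (unsigned done = 0; done < io_bytes; done += max_bytes) {
                unsigned long long slba = (offset + done) / lba_size;
                unsigned len = max_bytes / lba_size - 1; /* zero-based */
                printf("slba=%llu, len=%u\n", slba, len);
        }
        return 0;
}

# Prints slba=0, 64, 128, 192 with len=63, matching the four splits seen
# in the traces (in ascending order; libaio below issues them descending).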

# Basic QD1 write w/ libaio (split I/Os are issued in descending LBA order)
fio --name dbg --filename=/dev/nvme0n1 --rw=write --iodepth=1 \
    --bs=128K --ioengine=libaio --direct=1 --size=1M

fio-3201    [002] .....  1070.001305: nvme_setup_cmd: nvme0:
disk=nvme0n1, qid=3, cmdid=4867, nsid=1, flags=0x0, meta=0x0,
cmd=(nvme_cmd_write slba=192, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
fio-3201    [002] .....  1070.001308: nvme_setup_cmd: nvme0:
disk=nvme0n1, qid=3, cmdid=17154, nsid=1, flags=0x0, meta=0x0,
cmd=(nvme_cmd_write slba=128, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
fio-3201    [002] .....  1070.001309: nvme_setup_cmd: nvme0:
disk=nvme0n1, qid=3, cmdid=29441, nsid=1, flags=0x0, meta=0x0,
cmd=(nvme_cmd_write slba=64, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
fio-3201    [002] .....  1070.001310: nvme_setup_cmd: nvme0:
disk=nvme0n1, qid=3, cmdid=13056, nsid=1, flags=0x0, meta=0x0,
cmd=(nvme_cmd_write slba=0, len=63, ctrl=0x0, dsmgmt=0, reftag=0)

fio-3201    [002] .....  1070.001559: nvme_setup_cmd: nvme0:
disk=nvme0n1, qid=3, cmdid=8963, nsid=1, flags=0x0, meta=0x0,
cmd=(nvme_cmd_write slba=448, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
fio-3201    [002] .....  1070.001560: nvme_setup_cmd: nvme0:
disk=nvme0n1, qid=3, cmdid=21250, nsid=1, flags=0x0, meta=0x0,
cmd=(nvme_cmd_write slba=384, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
fio-3201    [002] .....  1070.001561: nvme_setup_cmd: nvme0:
disk=nvme0n1, qid=3, cmdid=33537, nsid=1, flags=0x0, meta=0x0,
cmd=(nvme_cmd_write slba=320, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
fio-3201    [002] .....  1070.001561: nvme_setup_cmd: nvme0:
disk=nvme0n1, qid=3, cmdid=17152, nsid=1, flags=0x0, meta=0x0,
cmd=(nvme_cmd_write slba=256, len=63, ctrl=0x0, dsmgmt=0, reftag=0)

# Basic QD1 write w/ pvsync (split I/Os are issued in ascending LBA order)
fio --name dbg --filename=/dev/nvme0n1 --rw=write --iodepth=1 \
    --bs=128K --ioengine=pvsync --direct=1 --size=1M

kworker/1:1H-139     [001] .....  1392.956314: nvme_setup_cmd: nvme0:
disk=nvme0n1, qid=2, cmdid=33088, nsid=1, flags=0x0, meta=0x0,
cmd=(nvme_cmd_write slba=0, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
kworker/1:1H-139     [001] .....  1392.956316: nvme_setup_cmd: nvme0:
disk=nvme0n1, qid=2, cmdid=33089, nsid=1, flags=0x0, meta=0x0,
cmd=(nvme_cmd_write slba=64, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
kworker/1:1H-139     [001] .....  1392.956318: nvme_setup_cmd: nvme0:
disk=nvme0n1, qid=2, cmdid=16706, nsid=1, flags=0x0, meta=0x0,
cmd=(nvme_cmd_write slba=128, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
kworker/1:1H-139     [001] .....  1392.956320: nvme_setup_cmd: nvme0:
disk=nvme0n1, qid=2, cmdid=4419, nsid=1, flags=0x0, meta=0x0,
cmd=(nvme_cmd_write slba=192, len=63, ctrl=0x0, dsmgmt=0, reftag=0)

kworker/1:1H-139     [001] .....  1392.956537: nvme_setup_cmd: nvme0:
disk=nvme0n1, qid=2, cmdid=37184, nsid=1, flags=0x0, meta=0x0,
cmd=(nvme_cmd_write slba=256, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
kworker/1:1H-139     [001] .....  1392.956538: nvme_setup_cmd: nvme0:
disk=nvme0n1, qid=2, cmdid=37185, nsid=1, flags=0x0, meta=0x0,
cmd=(nvme_cmd_write slba=320, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
kworker/1:1H-139     [001] .....  1392.956539: nvme_setup_cmd: nvme0:
disk=nvme0n1, qid=2, cmdid=20802, nsid=1, flags=0x0, meta=0x0,
cmd=(nvme_cmd_write slba=384, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
kworker/1:1H-139     [001] .....  1392.956540: nvme_setup_cmd: nvme0:
disk=nvme0n1, qid=2, cmdid=8515, nsid=1, flags=0x0, meta=0x0,
cmd=(nvme_cmd_write slba=448, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
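
As Keith explains in the thread quoted below, the reversal comes from
plugging: split requests are added at the head of the plug list, and
unplugging dispatches them in the reverse of submission order. A minimal
user-space sketch of that LIFO effect (the structure and function names
here are made up for illustration, not the kernel's actual ones):

#include <stdio.h>

/*
 * Illustrative sketch of the plug-list behavior discussed below:
 * requests are pushed at the head of a singly linked list while
 * plugged, and dispatch walks from the head, so submission order
 * comes out reversed.
 */
struct req {
        unsigned long long slba;
        struct req *next;
};

static struct req *plug_list;

static void plug_add(struct req *rq)
{
        rq->next = plug_list;   /* insert at head */
        plug_list = rq;
}

static void plug_flush(void)
{
        /* walk from the head: the last-added request dispatches first */
        for (struct req *rq = plug_list; rq; rq = rq->next)
                printf("dispatch slba=%llu\n", rq->slba);
        plug_list = NULL;
}

int main(void)
{
        struct req splits[4] = {
                { .slba = 0 }, { .slba = 64 },
                { .slba = 128 }, { .slba = 192 },
        };

        for (int i = 0; i < 4; i++)     /* splits submitted in LBA order */
                plug_add(&splits[i]);

        plug_flush();                   /* prints 192, 128, 64, 0 */
        return 0;
}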

On Tue, May 24, 2022 at 6:01 PM Jonathan Nicklin
<jnicklin at blockbridge.com> wrote:
>
> With the drive exercised as follows:
>
> app: fio
> engine: libaio
> queue depth: 1
> block size: 128K
> device max_sectors_kb: 32K
>
> A 128K user I/O is split into four 32K I/Os, which are then issued in
> reverse order as follows:
>
>              fio-12103   [001] ..... 89587.120514: nvme_setup_cmd:
> nvme1: disk=nvme1c1n1, qid=2, cmdid=12328, nsid=1, flags=0x0,
> meta=0x0, cmd=(nvme_cmd_write slba=192, len=63, ctrl=0x0, dsmgmt=0,
> reftag=0)
>              fio-12103   [001] ..... 89587.120515: nvme_setup_cmd:
> nvme1: disk=nvme1c1n1, qid=2, cmdid=12327, nsid=1, flags=0x0,
> meta=0x0, cmd=(nvme_cmd_write slba=128, len=63, ctrl=0x0, dsmgmt=0,
> reftag=0)
>              fio-12103   [001] ..... 89587.120518: nvme_setup_cmd:
> nvme1: disk=nvme1c1n1, qid=2, cmdid=12326, nsid=1, flags=0x0,
> meta=0x0, cmd=(nvme_cmd_write slba=64, len=63, ctrl=0x0, dsmgmt=0,
> reftag=0)
>              fio-12103   [001] ..... 89587.120518: nvme_setup_cmd:
> nvme1: disk=nvme1c1n1, qid=2, cmdid=12325, nsid=1, flags=0x0,
> meta=0x0, cmd=(nvme_cmd_write slba=0, len=63, ctrl=0x0, dsmgmt=0,
> reftag=0)
>
> On Tue, May 24, 2022 at 5:44 PM Damien Le Moal
> <damien.lemoal at opensource.wdc.com> wrote:
> >
> > On 5/25/22 05:32, Jonathan Nicklin wrote:
> > > On Tue, May 24, 2022 at 4:29 PM Keith Busch <kbusch at kernel.org> wrote:
> > >>
> > >> On Tue, May 24, 2022 at 03:37:56PM -0400, Jonathan Nicklin wrote:
> > >>> On Tue, May 24, 2022 at 3:25 PM Keith Busch <kbusch at kernel.org> wrote:
> > >>>>
> > >>>> On Tue, May 24, 2022 at 12:12:29PM -0400, Jonathan Nicklin wrote:
> > >>>>>
> > >>>>> The command lines you have are for read operations. The behavior seems
> > >>>>> only to appear with writes.
> > >>>>
> > >>>> Huh, I'll be darn...
> > >>>>
> > >>>> I think it's because the writes are plugged and the reads are not. The plug
> > >>>> appends requests to the head of the plug list, and unplugging will dispatch the
> > >>>> requests in the reverse order.
> > >>>
> > >>> Thanks for confirming! That's about where I got to. Do you have any
> > >>> ideas on what might explain the difference in behavior between
> > >>> fio/pvsync and fio/libaio? And, why does this not seem to occur when
> > >>> only one nvme queue is present? Perhaps the in-order cases are an
> > >>> indication of not being plugged?
> > >>
> > >> I actually didn't see a difference between libaio or psync, and also io_uring.
> > >> They all plugged and reversed the dispatch order. Do you have a scheduler
> > >> enabled?
> > >
> > > Nope, there's no scheduler in the way.
> >
> > If the drive is exercised at QD=1, e.g. psync() and libaio with iodepth=1,
> > then plugging does not matter as there will be no merge (so no at-head
> > insertion in the plug). Commands will be executed in the user submission
> > order. At higher QD, if merges happen while plugged, then order will be
> > reversed (libaio with iodepth > 1 only).
> >
> > >
> >
> >
> > --
> > Damien Le Moal
> > Western Digital Research