nvme: split bios issued in reverse order
Sagi Grimberg
sagi at grimberg.me
Tue May 24 05:58:23 PDT 2022
> There seems to be an inconsistency in the order of writes that are
> issued after splitting a bio. Ordering depends on how the application
> write is submitted and the number of I/O queues configured.
>
> In our testing nvme/tcp,
Is this specific to nvme-tcp?
> a 128K write issued with fio/pvsync
Is this specific to the I/O engine?
> is split
> into four 32K I/Os (the target maximum data transfer size is set to
> 32K, and max_sectors_kb is therefore 32K). As expected, the four write
> I/Os are issued to the target in sequential order. However, if the
> 128K write is issued using fio/libaio, the four 32K writes are issued
> in reverse order:
>
> fio-8098 [001] ..... 254009.711080: nvme_setup_cmd: nvme1:
> disk=nvme1c1n1, qid=2, cmdid=16468, nsid=1, flags=0x0, meta=0x0,
> cmd=(nvme_cmd_write slba=192, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
>
> fio-8098 [001] ..... 254009.711083: nvme_setup_cmd: nvme1:
> disk=nvme1c1n1, qid=2, cmdid=16467, nsid=1, flags=0x0, meta=0x0,
> cmd=(nvme_cmd_write slba=128, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
>
> fio-8098 [001] ..... 254009.711084: nvme_setup_cmd: nvme1:
> disk=nvme1c1n1, qid=2, cmdid=16466, nsid=1, flags=0x0, meta=0x0,
> cmd=(nvme_cmd_write slba=64, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
>
> fio-8098 [001] ..... 254009.711085: nvme_setup_cmd: nvme1:
> disk=nvme1c1n1, qid=2, cmdid=16465, nsid=1, flags=0x0, meta=0x0,
> cmd=(nvme_cmd_write slba=0, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
>
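For reference, the expected in-order split can be worked out with a minimal sketch like the one below. It is not the block layer's actual splitting code. It assumes 512-byte LBAs (which is what slba stepping by 64 with len=63 for a 32K I/O implies) and that len is a zero-based block count, as in the trace. The name split_write is hypothetical.

```python
# Minimal sketch of how a large write maps onto max-sized chunks.
# Assumptions: 512-byte LBAs, and "len" is zero-based as in the trace.
LBA_SIZE = 512


def split_write(offset_bytes, length_bytes, max_bytes, lba_size=LBA_SIZE):
    """Return (slba, len) pairs for each chunk, in ascending LBA order."""
    chunks = []
    end = offset_bytes + length_bytes
    while offset_bytes < end:
        size = min(max_bytes, end - offset_bytes)
        slba = offset_bytes // lba_size
        nlb_zero_based = size // lba_size - 1
        chunks.append((slba, nlb_zero_based))
        offset_bytes += size
    return chunks


# A 128K write at offset 0 with a 32K transfer limit.
for slba, nlb in split_write(0, 128 * 1024, 32 * 1024):
    print(f"slba={slba}, len={nlb}")
```

This yields slba 0, 64, 128, 192 with len=63 each, so the libaio trace above is exactly this list walked backwards.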
> Further investigation found that if the number of I/O queues is
> limited to 1 at connect time,
Is this specific to a single I/O queue?
> the issue order is sequential for both
> pwritev and libaio.
I'm assuming that this is 100% repeatable?
>
> I've spent some time tracing through the bio/blk_mq code and
> can't seem to find what might be causing the difference in
> behavior. Can anyone confirm that this is expected or desired
> behavior?
What is the controller MDTS? Does the 32K write go in-capsule, or does
the controller send an R2T?
Also, assuming the split writes really are issued in reverse order, is
that a fundamental issue?