nvme: split bios issued in reverse order

Sagi Grimberg sagi at grimberg.me
Tue May 24 05:58:23 PDT 2022


> There seems to be an inconsistency in the order of writes that are
> issued after splitting a bio. Ordering depends on how the application
> write is submitted and the number of I/O queues configured.
> 
> In our testing with nvme/tcp,

Is this specific to nvme-tcp?

> a 128K write issued with fio/pvsync

Is this specific to the I/O engine?

> is split
> into four 32K I/Os (the target maximum data transfer size is set to
> 32K, and max_sectors_kb is therefore 32K). As expected, the four write
> I/Os are issued to the target in sequential order. However, if the
> 128K write is issued using fio/libaio, the four 32K writes are issued
> in reverse order:
> 
> fio-8098 [001] ..... 254009.711080: nvme_setup_cmd: nvme1:
> disk=nvme1c1n1, qid=2, cmdid=16468, nsid=1, flags=0x0, meta=0x0,
> cmd=(nvme_cmd_write slba=192, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
> 
> fio-8098 [001] ..... 254009.711083: nvme_setup_cmd: nvme1:
> disk=nvme1c1n1, qid=2, cmdid=16467, nsid=1, flags=0x0, meta=0x0,
> cmd=(nvme_cmd_write slba=128, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
> 
> fio-8098 [001] ..... 254009.711084: nvme_setup_cmd: nvme1:
> disk=nvme1c1n1, qid=2, cmdid=16466, nsid=1, flags=0x0, meta=0x0,
> cmd=(nvme_cmd_write slba=64, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
> 
> fio-8098 [001] ..... 254009.711085: nvme_setup_cmd: nvme1:
> disk=nvme1c1n1, qid=2, cmdid=16465, nsid=1, flags=0x0, meta=0x0,
> cmd=(nvme_cmd_write slba=0, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
> 
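
For reference, the expected split can be worked out from the trace.
The sketch below is illustrative only: it assumes a 512-byte logical
block size (inferred from len=63, a zero-based count of 64 blocks,
which matches 32K) and uses a hypothetical helper, split_write, that
is not a kernel function.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical blocks (64 blocks == 32K)


def split_write(start_lba, total_bytes, max_bytes, sector_size=SECTOR_SIZE):
    """Split one write into (slba, len) pairs in ascending LBA order.

    len is zero-based, as printed by the nvme_setup_cmd trace event.
    """
    if total_bytes % sector_size or max_bytes % sector_size:
        raise ValueError("sizes must be multiples of the sector size")
    remaining = total_bytes // sector_size
    max_blocks = max_bytes // sector_size
    lba = start_lba
    chunks = []
    while remaining > 0:
        nblocks = min(remaining, max_blocks)
        chunks.append((lba, nblocks - 1))
        lba += nblocks
        remaining -= nblocks
    return chunks


# 128K write with a 32K max_sectors_kb limit, starting at LBA 0.
expected = split_write(0, 128 * 1024, 32 * 1024)

# (slba, len) pairs copied from the libaio trace above, in issue order.
observed = [(192, 63), (128, 63), (64, 63), (0, 63)]

# True when the observed issue order is exactly the reverse of the split order.
is_reversed = observed == list(reversed(expected))
```

With these inputs, expected holds the four commands in ascending slba
order, and the libaio trace is the same set in the opposite order.
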
> Further investigation found that if the number of I/O queues is
> limited to 1 at connect time,

Is this specific to a single I/O queue?

> the issue order is sequential for both
> pwritev and libaio.

I'm assuming that this is 100% repeatable?

> 
> I've spent some time tracing through the bio/blk_mq code and
> can't seem to find what might be causing the difference in
> behavior. Can anyone confirm that this is expected or desired
> behavior?

What is the controller MDTS? Does the 32K go in-capsule, or does
the controller send an R2T?
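
As a reminder of how MDTS maps to bytes: the NVMe specification
reports it as a power-of-two multiple of the controller's minimum
memory page size (CAP.MPSMIN), with 0 meaning no limit. A rough
sketch, assuming a 4K minimum page size; mdts_to_bytes is a
hypothetical helper, and the values are examples rather than numbers
taken from the reported setup.

```python
def mdts_to_bytes(mdts, mpsmin=0):
    """Convert an Identify Controller MDTS value to a byte limit.

    mpsmin is CAP.MPSMIN; the minimum page size is 2 ** (12 + mpsmin).
    Returns None when mdts is 0, which means no limit is reported.
    """
    if mdts == 0:
        return None
    min_page_size = 1 << (12 + mpsmin)
    return min_page_size << mdts


# Example only: MDTS=3 with a 4K minimum page size gives a 32K limit.
limit = mdts_to_bytes(3)
```

A value of 3 with 4K pages would be consistent with the 32K
max_sectors_kb mentioned above, but the actual MDTS still needs to be
confirmed from the controller.
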


Also, if we assume that this is indeed the case, is this a fundamental
issue?


