nvme: split bios issued in reverse order

Damien Le Moal damien.lemoal at opensource.wdc.com
Tue May 24 18:18:02 PDT 2022


On 5/25/22 07:01, Jonathan Nicklin wrote:
> With the drive exercised as follows:
> 
> app: fio
> engine: libaio
> queue depth: 1
> block size: 128K
> device max_sectors_kb: 32K
> 
> A 128K user I/O is split into four 32K I/Os, which are then issued in
> reverse order as follows:
> 
>              fio-12103   [001] ..... 89587.120514: nvme_setup_cmd:
> nvme1: disk=nvme1c1n1, qid=2, cmdid=12328, nsid=1, flags=0x0,
> meta=0x0, cmd=(nvme_cmd_write slba=192, len=63, ctrl=0x0, dsmgmt=0,
> reftag=0)
>              fio-12103   [001] ..... 89587.120515: nvme_setup_cmd:
> nvme1: disk=nvme1c1n1, qid=2, cmdid=12327, nsid=1, flags=0x0,
> meta=0x0, cmd=(nvme_cmd_write slba=128, len=63, ctrl=0x0, dsmgmt=0,
> reftag=0)
>              fio-12103   [001] ..... 89587.120518: nvme_setup_cmd:
> nvme1: disk=nvme1c1n1, qid=2, cmdid=12326, nsid=1, flags=0x0,
> meta=0x0, cmd=(nvme_cmd_write slba=64, len=63, ctrl=0x0, dsmgmt=0,
> reftag=0)
>              fio-12103   [001] ..... 89587.120518: nvme_setup_cmd:
> nvme1: disk=nvme1c1n1, qid=2, cmdid=12325, nsid=1, flags=0x0,
> meta=0x0, cmd=(nvme_cmd_write slba=0, len=63, ctrl=0x0, dsmgmt=0,
> reftag=0)
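
For reference, the workload above corresponds roughly to the fio invocation
below, run with the nvme_setup_cmd trace event enabled to capture the
dispatch order. This is only a sketch: the device path, job name, size and
the tracefs mount point are assumptions, not taken from the report.

  # enable the nvme command setup trace event (tracefs assumed at /sys/kernel/tracing)
  echo 1 > /sys/kernel/tracing/events/nvme/nvme_setup_cmd/enable
  echo 1 > /sys/kernel/tracing/tracing_on
  # 128K writes at QD=1 through libaio, as in the report
  fio --name=split-order --filename=/dev/nvme1n1 --direct=1 \
      --ioengine=libaio --iodepth=1 --rw=write --bs=128k --size=1g
  # dump the captured nvme_setup_cmd entries
  cat /sys/kernel/tracing/trace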

Yep, expected. This is the same as doing libaio with iodepth=4 and bs=32K.
Your max_sectors_kb is very small though. Can't you increase its value?
(max_hw_sectors_kb tells you the device maximum, but other limitations may
apply, e.g. PRP vs SGL.)
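
Something along these lines shows the current and maximum values and raises
the limit; the namespace name nvme1n1 is an assumption based on the trace
above, adjust it to your setup:

  # soft limit currently applied to the queue (KB)
  cat /sys/block/nvme1n1/queue/max_sectors_kb
  # hard limit reported by the device/driver (KB)
  cat /sys/block/nvme1n1/queue/max_hw_sectors_kb
  # raise the soft limit, e.g. to 128 KB (cannot exceed the hard limit)
  echo 128 > /sys/block/nvme1n1/queue/max_sectors_kb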

> 
> On Tue, May 24, 2022 at 5:44 PM Damien Le Moal
> <damien.lemoal at opensource.wdc.com> wrote:
>>
>> On 5/25/22 05:32, Jonathan Nicklin wrote:
>>> On Tue, May 24, 2022 at 4:29 PM Keith Busch <kbusch at kernel.org> wrote:
>>>>
>>>> On Tue, May 24, 2022 at 03:37:56PM -0400, Jonathan Nicklin wrote:
>>>>> On Tue, May 24, 2022 at 3:25 PM Keith Busch <kbusch at kernel.org> wrote:
>>>>>>
>>>>>> On Tue, May 24, 2022 at 12:12:29PM -0400, Jonathan Nicklin wrote:
>>>>>>>
>>>>>>> The command lines you have are for read operations. The behavior seems
>>>>>>> only to appear with writes.
>>>>>>
>>>>>> Huh, I'll be darn...
>>>>>>
>>>>>> I think it's because the writes are plugged and the reads are not. The plug
>>>>>> adds requests at the head of the plug list, and unplugging dispatches them in
>>>>>> reverse order.
>>>>>
>>>>> Thanks for confirming! That's about where I got to. Do you have any
>>>>> ideas on what might explain the difference in behavior between
>>>>> fio/pvsync and fio/libaio? And, why does this not seem to occur when
>>>>> only one nvme queue is present? Perhaps the in-order cases are an
>>>>> indication of not being plugged?
>>>>
>>>> I actually didn't see a difference between libaio, psync, or io_uring.
>>>> They all plugged and reversed the dispatch order. Do you have a scheduler
>>>> enabled?
>>>
>>> Nope, there's no scheduler in the way.
>>
>> If the drive is exercised at QD=1, e.g. psync or libaio with iodepth=1,
>> then plugging does not matter as there will be no merging (so no at-head
>> insertion in the plug). Commands will be executed in the user submission
>> order. At higher QD, if merges happen while plugged, then the order will be
>> reversed (libaio with iodepth > 1 only).
>>
>>>
>>
>>
>> --
>> Damien Le Moal
>> Western Digital Research
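
To check the plugging explanation quoted above, the block layer plug, unplug
and split trace events can be enabled alongside nvme_setup_cmd. A rough
sketch, again assuming tracefs is mounted at /sys/kernel/tracing:

  cd /sys/kernel/tracing
  echo 1 > events/block/block_plug/enable
  echo 1 > events/block/block_unplug/enable
  echo 1 > events/block/block_split/enable
  echo 1 > events/nvme/nvme_setup_cmd/enable
  echo 1 > tracing_on
  # run the fio job, then:
  cat trace

If the explanation holds, the 128K write at QD=1 should show the split and
plug/unplug events with the nvme_setup_cmd entries in reversed LBA order,
while the equivalent read job should dispatch in submission order.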


-- 
Damien Le Moal
Western Digital Research


