[PATCH for-next v3 0/4] fixed-buffer for uring-cmd/passthrough

Kanchan Joshi joshi.k at samsung.com
Sun Sep 4 22:52:09 PDT 2022


On Sun, Sep 04, 2022 at 02:17:33PM -0600, Jens Axboe wrote:
>On 9/4/22 11:01 AM, Kanchan Joshi wrote:
>> On Sat, Sep 03, 2022 at 11:00:43AM -0600, Jens Axboe wrote:
>>> On 9/2/22 3:25 PM, Jens Axboe wrote:
>>>> On 9/2/22 1:32 PM, Jens Axboe wrote:
>>>>> On 9/2/22 12:46 PM, Kanchan Joshi wrote:
>>>>>> On Fri, Sep 02, 2022 at 10:32:16AM -0600, Jens Axboe wrote:
>>>>>>> On 9/2/22 10:06 AM, Jens Axboe wrote:
>>>>>>>> On 9/2/22 9:16 AM, Kanchan Joshi wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Currently uring-cmd lacks the ability to leverage pre-registered
>>>>>>>>> buffers. This series adds that support in uring-cmd, and plumbs
>>>>>>>>> nvme passthrough to work with it.
>>>>>>>>>
>>>>>>>>> Using registered-buffers showed a peak-perf hike from 1.85M to
>>>>>>>>> 2.17M IOPS in my setup.
>>>>>>>>>
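
For illustration, a minimal userspace sketch of the interface this series
enables (untested; the uring_cmd_flags field and IORING_URING_CMD_FIXED flag
are taken from this series, while the device path, nsid, 512B LBA format and
NVMe command values are example assumptions):

/*
 * Read 4KiB from an NVMe generic char device through uring-cmd using a
 * pre-registered ("fixed") buffer.  Requires an SQE128/CQE32 ring.
 */
#include <liburing.h>
#include <linux/nvme_ioctl.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_params p = {
		.flags = IORING_SETUP_SQE128 | IORING_SETUP_CQE32,
	};
	static char buf[4096] __attribute__((aligned(4096)));
	struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	struct nvme_uring_cmd *cmd;
	int fd;

	io_uring_queue_init_params(8, &ring, &p);
	io_uring_register_buffers(&ring, &iov, 1);	/* pre-register buffer */

	fd = open("/dev/ng0n1", O_RDONLY);
	sqe = io_uring_get_sqe(&ring);
	memset(sqe, 0, 2 * sizeof(*sqe));		/* big SQE (128B) */
	sqe->opcode = IORING_OP_URING_CMD;
	sqe->fd = fd;
	sqe->cmd_op = NVME_URING_CMD_IO;
	sqe->uring_cmd_flags = IORING_URING_CMD_FIXED;	/* use fixed buffer */
	sqe->buf_index = 0;				/* registered buffer #0 */

	cmd = (struct nvme_uring_cmd *)sqe->cmd;
	cmd->opcode = 0x02;				/* NVMe read */
	cmd->nsid = 1;
	cmd->addr = (__u64)(uintptr_t)buf;		/* must lie in buf_index */
	cmd->data_len = sizeof(buf);
	cmd->cdw12 = (sizeof(buf) >> 9) - 1;		/* 0-based LBA count */

	io_uring_submit(&ring);
	io_uring_wait_cqe(&ring, &cqe);
	io_uring_cqe_seen(&ring, cqe);
	return 0;
}
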
>>>>>>>>> Without fixedbufs
>>>>>>>>> *****************
>>>>>>>>> # taskset -c 0 t/io_uring -b512 -d128 -c32 -s32 -p0 -F1 -B0 -O0 -n1 -u1 /dev/ng0n1
>>>>>>>>> submitter=0, tid=5256, file=/dev/ng0n1, node=-1
>>>>>>>>> polled=0, fixedbufs=0/0, register_files=1, buffered=1, QD=128
>>>>>>>>> Engine=io_uring, sq_ring=128, cq_ring=128
>>>>>>>>> IOPS=1.85M, BW=904MiB/s, IOS/call=32/31
>>>>>>>>> IOPS=1.85M, BW=903MiB/s, IOS/call=32/32
>>>>>>>>> IOPS=1.85M, BW=902MiB/s, IOS/call=32/32
>>>>>>>>> ^CExiting on signal
>>>>>>>>> Maximum IOPS=1.85M
>>>>>>>>
>>>>>>>> With the poll support queued up, I ran this one as well. tldr is:
>>>>>>>>
>>>>>>>> bdev (non pt)   122M IOPS
>>>>>>>> irq driven      51-52M IOPS
>>>>>>>> polled          71M IOPS
>>>>>>>> polled+fixed    78M IOPS
>>>
>>> Followup on this, since t/io_uring didn't correctly detect NUMA nodes
>>> for passthrough.
>>>
>>> With the current tree and the patchset I just sent for iopoll and the
>>> caching fix that's in the block tree, here's the final score:
>>>
>>> polled+fixed passthrough    105M IOPS
>>>
>>> which is getting pretty close to the bdev polled fixed path as well.
>>> I think that is starting to look pretty good!
>> Great! In my setup (single disk/NUMA node), the current kernel shows:
>>
>> Block MIOPS
>> ***********
>> command: t/io_uring -b512 -d128 -c32 -s32 -p0 -F1 -B0 -P1 -n1 /dev/nvme0n1
>> plain: 1.52
>> plain+fb: 1.77
>> plain+poll: 2.23
>> plain+fb+poll: 2.61
>>
>> Passthru MIOPS
>> **************
>> command: t/io_uring -b512 -d128 -c32 -s32 -p0 -F1 -B0 -O0 -P1 -u1 -n1 /dev/ng0n1
>> plain: 1.78
>> plain+fb: 2.08
>> plain+poll: 2.21
>> plain+fb+poll: 2.69
>
>Interesting, here's what I have:
>
>Block MIOPS
>============
>plain: 2.90
>plain+fb: 3.0
>plain+poll: 4.04
>plain+fb+poll: 5.09	
>
>Passthru MIOPS
>==============
>plain: 2.37
>plain+fb: 2.84
>plain+poll: 3.65
>plain+fb+poll: 4.93
>
>This is a gen2 optane
Same here. Do you see the same 'FW Rev' as below?

# nvme list
Node                  SN                   Model                                    Namespace Usage                      Format           FW Rev
--------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1          PHAL11730018400AGN   INTEL SSDPF21Q400GB                      1         400.09  GB / 400.09  GB    512   B +  0 B   L0310200


>, it maxes out at right around 5.1M IOPS. Note that
>I have disabled iostats and merges generally in my runs:
>
>echo 0 > /sys/block/nvme0n1/queue/iostats
>echo 2 > /sys/block/nvme0n1/queue/nomerges
>
>which will impact block more than passthru obviously, particularly
>the nomerges. iostats should have a similar impact on both of them (but
>I haven't run these tests without those disabled).
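
For reference, nomerges accepts 0 (merging enabled), 1 (only simple one-hit
merges allowed) or 2 (no merge attempts at all), and iostats is a plain 0/1
toggle; the kernel defaults can be restored with:

echo 1 > /sys/block/nvme0n1/queue/iostats
echo 0 > /sys/block/nvme0n1/queue/nomerges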

A bit of improvement after disabling those, but it applies across all entries.

block
=====
plain: 1.6
plain+FB: 1.91
plain+poll: 2.36
plain+FB+poll: 2.85

passthru
========
plain: 1.9
plain+FB: 2.2
plain+poll: 2.4
plain+FB+poll: 2.9

Maybe there is something in my kernel config that prevents it from
reaching the expected peak (i.e. 5.1M). Will check more.





