[LSF/MM/BPF TOPIC] Improving Zoned Storage Support
Bart Van Assche
bvanassche at acm.org
Wed Jan 17 16:54:28 PST 2024
On 1/17/24 16:42, Jens Axboe wrote:
> On 1/17/24 5:38 PM, Bart Van Assche wrote:
>> On 1/17/24 10:43, Jens Axboe wrote:
>>> Do we care? Maybe not, if we accept that an IO scheduler is just for
>>> "slower devices". But let's not go around spouting some 200K number as
>>> if it's gospel, when it depends on so many factors like IO workload,
>>> system used, etc.
>> I've never seen more than 200K IOPS in a single-threaded test. Since
>> your tests report higher IOPS numbers, I assume that you are submitting
>> I/O from multiple CPU cores at the same time.
>
> Single core, using mq-deadline (with the poc patch, but the numbers without
> it can already be found in a previous reply):
>
> axboe at 7950x ~/g/fio (master)> cat /sys/block/nvme0n1/queue/scheduler
> none [mq-deadline]
> axboe at 7950x ~/g/fio (master)> sudo t/io_uring -p1 -d128 -b512 -s32 -c32 -F1 -B1 -R1 -X1 -n1 /dev/nvme0n1
>
> submitter=0, tid=1957, file=/dev/nvme0n1, node=-1
> polled=1, fixedbufs=1/0, register_files=1, buffered=0, QD=128
> Engine=io_uring, sq_ring=128, cq_ring=128
> IOPS=5.10M, BW=2.49GiB/s, IOS/call=32/31
> IOPS=5.10M, BW=2.49GiB/s, IOS/call=32/32
> IOPS=5.10M, BW=2.49GiB/s, IOS/call=31/31
>
> Using non-polled IO, the number is around 4M.
A correction: my tests ran with 72 fio jobs instead of 1. I used
fio + io_uring + null_blk. I see about 1100K IOPS with a single fio job
and about 150K IOPS with 72 fio jobs. The command below shows how I
measured mq-deadline performance:
modprobe null_blk
fio --bs=4096 --group_reporting=1 --gtod_reduce=1 --invalidate=1 \
--ioengine=io_uring --ioscheduler=mq-deadline --norandommap \
--runtime=60 --rw=randread --thread --time_based=1 --buffered=0 \
--numjobs=72 --iodepth=128 --iodepth_batch_submit=64 \
--iodepth_batch_complete=64 --name=/dev/nullb0 --filename=/dev/nullb0
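For a baseline without an I/O scheduler, the same workload can be rerun with
--ioscheduler=none. A rough sketch (the explicit null_blk parameters here are
just an example; the numbers above came from the bare modprobe shown earlier):

# load null_blk in blk-mq mode with one submit queue per CPU
modprobe null_blk queue_mode=2 submit_queues=$(nproc)
# check which scheduler is active on the test device
cat /sys/block/nullb0/queue/scheduler
# rerun the fio command above with --ioscheduler=none and compare the
# reported IOPS against the mq-deadline run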
Thanks,
Bart.