don't reorder requests passed to ->queue_rqs
Chaitanya Kulkarni
chaitanyak at nvidia.com
Wed Nov 13 12:36:27 PST 2024
On 11/13/24 07:20, Christoph Hellwig wrote:
> Hi Jens,
>
> currently blk-mq reorders requests when adding them to the plug because
> the request list can't do efficient tail appends. When the plug is
> directly issued using ->queue_rqs that means reordered requests are
> passed to the driver, which can lead to very bad I/O patterns when
> not corrected, especially on rotational devices (e.g. NVMe HDD) or
> when using zone append.
>
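(The reordering falls out of plain head insertion: a minimal sketch of
the effect, with illustrative names rather than the exact blk-mq
definitions:)

/* Stand-in request type; only the link matters for the sketch. */
struct request {
	struct request *rq_next;
};

/*
 * O(1) head insertion is the only cheap append a bare singly linked
 * list supports, so the plug pushes each new request onto the front.
 */
static void rq_list_add_head(struct request **head, struct request *rq)
{
	rq->rq_next = *head;
	*head = rq;
}

/* Popping walks head-first: add A, B, C and you pop C, B, A. */
static struct request *rq_list_pop(struct request **head)
{
	struct request *rq = *head;

	if (rq)
		*head = rq->rq_next;
	return rq;
}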
> This series first adds two easily backportable workarounds to reverse
> the reordering in the virtio_blk and nvme-pci ->queue_rqs implementations
> similar to what the non-queue_rqs path does, and then adds a rq_list
> type that allows for efficient tail insertions and uses that to fix
> the reordering for real and then does the same for I/O completions as
> well.
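For anyone who wants the shape of the fix without reading the patches,
my rough understanding, reusing the illustrative helpers sketched
above (this is a sketch, not code from the series):

/*
 * Backportable driver-side workaround: pop the already-reversed list
 * and re-add each request at the head, which restores the original
 * submission order before the driver issues it.
 */
static void restore_submission_order(struct request **rqlist)
{
	struct request *restored = NULL, *rq;

	while ((rq = rq_list_pop(rqlist)))
		rq_list_add_head(&restored, rq);
	*rqlist = restored;
}

/*
 * The real fix: carry a tail pointer next to the head so appends are
 * O(1) and the plug never reverses anything in the first place.
 */
struct rq_list {
	struct request *head;
	struct request *tail;
};

static void rq_list_add_tail(struct rq_list *rl, struct request *rq)
{
	rq->rq_next = NULL;
	if (rl->tail)
		rl->tail->rq_next = rq;
	else
		rl->head = rq;
	rl->tail = rq;
}

With the tail pointer in place the completion path can batch in order
the same way, which matches the last patches in the series.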
Looks good to me. I ran a quick performance comparison; numbers are in [1] below.
Reviewed-by: Chaitanya Kulkarni <kch at nvidia.com>
-ck
[1] fio randread io_uring workload :-
IOPS :-
-------
nvme-orig: Average IOPS: 72,690
nvme-new-no-reorder: Average IOPS: 72,580
BW :-
-------
nvme-orig: Average BW: 283.9 MiB/s
nvme-new-no-reorder: Average BW: 283.4 MiB/s
IOPS/BW :-
-------
nvme-orig-10.fio: read: IOPS=72.9k, BW=285MiB/s (299MB/s)(16.7GiB/60004msec)
nvme-orig-1.fio: read: IOPS=72.7k, BW=284MiB/s (298MB/s)(16.6GiB/60004msec)
nvme-orig-2.fio: read: IOPS=73.0k, BW=285MiB/s (299MB/s)(16.7GiB/60004msec)
nvme-orig-3.fio: read: IOPS=73.3k, BW=286MiB/s (300MB/s)(16.8GiB/60003msec)
nvme-orig-4.fio: read: IOPS=72.5k, BW=283MiB/s (297MB/s)(16.6GiB/60003msec)
nvme-orig-5.fio: read: IOPS=72.4k, BW=283MiB/s (297MB/s)(16.6GiB/60004msec)
nvme-orig-6.fio: read: IOPS=72.9k, BW=285MiB/s (299MB/s)(16.7GiB/60003msec)
nvme-orig-7.fio: read: IOPS=72.3k, BW=282MiB/s (296MB/s)(16.5GiB/60004msec)
nvme-orig-8.fio: read: IOPS=72.4k, BW=283MiB/s (296MB/s)(16.6GiB/60003msec)
nvme-orig-9.fio: read: IOPS=72.5k, BW=283MiB/s (297MB/s)(16.6GiB/60004msec)
nvme (nvme-6.13) # grep BW nvme-new-no-reorder-*fio
nvme-new-no-reorder-10.fio: read: IOPS=72.5k, BW=283MiB/s (297MB/s)(16.6GiB/60004msec)
nvme-new-no-reorder-1.fio: read: IOPS=72.5k, BW=283MiB/s (297MB/s)(16.6GiB/60004msec)
nvme-new-no-reorder-2.fio: read: IOPS=72.5k, BW=283MiB/s (297MB/s)(16.6GiB/60003msec)
nvme-new-no-reorder-3.fio: read: IOPS=71.7k, BW=280MiB/s (294MB/s)(16.4GiB/60005msec)
nvme-new-no-reorder-4.fio: read: IOPS=72.5k, BW=283MiB/s (297MB/s)(16.6GiB/60004msec)
nvme-new-no-reorder-5.fio: read: IOPS=72.6k, BW=284MiB/s (298MB/s)(16.6GiB/60003msec)
nvme-new-no-reorder-6.fio: read: IOPS=73.3k, BW=286MiB/s (300MB/s)(16.8GiB/60003msec)
nvme-new-no-reorder-7.fio: read: IOPS=72.8k, BW=284MiB/s (298MB/s)(16.7GiB/60003msec)
nvme-new-no-reorder-8.fio: read: IOPS=73.2k, BW=286MiB/s (300MB/s)(16.7GiB/60004msec)
nvme-new-no-reorder-9.fio: read: IOPS=72.2k, BW=282MiB/s (296MB/s)(16.5GiB/60005msec)