[PATCH v6 00/28] Zone write plugging
Dennis Maisenbacher
dennis.maisenbacher at wdc.com
Fri Apr 5 03:40:39 PDT 2024
On 05/04/2024 06.45, Damien Le Moal wrote:
> Performance evaluation results
> ==============================
>
> Environments:
> - Intel Xeon 16-cores/32-threads, 128GB of RAM
> - Kernel:
> - ZWL (zone write locking, baseline): block/for-next (based on 6.9.0-rc2)
> - ZWP: block/for-next patched kernel to add zone write plugging
> (both kernels were compiled with the same configuration turning
> off most heavy debug features)
>
> Workloads:
> - seqw4K1: 4KB sequential write, qd=1
> - seqw4K16: 4KB sequential write, qd=16
> - seqw1M1: 1MB sequential write, qd=1
> - seqw1M16: 1MB sequential write, qd=16
> - rndw4K16: 4KB random write, qd=16
> - rndw128K16: 128KB random write, qd=16
> - btrfs workload: Single fio job writing 128 MB files using 128 KB
> direct IOs at qd=16.
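
The exact fio job options were not posted; as a rough illustration, the
workload names above could map to fio command lines along these lines
(libaio, direct I/O and zonemode=zbd are assumptions, and /dev/nullb0 is
just an example target):

# Hypothetical mapping of the workload names to fio options; the actual
# job files were not posted, so all options here are assumptions.
workloads = {
    "seqw4K1":    ("write",     "4k",   1),
    "seqw4K16":   ("write",     "4k",   16),
    "seqw1M1":    ("write",     "1M",   1),
    "seqw1M16":   ("write",     "1M",   16),
    "rndw4K16":   ("randwrite", "4k",   16),
    "rndw128K16": ("randwrite", "128k", 16),
}

def fio_cmd(name, dev="/dev/nullb0"):
    rw, bs, qd = workloads[name]
    return ("fio --name=%s --filename=%s --ioengine=libaio --direct=1 "
            "--zonemode=zbd --rw=%s --bs=%s --iodepth=%d"
            % (name, dev, rw, bs, qd))

# Example: the 4KB random write workload at qd=16.
print(fio_cmd("rndw4K16"))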
>
> Devices:
> - nullblk (zoned): 4096 zones of 256 MB, 128 max open zones.
> - NVMe ZNS drive: 1 TB ZNS drive with 2GB zone size, 14 max open and
> active zones.
> - SMR HDD: 20 TB disk with 256MB zone size, 128 max open zones.
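
As a side note, a zoned null_blk device matching the description above can
be set up through configfs roughly as follows (attribute names are those of
the null_blk driver; memory backing and the configfs mount point are
assumptions, not stated in the cover letter):

# Sketch: create a 4096 x 256 MB zoned null_blk device via configfs.
# Assumes null_blk is loaded and configfs is mounted at /sys/kernel/config.
import os

cfg = "/sys/kernel/config/nullb/nullb0"
if not os.path.isdir(cfg):
    os.mkdir(cfg)

def set_attr(name, value):
    with open(os.path.join(cfg, name), "w") as f:
        f.write("%s\n" % value)

set_attr("zoned", 1)
set_attr("zone_size", 256)        # zone size in MB
set_attr("zone_max_open", 128)    # max open zones
set_attr("size", 4096 * 256)      # total capacity in MB (4096 zones)
set_attr("memory_backed", 1)      # assumption, not stated in the results
set_attr("power", 1)              # instantiate /dev/nullb0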
>
> For ZWP, the results show the percentage performance increase (or
> decrease) against the ZWL (baseline) case.
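
The percentages in the tables are presumably the relative change against
the ZWL baseline, i.e. roughly:

# Relative change of a ZWP result against the ZWL baseline, as shown in
# the tables (e.g. rndw4K16 on null_blk: 461 vs 424 KIOPS).
def pct_change(zwp, zwl):
    return (zwp - zwl) / zwl * 100.0

print("(%+d%%)" % int(pct_change(461, 424)))   # prints (+8%), as in table 1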
>
> 1) null_blk zoned device:
>
> +-----------+--------+--------+-------+--------+--------+----------+
> |           |seqw4K1 |seqw4K16|seqw1M1|seqw1M16|rndw4K16|rndw128K16|
> |           | (MB/s) | (MB/s) |(MB/s) | (MB/s) |(KIOPS) | (KIOPS)  |
> +-----------+--------+--------+-------+--------+--------+----------+
> | ZWL       |  940   |  840   | 18550 | 14400  |  424   |   167    |
> |mq-deadline|        |        |       |        |        |          |
> +-----------+--------+--------+-------+--------+--------+----------+
> | ZWP       |  943   |  845   | 18660 | 14770  |  461   |   165    |
> |mq-deadline| (+0%)  | (+0%)  | (+0%) | (+1%)  | (+8%)  |  (-1%)   |
> +-----------+--------+--------+-------+--------+--------+----------+
> | ZWP       |  756   |  668   | 16020 | 12980  |  135   |   101    |
> |   bfq     | (-19%) | (-20%) |(-13%) | (-9%)  | (-68%) |  (-39%)  |
> +-----------+--------+--------+-------+--------+--------+----------+
> | ZWP       |  2639  |  1715  | 28190 | 19760  |  344   |   150    |
> |   none    |(+180%) |(+104%) |(+51%) | (+37%) | (-18%) |  (-10%)  |
> +-----------+--------+--------+-------+--------+--------+----------+
>
> ZWP with mq-deadline gives performance very similar to zone write
> locking, showing that the zone write plugging overhead is acceptable.
> But ZWP's ability to run fast block devices with the none scheduler
> brings out all the benefits of zone write plugging and results in a
> significant performance increase for most workloads. The exception to
> this are the random write workloads with multiple jobs: for these, the
> faster request submission rate achieved by zone write plugging results
> in higher contention on the null_blk zone spinlock, which degrades
> performance.
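
For reference, the scheduler is switched per device between runs through
sysfs; a tiny sketch (the device name is just an example):

# Select the I/O scheduler ("none", "mq-deadline" or "bfq") for a device.
def set_scheduler(dev, sched):
    with open("/sys/block/%s/queue/scheduler" % dev, "w") as f:
        f.write(sched)

set_scheduler("nullb0", "none")   # example: run the ZWP "none" case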
>
> 2) NVMe ZNS drive:
>
> +-----------+--------+--------+-------+--------+--------+----------+
> |           |seqw4K1 |seqw4K16|seqw1M1|seqw1M16|rndw4K16|rndw128K16|
> |           | (MB/s) | (MB/s) |(MB/s) | (MB/s) |(KIOPS) | (KIOPS)  |
> +-----------+--------+--------+-------+--------+--------+----------+
> | ZWL       |  183   |  702   | 1066  |  1103  |  53.5  |   14.5   |
> |mq-deadline|        |        |       |        |        |          |
> +-----------+--------+--------+-------+--------+--------+----------+
> | ZWP       |  183   |  719   | 1086  |  1108  |  55.6  |   14.7   |
> |mq-deadline| (-0%)  | (+1%)  | (+0%) | (+0%)  | (+3%)  |  (+1%)   |
> +-----------+--------+--------+-------+--------+--------+----------+
> | ZWP       |  178   |  691   | 1082  |  1106  |  30.8  |   11.5   |
> |   bfq     | (-3%)  | (-2%)  | (-0%) | (+0%)  | (-42%) |  (-20%)  |
> +-----------+--------+--------+-------+--------+--------+----------+
> | ZWP       |  190   |  666   | 1083  |  1108  |  51.4  |   14.7   |
> |   none    | (+4%)  | (-5%)  | (+0%) | (+0%)  | (-4%)  |  (+0%)   |
> +-----------+--------+--------+-------+--------+--------+----------+
>
> Zone write plugging overhead does not significantly impact performance.
> Similar to null_blk, using the none scheduler leads to a performance
> increase for most workloads.
>
> 3) SMR SATA HDD:
>
> +-----------+-------+--------+-------+--------+--------+----------+
> |           |seqw4K1|seqw4K16|seqw1M1|seqw1M16|rndw4K16|rndw128K16|
> |           |(MB/s) | (MB/s) |(MB/s) | (MB/s) |(KIOPS) | (KIOPS)  |
> +-----------+-------+--------+-------+--------+--------+----------+
> | ZWL       |  107  |  243   |  245  |  246   |  2.2   |  0.763   |
> |mq-deadline|       |        |       |        |        |          |
> +-----------+-------+--------+-------+--------+--------+----------+
> | ZWP       |  107  |  242   |  245  |  245   |  2.2   |  0.772   |
> |mq-deadline| (+0%) | (-0%)  | (+0%) | (-0%)  | (+0%)  |  (+0%)   |
> +-----------+-------+--------+-------+--------+--------+----------+
> | ZWP       |  104  |  241   |  246  |  242   |  2.2   |  0.765   |
> |   bfq     | (-2%) | (-0%)  | (+0%) | (-0%)  | (+0%)  |  (+0%)   |
> +-----------+-------+--------+-------+--------+--------+----------+
> | ZWP       |  115  |  235   |  249  |  242   |  2.2   |  0.763   |
> |   none    | (+7%) | (-3%)  | (+1%) | (-1%)  | (+0%)  |  (+0%)   |
> +-----------+-------+--------+-------+--------+--------+----------+
>
> Performance with purely sequential write workloads at high queue depth
> decreases slightly when using zone write plugging. This is due to the
> different IO pattern that ZWP generates, where the first writes to a
> zone start being issued while the end of the previous zone is still
> being written. Depending on how the disk handles queued commands, seeks
> may be generated, slightly impacting the throughput achieved. Such pure
> sequential write workloads are however rare with SMR drives.
>
> 4) Zone append tests using btrfs:
>
> +-----------+-------------+-------------+-----------+-------------+
> |           |  null_blk   |  null_blk   |    ZNS    |     SMR     |
> |           |  native ZA  | emulated ZA | native ZA | emulated ZA |
> |           |   (MB/s)    |   (MB/s)    |  (MB/s)   |   (MB/s)    |
> +-----------+-------------+-------------+-----------+-------------+
> | ZWL       |    2441     |     N/A     |   1081    |     243     |
> |mq-deadline|             |             |           |             |
> +-----------+-------------+-------------+-----------+-------------+
> | ZWP       |    2361     |    2999     |   1085    |     239     |
> |mq-deadline|    (-1%)    |             |   (+0%)   |    (-2%)    |
> +-----------+-------------+-------------+-----------+-------------+
> | ZWP       |    2299     |    2730     |   1080    |     240     |
> |   bfq     |    (-4%)    |             |   (+0%)   |    (-2%)    |
> +-----------+-------------+-------------+-----------+-------------+
> | ZWP       |    2443     |    3152     |   1083    |     240     |
> |   none    |    (+0%)    |             |   (+0%)   |    (-1%)    |
> +-----------+-------------+-------------+-----------+-------------+
>
> With a more realistic use of the device through a file system, ZWP does
> not introduce significant performance differences, except for SMR, for
> the same reason as with the fio sequential workloads at high queue
> depth.
>
I ran some fio performance tests with this patch set across several
different NVMe ZNS devices on my bare metal setup.
In these tests I ran seqw, seqr and rndr workloads with a range of block
sizes and varying numbers of concurrent jobs, for both the none and
mq-deadline schedulers.
The results are consistent with the ones you posted here; the performance
improvements are most noticeable for rndr workloads.
Looks great!
Dennis
Tested-by: Dennis Maisenbacher <dennis.maisenbacher at wdc.com>