[PATCH v6 00/28] Zone write plugging

Dennis Maisenbacher dennis.maisenbacher at wdc.com
Fri Apr 5 03:40:39 PDT 2024


On 05/04/2024 06.45, Damien Le Moal wrote:
> Performance evaluation results
> ==============================
> 
> Environments:
>  - Intel Xeon 16-cores/32-threads, 128GB of RAM
>  - Kernel:
>    - ZWL (baseline): block/for-next (based on 6.9.0-rc2)
>    - ZWP: block/for-next patched kernel to add zone write plugging
>      (both kernels were compiled with the same configuration turning
>      off most heavy debug features)
> 
> Workloads:
>  - seqw4K1: 4KB sequential write, qd=1
>  - seqw4K16: 4KB sequential write, qd=16
>  - seqw1M1: 1MB sequential write, qd=1
>  - seqw1M16: 1MB sequential write, qd=16
>  - rndw4K16: 4KB random write, qd=16
>  - rndw128K16: 128KB random write, qd=16
>  - btrfs workload: Single fio job writing 128 MB files using 128 KB
>    direct IOs at qd=16.
> 
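For readers wanting to reproduce these numbers, a workload such as
seqw4K16 can be expressed with a fio invocation roughly like the one
below (illustrative only; this is not necessarily the exact job file
used for the results above, and the device path is a placeholder):

  # 4KB sequential writes at qd=16 to a zoned block device,
  # using fio's zoned block device support (zonemode=zbd)
  fio --name=seqw4K16 --filename=/dev/nullb0 \
      --direct=1 --ioengine=libaio \
      --rw=write --bs=4k --iodepth=16 \
      --zonemode=zbd --max_open_zones=128
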
> Devices:
>  - nullblk (zoned): 4096 zones of 256 MB, 128 max open zones.
>  - NVMe ZNS drive: 1 TB ZNS drive with 2GB zone size, 14 max open and
>    active zones.
>  - SMR HDD: 20 TB disk with 256MB zone size, 128 max open zones.
> 
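For reference, a zoned null_blk device with this geometry can be set up
through the null_blk configfs interface; a minimal sketch (device name
is arbitrary, attribute values chosen to match the description above):

  modprobe null_blk nr_devices=0
  mkdir /sys/kernel/config/nullb/nullb0
  cd /sys/kernel/config/nullb/nullb0
  echo 1048576 > size       # total capacity in MB (4096 x 256 MB)
  echo 1 > memory_backed    # back the device with RAM
  echo 1 > zoned            # expose it as a zoned device
  echo 256 > zone_size      # zone size in MB
  echo 128 > zone_max_open  # maximum number of open zones
  echo 1 > power            # instantiate /dev/nullb0
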
> For ZWP, the results show the performance percentage increase (or
> decrease) against the ZWL (baseline) case.
> 
> 1) null_blk zoned device:
> 
>              +--------+--------+-------+--------+--------+----------+
>              |seqw4K1 |seqw4K16|seqw1M1|seqw1M16|rndw4K16|rndw128K16|
>              |(MB/s)  | (MB/s) |(MB/s) | (MB/s) | (KIOPS)| (KIOPS)  |
>  +-----------+--------+--------+-------+--------+--------+----------+
>  |    ZWL    | 940    | 840    | 18550 | 14400  | 424    | 167      |
>  |mq-deadline|        |        |       |        |        |          |
>  +-----------+--------+--------+-------+--------+--------+----------+
>  |    ZWP    | 943    | 845    | 18660 | 14770  | 461    | 165      |
>  |mq-deadline| (+0%)  | (+0%)  | (+0%) | (+1%)  | (+8%)  | (-1%)    |
>  +-----------+--------+--------+-------+--------+--------+----------+
>  |    ZWP    | 756    | 668    | 16020 | 12980  | 135    | 101      |
>  |    bfq    | (-19%) | (-20%) | (-13%)| (-9%)  | (-68%) | (-39%)   |
>  +-----------+--------+--------+-------+--------+--------+----------+
>  |    ZWP    | 2639   | 1715   | 28190 | 19760  | 344    | 150      |
>  |   none    | (+180%)| (+104%)| (+51%)| (+37%) | (-18%) | (-10%)   |
>  +-----------+--------+--------+-------+--------+--------+----------+
> 
> ZWP with mq-deadline gives performance very similar to zone write
> locking, showing that the zone write plugging overhead is acceptable.
> But ZWP's ability to run fast block devices with the none scheduler
> brings all the benefits of zone write plugging and results in a
> significant performance increase for all workloads. The exceptions to
> this are the random write workloads with multiple jobs: for these, the
> faster request submission rate achieved by zone write plugging results
> in higher contention on the null_blk zone spinlock, which degrades
> performance.
> 
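For anyone reproducing the per-scheduler rows, the I/O scheduler can be
selected per device through sysfs, e.g. (device name is only an
example):

  echo none > /sys/block/nullb0/queue/scheduler
  echo mq-deadline > /sys/block/nullb0/queue/scheduler
  echo bfq > /sys/block/nullb0/queue/scheduler
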
> 2) NVMe ZNS drive:
> 
>              +--------+--------+-------+--------+--------+----------+
>              |seqw4K1 |seqw4K16|seqw1M1|seqw1M16|rndw4K16|rndw128K16|
>              |(MB/s)  | (MB/s) |(MB/s) | (MB/s) | (KIOPS)|  (KIOPS) |
>  +-----------+--------+--------+-------+--------+--------+----------+
>  |    ZWL    | 183    | 702    | 1066  | 1103   | 53.5   | 14.5     |
>  |mq-deadline|        |        |       |        |        |          |
>  +-----------+--------+--------+-------+--------+--------+----------+
>  |    ZWP    | 183    | 719    | 1086  | 1108   | 55.6   | 14.7     |
>  |mq-deadline| (-0%)  | (+1%)  | (+0%) | (+0%)  | (+3%)  | (+1%)    |
>  +-----------+--------+--------+-------+--------+--------+----------+
>  |    ZWP    | 178    | 691    | 1082  | 1106   | 30.8   | 11.5     |
>  |    bfq    | (-3%)  | (-2%)  | (-0%) | (+0%)  | (-42%) | (-20%)   |
>  +-----------+--------+--------+-------+--------+--------+----------+
>  |    ZWP    | 190    | 666    | 1083  | 1108   | 51.4   | 14.7     |
>  |   none    | (+4%)  | (-5%)  | (+0%) | (+0%)  | (-4%)  | (+0%)    |
>  +-----------+--------+--------+-------+--------+--------+----------+
> 
> Zone write plugging overhead does not significantly impact performance.
> Similar to null_blk, using the none scheduler leads to a performance
> increase for most workloads.
> 
> 3) SMR SATA HDD:
> 
>              +-------+--------+-------+--------+--------+----------+
>              |seqw4K1|seqw4K16|seqw1M1|seqw1M16|rndw4K16|rndw128K16|
>              |(MB/s) | (MB/s) |(MB/s) | (MB/s) | (KIOPS)|  (KIOPS) |
>  +-----------+-------+--------+-------+--------+--------+----------+
>  |    ZWL    | 107   | 243    | 245   | 246    | 2.2    | 0.763    |
>  |mq-deadline|       |        |       |        |        |          |
>  +-----------+-------+--------+-------+--------+--------+----------+
>  |    ZWP    | 107   | 242    | 245   | 245    | 2.2    | 0.772    |
>  |mq-deadline| (+0%) | (-0%)  | (+0%) | (-0%)  | (+0%)  | (+0%)    |
>  +-----------+-------+--------+-------+--------+--------+----------+
>  |    ZWP    | 104   | 241    | 246   | 242    | 2.2    | 0.765    |
>  |    bfq    | (-2%) | (-0%)  | (+0%) | (-0%)  | (+0%)  | (+0%)    |
>  +-----------+-------+--------+-------+--------+--------+----------+
>  |    ZWP    | 115   | 235    | 249   | 242    | 2.2    | 0.763    |
>  |   none    | (+7%) | (-3%)  | (+1%) | (-1%)  | (+0%)  | (+0%)    |
>  +-----------+-------+--------+-------+--------+--------+----------+
> 
> Performance with purely sequential write workloads at high queue depth
> decreases slightly when using zone write plugging. This is due to the
> different IO pattern that ZWP generates, where the first writes to a
> zone start being issued while the end of the previous zone is still
> being written. Depending on how the disk handles queued commands, seeks
> may be generated, slightly impacting the achieved throughput. Such pure
> sequential write workloads are however rare with SMR drives.
> 
> 4) Zone append tests using btrfs:
> 
>              +-------------+-------------+-----------+-------------+
>              |  null-blk   |  null_blk   |    ZNS    |     SMR     |
>              |  native ZA  | emulated ZA | native ZA | emulated ZA |
>              |    (MB/s)   |   (MB/s)    |   (MB/s)  |    (MB/s)   |
>  +-----------+-------------+-------------+-----------+-------------+
>  |    ZWL    | 2441        | N/A         | 1081      | 243         |
>  |mq-deadline|             |             |           |             |
>  +-----------+-------------+-------------+-----------+-------------+
>  |    ZWP    | 2361        | 2999        | 1085      | 239         |
>  |mq-deadline| (-1%)       |             | (+0%)     | (-2%)       |
>  +-----------+-------------+-------------+-----------+-------------+
>  |    ZWP    | 2299        | 2730        | 1080      | 240         |
>  |    bfq    | (-4%)       |             | (+0%)     | (-2%)       |
>  +-----------+-------------+-------------+-----------+-------------+
>  |    ZWP    | 2443        | 3152        | 1083      | 240         |
>  |    none   | (+0%)       |             | (+0%)     | (-1%)       |
>  +-----------+-------------+-------------+-----------+-------------+
> 
> With a more realistic use of the device through a file system, ZWP does
> not introduce significant performance differences, except for SMR, for
> the same reason as with the fio sequential workloads at high queue
> depth.
> 

I ran some fio performance tests across multiple NVMe ZNS devices on my
bare-metal setup with this patch set.
In my tests I ran seqw, seqr and rndr workloads with a range of block
sizes and varying numbers of concurrent jobs, for both the none and
mq-deadline schedulers.
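As a rough illustration of the shape of these runs (not the exact job
files; the device path, queue depth and value ranges here are
placeholders), each data point was a fio invocation along the lines of:

  # sweep block sizes and job counts for a random read workload
  for bs in 4k 16k 64k 128k; do
      for jobs in 1 4 8; do
          fio --name=rndr --filename=/dev/nvme0n2 \
              --direct=1 --ioengine=libaio \
              --rw=randread --bs=$bs --iodepth=16 --numjobs=$jobs \
              --group_reporting --time_based --runtime=60
      done
  done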

The results are consistent with the ones you posted here. Performance
improvements are most noticeable for rndr workloads.

Looks great!

Dennis

Tested-by: Dennis Maisenbacher <dennis.maisenbacher at wdc.com>
