[LSF/MM/BPF TOPIC] Improving Zoned Storage Support

Bart Van Assche bvanassche at acm.org
Tue Jan 16 17:21:49 PST 2024


On 1/16/24 15:34, Damien Le Moal wrote:
> On 1/17/24 03:20, Bart Van Assche wrote:
>> File system implementers have to decide whether to use Write or Zone
>> Append. While the Zone Append command tolerates reordering, with this
>> command the filesystem cannot control the order in which the data is
>> written on the medium without restricting the queue depth to one.
>> Additionally, the latency of write operations is lower than that of zone
>> append operations. From [2], a paper with performance results for one
>> ZNS SSD model: "we observe that the latency of write operations is lower
>> than that of append operations, even if the request size is the same".
> 
> What is the queue depth for this claim?

Hmm ... I haven't found this in the paper. Maybe I overlooked something.

>> The mq-deadline I/O scheduler serializes zoned writes even if these got
>> reordered by the block layer. However, the mq-deadline I/O scheduler,
>> just like any other single-queue I/O scheduler, is a performance
>> bottleneck for SSDs that support more than 200 K IOPS. Current NVMe and
>> UFS 4.0 block devices support more than 200 K IOPS.
> 
> FYI, I am about to post 20-something patches that completely remove zone write
> locking and replace it with "zone write plugging". That is done above the IO
> scheduler and also provides zone append emulation for drives that ask for it.
> 
> With this change:
>   - Zone append emulation is moved to the block layer, as a generic
> implementation. sd and dm zone append emulation code is removed.
>   - Any scheduler can be used, including "none". mq-deadline zone block device
> special support is removed.
>   - Overall, a lot less code (the series removes more code than it adds).
>   - Reordering problems, such as those due to IO priority, are resolved as well.
> 
> This will need a lot of testing, which we are working on. But your help with
> testing on UFS devices will be appreciated as well.

That sounds very interesting. I can help with reviewing the kernel
patches and also with testing these.
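
As a rough illustration of what plugging writes per zone above the I/O
scheduler could look like, here is a small user-space model in Python.
It is only a sketch under stated assumptions and is not taken from the
patch series: the ZonePlug and ZonedDevice classes, their method names,
and the rule of one write in flight per zone are all hypothetical.
Zone append emulation is modeled by picking the target sector from a
software write pointer at dispatch time.

```python
from collections import deque


class ZonePlug:
    """Hypothetical per-zone state: a software write pointer and a plug list."""

    def __init__(self, start, size):
        self.start = start
        self.size = size
        self.wp = start          # emulated write pointer (absolute sector)
        self.in_flight = False   # assumption: one write in flight per zone
        self.pending = deque()   # plugged writes, kept in submission order


class ZonedDevice:
    """Toy model of a layer sitting above the I/O scheduler."""

    def __init__(self, zone_size, nr_zones):
        self.zones = [ZonePlug(i * zone_size, zone_size) for i in range(nr_zones)]
        self.dispatched = []     # (sector, nr_sectors) as passed to lower layers

    def submit(self, zone_no, nr_sectors, sector=None):
        """Queue a write; sector=None stands for an emulated zone append."""
        zone = self.zones[zone_no]
        zone.pending.append((sector, nr_sectors))
        self._dispatch(zone)

    def complete(self, zone_no):
        """Signal completion of the in-flight write and unplug the next one."""
        zone = self.zones[zone_no]
        zone.in_flight = False
        self._dispatch(zone)

    def _dispatch(self, zone):
        if zone.in_flight or not zone.pending:
            return
        sector, nr_sectors = zone.pending.popleft()
        if sector is None:
            sector = zone.wp     # append emulation: write at the write pointer
        elif sector != zone.wp:
            raise ValueError("write is not at the zone write pointer")
        if sector + nr_sectors > zone.start + zone.size:
            raise ValueError("write crosses the end of the zone")
        zone.wp += nr_sectors
        zone.in_flight = True
        self.dispatched.append((sector, nr_sectors))


dev = ZonedDevice(zone_size=256, nr_zones=2)
dev.submit(0, 8, sector=0)   # dispatched immediately
dev.submit(0, 8, sector=8)   # plugged behind the first write
dev.submit(0, 4)             # emulated append, also plugged
dev.submit(1, 16)            # other zone, so not held back
dev.complete(0)              # releases the write at sector 8
dev.complete(0)              # releases the append, placed at sector 16
print(dev.dispatched)
```

In this model the order below the plug no longer matters for a single
zone, because the lower layers never see more than one write per zone
at a time. Whether the real series enforces exactly that rule is
something the patches themselves will have to show.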

Thanks,

Bart.



