[LSF/MM/BPF TOPIC] Improving Zoned Storage Support

Bart Van Assche bvanassche at acm.org
Wed Jan 17 10:22:08 PST 2024


On 1/17/24 09:48, Jens Axboe wrote:
>> When posting this patch series, please include performance results
>> (IOPS) for a zoned null_blk device instance. mq-deadline doesn't support
>> more than 200 K IOPS, which is less than what UFS devices support. I
>> hope that this performance bottleneck will be solved with the new
>> approach.
> 
> Not really zone related, but I was very aware of the single lock
> limitations when I ported deadline to blk-mq. Was always hoping that
> someone would actually take the time to make it more efficient, but so
> far that hasn't happened. Or maybe it'll be a case of "just do it
> yourself, Jens" at some point...

Hi Jens,

I think this is a fundamental limitation rather than something that can be
fixed. The I/O scheduling algorithms in mq-deadline and BFQ require
knowledge of all pending I/O requests, so they must maintain data
structures that are shared across all CPU cores. Making those data
structures thread-safe requires synchronization mechanisms that are used
across all CPU cores. I think that is where the (roughly) 200 K IOPS
bottleneck comes from.

Additionally, the faster storage devices become, the larger the relative
overhead of an I/O scheduler becomes (assuming that I/O schedulers
themselves won't get significantly faster).

Another fundamental limitation of I/O schedulers is that multiple commands
must be outstanding simultaneously to achieve good performance from the
storage device, and as soon as the queue depth is larger than one the
device itself has some control over the order in which commands are
executed. In other words, the order chosen by the scheduler is not
necessarily the order in which the device processes the commands.

For all of the above reasons I'm recommending that my colleagues move
I/O prioritization into the storage device and evolve towards a future
without I/O schedulers for solid-state storage devices. I/O schedulers
will probably remain important for rotating magnetic media.

Thank you,

Bart.




More information about the Linux-nvme mailing list