[PATCHv2 0/2] block,nvme: latency-based I/O scheduler
Sagi Grimberg
sagi at grimberg.me
Sun Apr 7 12:55:43 PDT 2024
On 05/04/2024 18:36, Hannes Reinecke wrote:
> On 4/5/24 17:03, Keith Busch wrote:
>> On Fri, Apr 05, 2024 at 08:21:14AM +0200, Hannes Reinecke wrote:
>>> On 4/4/24 23:14, Keith Busch wrote:
>>>> On Wed, Apr 03, 2024 at 04:17:54PM +0200, Hannes Reinecke wrote:
>>>>> Hi all,
>>>>>
>>>>> there have been several attempts to implement a latency-based I/O
>>>>> scheduler for native nvme multipath, all of which had their issues.
>>>>>
>>>>> So it's time to start afresh, this time using the QoS framework
>>>>> already present in the block layer.
>>>>> It consists of two parts:
>>>>> - a new 'blk-nlatency' QoS module, which is just a simple per-node
>>>>> latency tracker
>>>>> - a 'latency' nvme I/O policy
>>>> Whatever happened with the io-depth based path selector? That should
>>>> naturally align with the lower latency path, and that metric is
>>>> cheaper
>>>> to track.
>>>
>>> Turns out that tracking the queue depth (at the NVMe level) always requires
>>> an atomic, and with that comes a performance impact.
>>> The qos/blk-stat framework is already present and, as the numbers show,
>>> actually leads to a performance improvement.
>>>
>>> So I'm not quite sure what the argument 'cheaper to track' buys us
>>> here...
>>
>> I was weighing the blk_stat framework against those atomic
>> operations. I usually don't enable stats because all the extra
>> ktime_get_ns() and indirect calls are relatively costly. If you're
>> enabling stats anyway though, then yeah, I guess I don't really have a
>> point and your idea here seems pretty reasonable.
>
> Pretty much. Of course you need stats to be enabled.
> And the problem with the queue depth is that it's actually quite costly
> to compute; the whole sbitmap thingie is precisely there to _avoid_
> having to track the queue depth.
> I can't really see how one could track the queue depth efficiently;
> the beauty of the blk_stat framework is that it runs asynchronously and
> is only calculated after the I/O has completed.
> We could do a 'mock' queue depth by calculating the difference between
> submitted and completed I/O, but even then you'd have to inject a call
> into the hot path to track the number of submissions.
>
> In the end, the latency tracker did what I wanted to achieve (namely
> balancing out uneven paths), _and_ turned out to be faster than
> round-robin, so I didn't care about queue depth tracking.
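
For illustration, a minimal sketch of the two approaches weighed above;
the names (struct path_stats, path_submit_depth, ...) are hypothetical
and not from the patch set. The 'mock' queue depth needs an atomic on
both the submission and the completion side, while a latency sample only
does work on completion, which is what blk-stat exploits by batching
samples per-CPU.

#include <linux/atomic.h>
#include <linux/ktime.h>
#include <linux/types.h>

/* Hypothetical per-path bookkeeping; purely illustrative. */
struct path_stats {
	atomic_t	inflight;	/* "mock" queue depth */
	u64		avg_lat_ns;	/* running average of completion latency */
};

/* Approach 1: "mock" queue depth -- costs an atomic on every
 * submission and every completion, i.e. it touches the hot path.
 */
static inline void path_submit_depth(struct path_stats *ps)
{
	atomic_inc(&ps->inflight);
}

static inline void path_complete_depth(struct path_stats *ps)
{
	atomic_dec(&ps->inflight);
}

/* Approach 2: latency sample -- all the work happens at completion
 * time.  start_ns would come from the timestamp the block layer
 * already records when stats are enabled.
 */
static inline void path_complete_latency(struct path_stats *ps, u64 start_ns)
{
	u64 lat = ktime_get_ns() - start_ns;

	/* simple EWMA with weight 1/8; blk-stat batches samples per-CPU instead */
	ps->avg_lat_ns -= ps->avg_lat_ns >> 3;
	ps->avg_lat_ns += lat >> 3;
}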
Hey Hannes,
I think it's a fair claim that a latency tracker is a valid proxy for an
io-depth tracker.
I think we need Ewan to confirm whether this solves the original issue he
was trying to solve with his io-depth mpath selector. If so, I don't see
any major issues with this proposal.
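
As a rough model of why the proxy works: a path with a deeper queue
accumulates higher completion latency, so a selector that prefers the
lowest tracked latency shifts I/O away from the busier path, much like an
io-depth selector would. A hypothetical selection loop, building on the
sketch above (again, not the actual 'latency' policy code):

#include <linux/list.h>

struct mpath {
	struct list_head	entry;
	struct path_stats	stats;	/* from the sketch above */
};

/* Hypothetical selector: prefer the path with the lowest average latency.
 * A path that is queueing deeply accumulates latency and loses out, so
 * the effect resembles queue-depth based selection without hot-path
 * atomics.  The unlocked read of avg_lat_ns is fine for a heuristic.
 */
static struct mpath *select_lowest_latency(struct list_head *paths)
{
	struct mpath *p, *best = NULL;

	list_for_each_entry(p, paths, entry) {
		if (!best || p->stats.avg_lat_ns < best->stats.avg_lat_ns)
			best = p;
	}
	return best;
}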