[PATCH] iosched: Add i10 I/O Scheduler
Rachit Agarwal
rach4x0r at gmail.com
Mon Jan 11 13:15:28 EST 2021
[Resending the last message] Happy 2021 everyone!
> Dear all:
>
> Hope you are all well.
>
> Sagi and I were wondering if you have any additional feedback on the
> updated patch? (@Ming?) We have been receiving a lot of
> interest/questions from industry on incorporation of i10 in the
> kernel. If people do not have additional feedback, it would be nice to
> move this forward.
>
> Looking forward to hearing from you!
> ~Rachit
>
> On Sat, Nov 28, 2020 at 12:49 AM Rachit Agarwal <rach4x0r at gmail.com> wrote:
> >
> >
> >
> > On Fri, Nov 13, 2020 at 4:56 PM Sagi Grimberg <sagi at grimberg.me> wrote:
> >>
> >>
> >> >>>> But if you think this has a better home, I'm assuming that the guys
> >> >>>> will be open to that.
> >> >>>
> >> >>> Also see the reply from Ming. It's a balancing act - don't want to add
> >> >>> extra overhead to the core, but also don't want to carry an extra
> >> >>> scheduler if the main change is really just variable dispatch batching.
> >> >>> And since we already have a notion of that, seems worthwhile to explore
> >> >>> that venue.
> >> >>
> >> >> I agree,
> >> >>
> >> >> The main difference is that this balancing is not driven from device
> >> >> resource pressure, but rather from an assumption of device specific
> >> >> optimization (and also with a specific optimization target), hence a
> >> >> scheduler a user would need to opt-in seemed like a good compromise.
> >> >>
> >> >> But maybe Ming has some good ideas on a different way to add it..
> >> >
> >> > So here's another case - virtualized nvme. The commit overhead is
> >> > suitably large there that performance suffers quite a bit, similarly to
> >> > your remote storage case. If we had suitable logic in the core, then we
> >> > could easily propagate this knowledge when setting up the queue. Then it
> >> > could happen automatically, without needing a configuration to switch to
> >> > a specific scheduler.
> >>
> >> Yes, these use-cases share characteristics. I'm not at all opposed to
> >> placing this in the core. I do think that in order to put something like
> >> this in the core, the bar needs to be higher such that an optimization
> >> target cannot be biased towards a workload (i.e. needs to be adaptive).
> >>
> >> I'm still not sure how we would build this on top of what we already
> >> have as it is really centered around device being busy (which is not
> >> the case for nvme), but I didn't put enough thought into it yet.
> >>
> >
> > Dear all:
> >
> > Thanks, again, for the very constructive discussions.
> >
> > I am writing back with quite a few updates:
> >
> > 1. We have now included a detailed comparison of the i10 scheduler with Kyber over NVMe-over-TCP (https://github.com/i10-kernel/upstream-linux/blob/master/i10-evaluation.pdf). In a nutshell, when operating with NVMe-over-TCP, i10 delivers higher throughput at the cost of higher latency. This seems to be the core tradeoff exposed by i10.
> >
> > 2. We have now implemented an adaptive version of the i10 I/O scheduler that uses the number of outstanding requests at the time of batch dispatch (and whether the dispatch was triggered by a timeout or not) to adaptively set the batching size. The new results (https://github.com/i10-kernel/upstream-linux/blob/master/i10-evaluation.pdf) show that i10-adaptive further improves performance at low loads while maintaining performance at high loads. IMO, there is much to do on designing improved adaptation algorithms.
> >
> > 3. We have now updated the i10-evaluation document to include results for local storage access. The core takeaway here is that i10-adaptive can achieve similar throughput and latency at low loads and at high loads when compared to noop, but still requires more work for lower loads. However, given that the tradeoff exposed by the i10 scheduler is particularly useful for remote storage devices (and, as Jens suggested, perhaps for virtualized local storage access), I do agree with Sagi -- I think we should consider including it in the core, since this may be useful for a broad range of new use cases.
> >
> > We have also created a second version of the patch that includes these updates: https://github.com/i10-kernel/upstream-linux/blob/master/0002-iosched-Add-i10-I-O-Scheduler.patch
> >
> > As always, thank you for the constructive discussion and I look forward to working with you on this.
> >
> > Best,
> > ~Rachit
> >
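
The adaptation described in item 2 of the quoted message (choosing the batch size from the number of outstanding requests at dispatch time and from whether the dispatch was triggered by a timeout) could be sketched roughly as below. This is only an illustrative sketch, not the code in the patch: the names sketch_queue, sketch_adapt and sketch_clamp, the bounds SKETCH_MIN_BATCH and SKETCH_MAX_BATCH, and the shrink-on-timeout / double-on-full policy are all assumptions made for the example.

```c
#include <stdbool.h>

/* Hypothetical bounds; the actual patch may use different values. */
#define SKETCH_MIN_BATCH 1u
#define SKETCH_MAX_BATCH 16u

/* Hypothetical per-queue state used only by this sketch. */
struct sketch_queue {
	unsigned int batch_size;	/* current target batch size */
};

/* Keep a candidate batch size within the assumed bounds. */
static unsigned int sketch_clamp(unsigned int v)
{
	if (v < SKETCH_MIN_BATCH)
		return SKETCH_MIN_BATCH;
	if (v > SKETCH_MAX_BATCH)
		return SKETCH_MAX_BATCH;
	return v;
}

/*
 * Update the target batch size at batch dispatch time.
 *
 * outstanding: number of requests queued when the dispatch fired
 * by_timeout:  true if the timer expired before the batch filled
 */
static void sketch_adapt(struct sketch_queue *q, unsigned int outstanding,
			 bool by_timeout)
{
	if (by_timeout) {
		/* Assumed policy: load is low, so shrink to what arrived. */
		q->batch_size = sketch_clamp(outstanding);
	} else {
		/* Assumed policy: batch filled in time, so try a larger one. */
		q->batch_size = sketch_clamp(q->batch_size * 2);
	}
}
```

Under these assumptions, a queue that keeps hitting the timeout settles on small batches (favoring latency at low load), while a queue that keeps filling its batch grows toward the upper bound (favoring throughput at high load).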