[LSF/MM/BPF Topic] Towards more useful nvme-passthrough
Luis Chamberlain
mcgrof at kernel.org
Wed Mar 2 16:45:37 PST 2022
On Thu, Jun 24, 2021 at 11:24:27AM +0200, Hannes Reinecke wrote:
> On 6/9/21 12:50 PM, Kanchan Joshi wrote:
> > Background & objectives:
> > ------------------------
> >
> > The NVMe passthrough interface
> >
> > Good part: allows new device features to be usable (at least in raw
> > form) without having to build block-generic cmds, in-kernel users,
> > emulations and file-generic user-interfaces, all of which take time to
> > evolve.
> >
> > Bad part: the passthrough interface has remained tied to synchronous ioctl,
> > which is a blocker for performance-centric usage scenarios. User-space
> > can take the pain of implementing async-over-sync on its own, but it does
> > not make much sense in a world that already has io_uring.
> >
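For readers unfamiliar with "async-over-sync", here is a minimal sketch of what user space has to do when only a blocking interface exists: one worker thread per in-flight command. The names blocking_passthrough and submit_all are hypothetical, and the stub merely sleeps where a real program would issue the synchronous ioctl.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_passthrough(lba):
    """Hypothetical stand-in for a synchronous passthrough ioctl.

    A real caller would issue the ioctl here; this stub only sleeps to
    simulate device latency and returns the LBA it was given.
    """
    time.sleep(0.001)
    return lba


def submit_all(lbas, queue_depth):
    """Keep up to `queue_depth` blocking calls in flight via worker threads."""
    with ThreadPoolExecutor(max_workers=queue_depth) as pool:
        # map() yields results in submission order, not completion order.
        return list(pool.map(blocking_passthrough, lbas))


if __name__ == "__main__":
    for qd in (1, 8):
        start = time.perf_counter()
        done = submit_all(range(64), qd)
        elapsed_ms = (time.perf_counter() - start) * 1e3
        print(f"QD={qd}: {len(done)} commands in {elapsed_ms:.1f} ms")
```

Each unit of queue depth costs a thread plus a context switch per command, which is the overhead an io_uring-based path is meant to avoid.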
> > Passthrough is lean in the sense that it cuts through layers of abstraction
> > and reaches NVMe quickly. One of the objectives here is to build a
> > scalable pass-through that can be readily used to play with new/emerging
> > NVMe features. Another is to match or surpass existing raw/direct block
> > I/O performance with this new in-kernel path.
> >
> > Recent developments:
> > --------------------
> > - NVMe now has a per-namespace char interface that remains available/usable
> > even for unsupported features and for new command-sets [1].
> >
> > - Jens has proposed async-ioctl like facility 'uring_cmd' in io_uring. This
> > introduces new possibilities (beyond storage); async-passthrough is one of
> > those. Last posted version is V4 [2].
> >
> > - I have posted work on async nvme passthrough over block-dev [3]. Posted work
> > is in V4 (in sync with the infra of [2]).
> >
> > Early performance numbers:
> > --------------------------
> > fio, randread, 4k bs, 1 job
> > Kiops, with varying QD:
> >
> > QD     Sync-PT   io_uring   Async-PT
> > 1      10.8      10.6       10.6
> > 2      10.9      24.5       24
> > 4      10.6      45         46
> > 8      10.9      90         89
> > 16     11.0      169        170
> > 32     10.6      308        307
> > 64     10.8      503        506
> > 128    10.9      592        596
> >
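One way to read the table above is through Little's law (mean latency = commands in flight / throughput). The sketch below applies it to the quoted numbers; it assumes steady state and a queue that stays full for the whole run, neither of which the original post states, and the function name is hypothetical.

```python
# (queue depth, Sync-PT, io_uring, Async-PT) in Kiops, copied from the table.
RESULTS = [
    (1, 10.8, 10.6, 10.6),
    (2, 10.9, 24.5, 24),
    (4, 10.6, 45, 46),
    (8, 10.9, 90, 89),
    (16, 11.0, 169, 170),
    (32, 10.6, 308, 307),
    (64, 10.8, 503, 506),
    (128, 10.9, 592, 596),
]


def mean_latency_us(in_flight, kiops):
    """Little's law: latency = in-flight commands / throughput, in microseconds."""
    return in_flight / kiops * 1e3


if __name__ == "__main__":
    print("QD   sync(us)  io_uring(us)  async-PT(us)")
    for qd, sync_pt, uring, async_pt in RESULTS:
        # Assumption: the sync ioctl path has one command in flight no matter
        # what queue depth fio is configured with.
        print(f"{qd:<4} {mean_latency_us(1, sync_pt):8.1f} "
              f"{mean_latency_us(qd, uring):13.1f} "
              f"{mean_latency_us(qd, async_pt):13.1f}")
```

Under these assumptions the flat Sync-PT column is what a single outstanding command at roughly 90 to 95 us per I/O would produce, while the two asynchronous columns scale with queue depth until per-command latency starts to rise.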
> > Further steps/discussion points:
> > --------------------------------
> > 1. Async-passthrough over nvme char-dev
> > It is in a shape to receive feedback, but I am not sure if the community
> > would like to take a look at that before settling on the uring-cmd infra.
> >
> > 2. Once the above gets in shape, bring other perf-centric features of
> > io_uring to this path:
> > A. SQPoll and register-file: already functional.
> > B. Passthrough polling: This can be enabled for block and looks feasible for
> > char-interface as well. Keith recently posted enabling polling for user
> > pass-through [4].
> > C. Pre-mapped buffers: Early thought is to let the buffers be registered by
> > io_uring, and add a new passthrough ioctl/uring_cmd in the driver which does
> > everything that passthrough does except pinning/unpinning the pages.
> >
> > 3. Are there more things in the "io_uring->nvme->[block-layer]->nvme" path
> > which can be optimized?
> >
> > Ideally I'd like to cover a good deal of ground before Dec. But there seem
> > to be plenty of possibilities on this path. Discussion would help in how
> > best to move forward, and cement the ideas.
> >
> > [1] https://lore.kernel.org/linux-nvme/20210421074504.57750-1-minwoo.im.dev@gmail.com/
> > [2] https://lore.kernel.org/linux-nvme/20210317221027.366780-1-axboe@kernel.dk/
> > [3] https://lore.kernel.org/linux-nvme/20210325170540.59619-1-joshi.k@samsung.com/
> > [4] https://lore.kernel.org/linux-block/20210517171443.GB2709391@dhcp-10-100-145-180.wdc.com/#t
> >
> I do like the idea.
>
> What I would like to see is to make the uring_cmd infrastructure
> generally available, such that we can port the SCSI sg asynchronous
> interface over to this.

What prevents you from doing this already? I think we just need more
patch reviews for the generic io-uring cmd patches, no?

  Luis