[LSF/MM/BPF Topic] Towards more useful nvme-passthrough

Luis Chamberlain mcgrof at kernel.org
Wed Mar 2 16:45:37 PST 2022


On Thu, Jun 24, 2021 at 11:24:27AM +0200, Hannes Reinecke wrote:
> On 6/9/21 12:50 PM, Kanchan Joshi wrote:
> > Background & objectives:
> > ------------------------
> > 
> > The NVMe passthrough interface
> > 
> > Good part: allows new device-features to be usable (at least in raw
> > form) without having to build block-generic cmds, in-kernel users,
> > emulations and file-generic user-interfaces - all of which take some
> > time to evolve.
> > 
> > Bad part: passthrough interface has remained tied to synchronous ioctl,
> > which is a blocker for performance-centric usage scenarios. User-space
> > can take the pain of implementing async-over-sync on its own but it does
> > not make much sense in a world that already has io_uring.
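
As a rough illustration of the async-over-sync workaround mentioned above, a
minimal user-space sketch follows. It is not NVMe code: blocking_passthrough()
is a hypothetical stand-in for a synchronous passthrough ioctl (simulated here
with a sleep), and the 10 ms service time and helper names are assumptions
made for the example. The point is only that user space needs one blocked
thread per in-flight command to get any overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_passthrough(lba, service_time=0.01):
    """Hypothetical stand-in for a synchronous passthrough ioctl.

    Blocks the calling thread for service_time seconds, then returns the LBA.
    """
    time.sleep(service_time)
    return lba


def run(queue_depth, n_cmds=16):
    """Issue n_cmds blocking calls with at most queue_depth in flight.

    Returns (results, elapsed_seconds). Each in-flight command occupies one
    worker thread, which is the cost of emulating async on a sync interface.
    """
    with ThreadPoolExecutor(max_workers=queue_depth) as pool:
        start = time.monotonic()
        futures = [pool.submit(blocking_passthrough, lba) for lba in range(n_cmds)]
        results = [f.result() for f in futures]
        elapsed = time.monotonic() - start
    return results, elapsed


if __name__ == "__main__":
    for qd in (1, 4, 8):
        _, elapsed = run(qd)
        print(f"QD {qd}: {elapsed:.3f} s for 16 commands")
```

With a queue depth of 1 the calls run back to back; raising the depth lets
them overlap, at the price of one thread per outstanding command.
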
> > 
> > Passthrough is lean in the sense it cuts through layers of abstractions
> > and reaches NVMe fast. One of the objectives here is to build a
> > scalable pass-through that can be readily used to play with new/emerging
> > NVMe features.  Another is to surpass/match existing raw/direct block
> > I/O performance with this new in-kernel path.
> > 
> > Recent developments:
> > --------------------
> > - NVMe now has a per-namespace char interface that remains available/usable
> >   even for unsupported features and for new command-sets [1].
> > 
> > - Jens has proposed async-ioctl like facility 'uring_cmd' in io_uring. This
> >   introduces new possibilities (beyond storage); async-passthrough is one of
> >   those. Last posted version is V4 [2].
> > 
> > - I have posted work on async nvme passthrough over block-dev [3]. Posted work
> >   is in V4 (in sync with the infra of [2]).
> > 
> > Early performance numbers:
> > --------------------------
> > fio, randread, 4k bs, 1 job
> > Kiops, with varying QD:
> > 
> > QD      Sync-PT         io_uring        Async-PT
> > 1         10.8            10.6            10.6
> > 2         10.9            24.5            24
> > 4         10.6            45              46
> > 8         10.9            90              89
> > 16        11.0            169             170
> > 32        10.6            308             307
> > 64        10.8            503             506
> > 128       10.9            592             596
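
One way to read the table above is to convert throughput into an implied mean
per-command latency with Little's law (in-flight = throughput x latency). The
sketch below does that arithmetic for three of the rows. It assumes steady
state and that the queue is kept full at the stated QD; for Sync-PT it assumes
only one command is ever in flight, since a single job issues blocking ioctls.
The function name is made up for the example.

```python
# (QD, Sync-PT, io_uring, Async-PT) in Kiops, copied from the table above.
ROWS = [
    (1, 10.8, 10.6, 10.6),
    (16, 11.0, 169, 170),
    (128, 10.9, 592, 596),
]


def mean_latency_us(in_flight, kiops):
    """Little's law: mean latency = commands in flight / throughput.

    in_flight is a count, kiops is thousands of I/Os per second, and the
    result is in microseconds.
    """
    return in_flight / (kiops * 1000) * 1e6


if __name__ == "__main__":
    for qd, sync_pt, uring, async_pt in ROWS:
        # Assumption: the sync ioctl path never has more than one command
        # outstanding, whatever QD fio was asked for.
        print(
            f"QD {qd:>3}: "
            f"Sync-PT {mean_latency_us(1, sync_pt):6.1f} us, "
            f"io_uring {mean_latency_us(qd, uring):6.1f} us, "
            f"Async-PT {mean_latency_us(qd, async_pt):6.1f} us"
        )
```

Under these assumptions the flat Sync-PT column corresponds to one command
completing roughly every 92 us, regardless of the requested QD.
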
> > 
> > Further steps/discussion points:
> > --------------------------------
> > 1. Async-passthrough over nvme char-dev
> > It is in a shape to receive feedback, but I am not sure if the community
> > would like to take a look at that before settling on uring-cmd infra.
> > 
> > 2. Once the above gets in shape, bring other perf-centric features of io_uring to
> > this path -
> > A. SQPoll and register-file: already functional.
> > B. Passthrough polling: This can be enabled for block and looks feasible for
> > char-interface as well.  Keith recently posted enabling polling for user
> > pass-through [4]
> > C. Pre-mapped buffers: Early thought is to let the buffers be registered by
> > io_uring, and add a new passthrough ioctl/uring_cmd in driver which does
> > everything that passthrough does except pinning/unpinning the pages.
> > 
> > 3. Are there more things in the "io_uring->nvme->[block-layer]->nvme" path
> > which can be optimized?
> > 
> > Ideally I'd like to cover a good deal of ground before Dec. But there seem
> > to be plenty of possibilities on this path.  Discussion would help in how best to
> > move forward, and cement the ideas.
> > 
> > [1] https://lore.kernel.org/linux-nvme/20210421074504.57750-1-minwoo.im.dev@gmail.com/
> > [2] https://lore.kernel.org/linux-nvme/20210317221027.366780-1-axboe@kernel.dk/
> > [3] https://lore.kernel.org/linux-nvme/20210325170540.59619-1-joshi.k@samsung.com/
> > [4] https://lore.kernel.org/linux-block/20210517171443.GB2709391@dhcp-10-100-145-180.wdc.com/#t
> > 
> I do like the idea.
> 
> What I would like to see is to make the uring_cmd infrastructure
> generally available, such that we can port the SCSI sg asynchronous
> interface over to this.

What prevents you from doing this already? I think we just need more
patch reviews for the generic io-uring cmd patches, no?

 Luis


