[LSF/MM/BPF TOPIC] Towards more useful nvme passthrough
Luis Chamberlain
mcgrof at kernel.org
Wed Mar 2 15:40:43 PST 2022
On Mon, Feb 28, 2022 at 02:55:11PM +0530, Kanchan Joshi wrote:
> I'd like to propose a session to go over:
>
> - What are the issues in having the above work (uring-cmd and new nvme
> passthru) merged?
It sounds like we just needed to settle on the formats. And a few more
eyeballs / reviewed-by's. No? And it sounds like Jens is about to punt
a new series :)
> - What would be other useful things to add in nvme-passthru. For
> example- lack of vectored-io for passthru was one such missing piece.
> That is covered from nvme 5.18 onwards [4]. But are there other things
> that user-space would need before it starts treating this path as a
> good alternative to kernel-bypass?
I think it would be good to split this into two parts:
* io-uring cmd extensions
* what can be extended for nvme
io-uring cmd is not even upstream yet, so I don't think folks widely
realize the potential yet. So I think it's a bit too early to tell here,
and so we should go out and preach at things like Plumbers and other
conferences with a few nice demos of what can be done. nvme is one
use case, but I think it would help to get other users active and not
just vaporware.
The problem I'm seeing with this effort too is it relies too heavily
on the nvme passthrough being the only use case so far, and that's
a bit too involved. So I'd like to encourage other simple users
to consider helping here.
Granted this is like looking for a nail when you're a hammer. And so
the only way to not have it be that way is to aim smaller, a simple
real demo of something useful. I don't know.. I'd think something like
trinity might have a field day with this.
> - Despite the numbers above, nvme passthru has more room for
> efficiency e.g. unlike regular io, we do copy_to_user to fetch
> command, and put_user to return the result. Eliminating some of this
> may require new ioctl. There may be other opinions on what else needs
> overhaul in this path.
I think we are being too hard on ourselves. Start small, and get some
basic stuff up. And allow for flexibility for improvement. I think
at this point we have more than a proof of concept, no? Something
tangible?
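For anyone not familiar with what gets copied on the existing ioctl
path discussed in the quoted text: below is a minimal sketch, assuming
the struct nvme_passthru_cmd layout from the uapi header
linux/nvme_ioctl.h and the generic _IOWR() ioctl encoding used on
x86/arm64 (some architectures encode the direction bits differently).
The Python class and the helper names ioc_iowr() and build_read() are
made up for illustration. The kernel copies this whole 72-byte
structure in from user space on submission and writes the result
field back on completion.

```python
import ctypes

class NvmePassthruCmd(ctypes.Structure):
    """Mirror of struct nvme_passthru_cmd (assumed uapi layout)."""
    _fields_ = [
        ("opcode", ctypes.c_uint8),
        ("flags", ctypes.c_uint8),
        ("rsvd1", ctypes.c_uint16),
        ("nsid", ctypes.c_uint32),
        ("cdw2", ctypes.c_uint32),
        ("cdw3", ctypes.c_uint32),
        ("metadata", ctypes.c_uint64),
        ("addr", ctypes.c_uint64),          # user-space data buffer
        ("metadata_len", ctypes.c_uint32),
        ("data_len", ctypes.c_uint32),
        ("cdw10", ctypes.c_uint32),
        ("cdw11", ctypes.c_uint32),
        ("cdw12", ctypes.c_uint32),
        ("cdw13", ctypes.c_uint32),
        ("cdw14", ctypes.c_uint32),
        ("cdw15", ctypes.c_uint32),
        ("timeout_ms", ctypes.c_uint32),
        ("result", ctypes.c_uint32),        # written back by the kernel
    ]

def ioc_iowr(type_char, nr, size):
    # Generic _IOWR() encoding: dir (read|write = 3), size, type, nr.
    return (3 << 30) | (size << 16) | (ord(type_char) << 8) | nr

# _IOWR('N', 0x43, struct nvme_passthru_cmd)
NVME_IOCTL_IO_CMD = ioc_iowr("N", 0x43, ctypes.sizeof(NvmePassthruCmd))

def build_read(nsid, slba, nblocks, buf):
    """Fill in an NVMe Read (opcode 0x02) for nblocks starting at slba."""
    cmd = NvmePassthruCmd()
    cmd.opcode = 0x02
    cmd.nsid = nsid
    cmd.addr = ctypes.addressof(buf)
    cmd.data_len = ctypes.sizeof(buf)
    cmd.cdw10 = slba & 0xFFFFFFFF           # starting LBA, low 32 bits
    cmd.cdw11 = slba >> 32                  # starting LBA, high 32 bits
    cmd.cdw12 = nblocks - 1                 # block count is zero-based
    return cmd

buf = ctypes.create_string_buffer(4096)
cmd = build_read(nsid=1, slba=0, nblocks=8, buf=buf)
print(ctypes.sizeof(cmd), hex(NVME_IOCTL_IO_CMD))
```

Nothing here touches a device; it only builds the command. Issuing it
would mean an ioctl on an NVMe device node with NVME_IOCTL_IO_CMD, and
that per-command copy in and result write-back is the overhead in
question.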
> - What would be a good way to upstream the tests? Nvme-cli may not be
> very useful. Should it be similar to fio’s sg ioengine. But
> unlike sg, here we are combining ng with io_uring, and one would want
> to retain all the tunables of io_uring (register/fixed buffers/sqpoll
> etc.)
If the goal was to help open the door for unsupported commands then
insofar as upstream is concerned shouldn't we only care about the
generic plumbing? I.e., specific commands which might not yet be
baked for general consumption (like zone append) are left up to
implementors to figure out where they test. Let's use zone append
as an example. Without a raw block interface to it, we can use this
framework, ideally.. but yeah how do we test? Are vendors all going
to agree to use microbenches with io-uring cmd?
> - All the above is for 2.0 passthru which essentially forms a direct
> path between io_uring and nvme. And io_uring and nvme programming
> model share many similarities. For 3.0 passthru, would it be crazy to
> think of trimming the path further by eliminating the block-layer and
> doing stuff without “struct request”. There is some interest in
> developing user-space block device [5] and FS anyway.
I failed to capture where 2.0 and 3.0 are defined. Can you elaborate?
Luis