[PATCH 17/17] nvme: enable non-inline passthru commands

Kanchan Joshi joshiiitr at gmail.com
Thu Mar 10 10:43:24 PST 2022


On Thu, Mar 10, 2022 at 7:49 PM Christoph Hellwig <hch at lst.de> wrote:
>
> On Thu, Mar 10, 2022 at 05:20:13PM +0530, Kanchan Joshi wrote:
> > In sync ioctl, we always update this result field by doing put_user on
> > completion.
> > For async ioctl, since command is inside the the sqe, its lifetime is
> > only upto submission. SQE may get reused post submission, leaving no
> > way to update the "result" field on completion. Had this field been a
> > pointer, we could have saved this on submission and updated on
> > completion. But that would require redesigning this structure and
> > adding newer ioctl in nvme.
>
> Why would it required adding an ioctl to nvme?  The whole io_uring
> async_cmd infrastructure is completely independent from ioctls.

io_uring is sure not peeking into ioctl and its command-structure but
offering the facility to use its sqe to store that ioctl-command
inline.
Problem is, the inline facility does not go very well with this
particular nvme-passthru ioctl (NVME_IOCTL_IO64_CMD).
And that's because this ioctl requires additional "__u64 result;" to
be updated within "struct nvme_passthru_cmd64".
To update that during completion, we need, at the least, the result
field to be a pointer "__u64 result_ptr" inside the struct
nvme_passthru_cmd64.
Do you see that is possible without adding a new passthru ioctl in nvme?

> > Coming back, even though sync-ioctl alway updates this result to
> > user-space, only a few nvme io commands (e.g. zone-append, copy,
> > zone-mgmt-send) can return this additional result (spec-wise).
> > Therefore in nvme, when we are dealing with inline-sqe commands from
> > io_uring, we never attempt to update the result. And since we don't
> > update the result, we limit support to only read/write passthru
> > commands. And fail any other command during submission itself (Patch
> > 2).
>
> Yikes.  That is outright horrible.  passthrough needs to be command
> agnostic and future proof to any newly added nvme command.

This patch (along with patch 16) does exactly that. Makes it
command-agnostic and future-proof. All nvme-commands will work with
it.
Just that application needs to pass the pointer of ioctl-command and
not place it inline inside the sqe.

Overall, I think at io_uring infra level both submission makes sense:
big-sqe based inline submission (more efficient for <= 80 bytes) and
normal-sqe based non-inline/indirect submissions.
At nvme-level, we have to pick (depending on ioctl in hand). Currently
we are playing with both and constructing a sort of fast-path (for all
commands) and another faster-path (only for read/write commands).
Should we (at nvme-level) rather opt out and use only indirect
(because it works for all commands) or must we build a way to enable
inline-one for all commands?

> > > Overly long line.
> >
> > Under 100, but sure, can fold it under 80.
>
> You can only use 100 sparingly if it makes the code more readable.  Which
> I know is fuzzy, and in practice never does.  Certainly not in nvme and
> block code.

Clears up, thanks.

-- 
Kanchan



More information about the Linux-nvme mailing list