[RFC PATCH 2/6] nvme: wire-up support for async-passthru on char-device.

Kanchan Joshi joshiiitr at gmail.com
Wed Sep 22 00:19:52 PDT 2021


I am sorry for taking longer than I should have.

On Wed, Sep 8, 2021 at 11:45 AM Christoph Hellwig <hch at lst.de> wrote:
>
> On Tue, Sep 07, 2021 at 09:50:27PM +0530, Kanchan Joshi wrote:
> > > A few other notes:
> > >
> > >  - I suspect the ioctl_cmd really should move into the core uring_cmd
> > >    infrastructure
> >
> > Yes, that seems possible by creating that field outside by combining
> > "op" and "unused" below.
> > +struct io_uring_cmd {
> > + struct file *file;
> > + __u16 op;
> > + __u16 unused;
> > + __u32 len;
> > + __u64 pdu[5]; /* 40 bytes available inline for free use */
> > +};
>
> Two different issues here:
>
>  - the idea of having a two layer indirection with op and a cmd doesn't
>    really make much sense
>  - if we want to avoid conflicts using 32-bit probably makes sense
>
> So I'd turn op and unused into a single cmd field, use the ioctl encoding
> macros for it (but preferably pick different numbers than the existing
> ioctls).

I was thinking along the same lines, except the "picking different
numbers than existing ioctls" part.
Does that mean adding a new IOCTL for each operation which requires
async transport?
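
To make the question concrete, here is a rough user-space sketch of one
possible reading: a single 32-bit cmd value built with the ioctl
encoding macros, in a number range kept apart from the existing ioctls.
The magic 'U', the numbers 0x80/0x81, the payload struct and all
sketch_* names are hypothetical and only for illustration; nothing here
is proposed uapi.

```c
#include <linux/ioctl.h>	/* _IOWR, _IOC_TYPE, _IOC_NR, _IOC_SIZE */
#include <stdint.h>

/* Hypothetical payload; stands in for whatever the command carries. */
struct sketch_passthru_arg {
	uint64_t addr;
	uint32_t data_len;
	uint32_t timeout_ms;
};

/* Hypothetical magic and command numbers, chosen only for illustration. */
#define SKETCH_URING_MAGIC	'U'
#define SKETCH_URING_CMD_IO \
	_IOWR(SKETCH_URING_MAGIC, 0x80, struct sketch_passthru_arg)
#define SKETCH_URING_CMD_ADMIN \
	_IOWR(SKETCH_URING_MAGIC, 0x81, struct sketch_passthru_arg)

/*
 * Returns 1 if cmd uses the sketch magic and encodes the expected
 * payload size, 0 otherwise.
 */
static int sketch_cmd_is_valid(uint32_t cmd)
{
	return _IOC_TYPE(cmd) == SKETCH_URING_MAGIC &&
	       _IOC_SIZE(cmd) == sizeof(struct sketch_passthru_arg);
}
```

With this encoding the direction, payload size, magic and number all
live in the one 32-bit field, so a separate "op" layer is not needed.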

> > >  - that whole mix of user space interface and internal data in the
> > >    ->pdu field is a mess.  What is the problem with deferring the
> > >    request freeing into the user context, which would clean up
> > >    quite a bit of that, especially if io_uring_cmd grows a private
> > >    field.
> >
> > That mix isn't great but the attempt was to save the allocation.
> > And I was not very sure if it'd be fine to defer freeing the request
> > until task-work fires up.
>
> What would be the problem with the delaying?

When we free the request, the tag is freed as well, and that may let
someone else submit more IO.
Deferring the request free to user context would delay the release of
that tag.
If you think that is a misplaced concern, I can change it.
The changed structure will look like this:

struct nvme_uring_cmd {
        __u32   ioctl_cmd;
        __u32   unused1;
        void __user *argp;
        union {
                struct bio *bio;
                struct request *req;
        };
        void *meta;
};
cmd->bio will be freed in nvme-completion while cmd->req will be freed
in user context.
Since we have the request intact, we will not store "u64 result; int
status;" anymore and overall there will be a reduction of 4 bytes in
size of nvme_uring_cmd.
Would you prefer this way?
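
As a sanity check on the layout, a small user-space sketch (not kernel
code) can confirm that the structure above still fits in the 40-byte
inline pdu. The *_sketch types and the sketch_nvme_cmd() accessor are
stand-ins I made up for this check; kernel types are replaced with
plain pointers and fixed-width integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque stand-ins for kernel types; only pointer sizes matter here. */
struct bio;
struct request;

struct io_uring_cmd_sketch {
	void *file;
	uint16_t op;
	uint16_t unused;
	uint32_t len;
	uint64_t pdu[5];	/* 40 bytes of inline space */
};

struct nvme_uring_cmd_sketch {
	uint32_t ioctl_cmd;
	uint32_t unused1;
	void *argp;		/* a __user pointer in the kernel */
	union {
		struct bio *bio;	/* valid until completion */
		struct request *req;	/* valid after completion */
	};
	void *meta;
};

/* Build-time check: the per-command state must fit in the inline pdu. */
static_assert(sizeof(struct nvme_uring_cmd_sketch) <=
	      sizeof(((struct io_uring_cmd_sketch *)0)->pdu),
	      "nvme_uring_cmd must fit in io_uring_cmd.pdu");

/*
 * Hypothetical accessor: view the pdu bytes as the nvme-private struct.
 * The kernel builds with -fno-strict-aliasing; portable user-space code
 * should not read or write through this pointer.
 */
static inline struct nvme_uring_cmd_sketch *
sketch_nvme_cmd(struct io_uring_cmd_sketch *ioucmd)
{
	return (struct nvme_uring_cmd_sketch *)ioucmd->pdu;
}
```

In the driver, a BUILD_BUG_ON() on the same size comparison would play
the role of the static_assert.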

> > Even if we take that route, we would still need a place to store bio
> > pointer (hopefully meta pointer can be extracted out of bio).
> > Do you see it differently?
>
> We don't need the bio pointer at all.  The old passthrough code needed
> it when we still used block layer bounce buffering for it.  But that
> bounce buffering for passthrough commands got removed a while ago,
> and even before nvme never used it.

nvme_submit_user_cmd() calls blk_rq_map_user(), which sets up req->bio
(and that is regardless of bounce buffering I suppose).
For sync-ioctl, this bio pointer is locally stored and that is used to
free the bio post completion.
For async-ioctl too, we need some place to store this.
So I could not connect this to bounce buffering (alone). Am I missing
your point?

One way could be to change blk_update_request() so that it does not
set req->bio to NULL. But that may invite more trouble, and we would
not be saving anything: the bio pointer sits inside the union anyway
(in the struct nvme_uring_cmd above).
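
To illustrate why I think the pointer has to be saved somewhere, here
is a toy user-space model of the sequence. The mock_* types and
functions are invented for this mail and only mimic the behaviour
described above: mapping attaches a bio to the request, completion
clears req->bio, and the unmap step needs the original bio.

```c
#include <stddef.h>
#include <stdlib.h>

/* Mock types: NOT the kernel structures, just enough to show the flow. */
struct mock_bio {
	int mapped;
};

struct mock_request {
	struct mock_bio *bio;
};

/* Stand-in for blk_rq_map_user(): attaches a bio to the request. */
static int mock_map_user(struct mock_request *rq)
{
	rq->bio = calloc(1, sizeof(*rq->bio));
	if (!rq->bio)
		return -1;
	rq->bio->mapped = 1;
	return 0;
}

/* Stand-in for completion: the request loses its bio pointer. */
static void mock_complete(struct mock_request *rq)
{
	rq->bio = NULL;
}

/* Stand-in for blk_rq_unmap_user(): needs the bio saved at submit time. */
static int mock_unmap_user(struct mock_bio *bio)
{
	if (!bio)
		return -1;	/* nothing to unmap: the mapping would leak */
	free(bio);
	return 0;
}
```

If the caller keeps a copy of rq->bio before completion, the unmap
succeeds; if it reads rq->bio afterwards, there is nothing left to
unmap. The sync path keeps that copy in a local variable, and the async
path needs an equivalent slot.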


-- 
Kanchan


