[RFC 3/5] io_uring: add infra and support for IORING_OP_URING_CMD
Kanchan Joshi
joshiiitr at gmail.com
Mon Apr 4 08:14:20 PDT 2022
On Mon, Apr 4, 2022 at 12:46 PM Christoph Hellwig <hch at lst.de> wrote:
>
> Can we please spell out infrastructure here? Or did you mean
> infrared anyway :)
:-) sure, I see the problem with this shorthand now.
> > -enum io_uring_cmd_flags {
> > - IO_URING_F_COMPLETE_DEFER = 1,
> > - IO_URING_F_UNLOCKED = 2,
> > - /* int's last bit, sign checks are usually faster than a bit test */
> > - IO_URING_F_NONBLOCK = INT_MIN,
> > -};
>
> This doesn't actually get used anywhere outside of io_uring.c, so why
> move it?
This got missed.
We are going to use it for things like this in nvme (from past series):
+ if (ioucmd && (ioucmd->flags & IO_URING_F_NONBLOCK)) {
+ rq_flags |= REQ_NOWAIT;
+ blk_flags |= BLK_MQ_REQ_NOWAIT;
+ }
+ req = nvme_alloc_request(q, cmd, blk_flags, rq_flags);
Also for polling. Will fix this up in the next version.
> > +static void io_uring_cmd_work(struct io_kiocb *req, bool *locked)
> > +{
> > + req->uring_cmd.driver_cb(&req->uring_cmd);
> > +}
> > +
> > +void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
> > + void (*driver_cb)(struct io_uring_cmd *))
> > +{
> > + struct io_kiocb *req = container_of(ioucmd, struct io_kiocb, uring_cmd);
> > +
> > + req->uring_cmd.driver_cb = driver_cb;
> > + req->io_task_work.func = io_uring_cmd_work;
> > + io_req_task_work_add(req, !!(req->ctx->flags & IORING_SETUP_SQPOLL));
> > +}
> > +EXPORT_SYMBOL_GPL(io_uring_cmd_complete_in_task);
>
> I'm still not a fan of the double indirect call here. I don't really
> have a good idea yet, but I plan to look into it.
>
> > static void io_req_task_queue_fail(struct io_kiocb *req, int ret)
>
> Also it would be great to not add it between io_req_task_queue_fail and
> the callback set by it.
Right. Will change.
> > +void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret)
> > +{
> > + struct io_kiocb *req = container_of(ioucmd, struct io_kiocb, uring_cmd);
> > +
> > + if (ret < 0)
> > + req_set_fail(req);
> > + io_req_complete(req, ret);
> > +}
> > +EXPORT_SYMBOL_GPL(io_uring_cmd_done);
>
> It seems like all callers of io_req_complete actually call req_set_fail
> on failure. So maybe it would be nice pre-cleanup to handle the
> req_set_fail call from io_req_complete?
io_tee and io_splice do slightly different handling.
If we take this route, it seems they can use __io_req_complete() instead.
> > + /* queued async, consumer will call io_uring_cmd_done() when complete */
> > + if (ret == -EIOCBQUEUED)
> > + return 0;
> > + io_uring_cmd_done(ioucmd, ret);
>
> Why not:
> if (ret != -EIOCBQUEUED)
> io_uring_cmd_done(ioucmd, ret);
> return 0;
>
> That being said I wonder why not just remove the return value from
> ->async_cmd entirely and just require the implementation to always call
> io_uring_cmd_done? That removes the confusion on who needs to call it
> entirely, similarly to what we do in the block layer for ->submit_bio.
Right, this seems doable at this point as we do not do any special
handling (in io_uring) on the return code.
> > +struct io_uring_cmd {
> > + struct file *file;
> > + void *cmd;
> > + /* for irq-completion - if driver requires doing stuff in task-context*/
> > + void (*driver_cb)(struct io_uring_cmd *cmd);
> > + u32 flags;
> > + u32 cmd_op;
> > + u16 cmd_len;
>
> The cmd_len field does not seem to actually be used anywhere.
Another thing that got left out from the previous series :-(
Using this field for a bit of sanity checking at the moment. Like this in nvme:
+ if (ioucmd->cmd_len != sizeof(struct nvme_passthru_cmd64))
+ return -EINVAL;
+ cptr = (struct nvme_passthru_cmd64 *)ioucmd->cmd;
> > +++ b/include/uapi/linux/io_uring.h
> > @@ -22,10 +22,12 @@ struct io_uring_sqe {
> > union {
> > __u64 off; /* offset into file */
> > __u64 addr2;
> > + __u32 cmd_op;
> > };
> > union {
> > __u64 addr; /* pointer to buffer or iovecs */
> > __u64 splice_off_in;
> > + __u16 cmd_len;
> > };
> > __u32 len; /* buffer size or number of iovecs */
> > union {
> > @@ -60,7 +62,10 @@ struct io_uring_sqe {
> > __s32 splice_fd_in;
> > __u32 file_index;
> > };
> > - __u64 __pad2[2];
> > + union {
> > + __u64 __pad2[2];
> > + __u64 cmd;
> > + };
>
> Can someone explain these changes to me a little more?
All three of these (cmd_op, cmd_len and cmd) are operation-specific
fields. User-space supplies them in the SQE, io_uring packages them
into "struct io_uring_cmd" and passes that down to the provider, which
does the real processing.
1. cmd_op = operation code for the async command (e.g. a passthru
ioctl opcode, or whatever else we want to turn async from user-space)
2. cmd_len = actual length of the async command (e.g. we have at most
80 bytes with the big-sqe, and for nvme-passthru we use the max, but
other use cases can go with a smaller length)
3. cmd = the starting offset where the async command is placed (by
user-space) inside the big-sqe: 16 bytes in the first sqe, and the
next 64 bytes come from the second sqe.
And if we were doing pointer-based command submission, this "cmd"
would have pointed to the user-space command (of whatever length).
Just to put the entire thought process across; I understand that we
are not taking that route.
More information about the Linux-nvme
mailing list