[PATCH for-next 2/4] io_uring: introduce fixed buffer support for io_uring_cmd

Pavel Begunkov asml.silence at gmail.com
Thu Aug 25 02:34:11 PDT 2022


On 8/22/22 12:33, Kanchan Joshi wrote:
> On Mon, Aug 22, 2022 at 11:58:24AM +0100, Pavel Begunkov wrote:
[...]
>>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>>> index 1463cfecb56b..80ea35d1ed5c 100644
>>> --- a/include/uapi/linux/io_uring.h
>>> +++ b/include/uapi/linux/io_uring.h
>>> @@ -203,6 +203,7 @@ enum io_uring_op {
>>>      IORING_OP_SOCKET,
>>>      IORING_OP_URING_CMD,
>>>      IORING_OP_SENDZC_NOTIF,
>>> +    IORING_OP_URING_CMD_FIXED,
>>
>> I don't think it should be another opcode; are there any
>> control flags we can fit it into?
> 
> using sqe->rw_flags could be another way.

We also use ->ioprio for io_uring opcode-specific flags, e.g. in
io_sendmsg_prep() for IORING_RECVSEND_POLL_FIRST; that might be
even better.
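
For illustration, here is a rough sketch of that flag-based alternative:
keep IORING_OP_URING_CMD as the only opcode and carry an opcode-private
"fixed buffer" flag in sqe->ioprio, the way io_sendmsg_prep() handles
IORING_RECVSEND_POLL_FIRST. The flag name and helper below are made up
for the example, not an existing interface:

#define IORING_URING_CMD_FIXED_BUF	(1U << 0)	/* hypothetical flag */

static int io_uring_cmd_prep_flags(struct io_kiocb *req,
				   const struct io_uring_sqe *sqe)
{
	/* opcode-private flags carried in sqe->ioprio, as for send/recv */
	u32 flags = READ_ONCE(sqe->ioprio);

	if (flags & ~IORING_URING_CMD_FIXED_BUF)
		return -EINVAL;
	/* fixed buffer requested: remember which registered buffer to use */
	if (flags & IORING_URING_CMD_FIXED_BUF)
		req->buf_index = READ_ONCE(sqe->buf_index);
	return 0;
}

Such a helper would be called from io_uring_cmd_prep(), so no second
opcode (and no second io_op_defs entry) is needed.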

> But I think that may create a bit of disharmony in user-space.
> The current choice (IORING_OP_URING_CMD_FIXED) is along the same lines
> as IORING_OP_READ/WRITE_FIXED.

And I still believe it was a bad choice; I don't like encoding
independent options/features by linearising toggles into opcodes.
Staying consistent with that pattern would mean adding vectored fixed
bufs as a 4th opcode, e.g. READV_FIXED, which is not great.

> User-space uses the new opcode and sends the buffer by filling
> sqe->buf_index. So must we take a different way?

I do think so.
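
From user-space the difference is small either way. Assuming the
flag-based interface sketched above (again, the flag name is
illustrative), an NVMe passthrough submission with a registered buffer
could look roughly like this; buffer registration, opening the char
device and IORING_SETUP_SQE128 setup are omitted:

#include <string.h>
#include <liburing.h>
#include <linux/nvme_ioctl.h>

#define IORING_URING_CMD_FIXED_BUF	(1U << 0)	/* hypothetical, see above */

static void queue_fixed_uring_cmd(struct io_uring *ring, int nvme_fd,
				  unsigned int buf_idx)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	memset(sqe, 0, sizeof(*sqe));
	sqe->opcode = IORING_OP_URING_CMD;	/* no second opcode needed */
	sqe->fd = nvme_fd;			/* nvme char device fd */
	sqe->cmd_op = NVME_URING_CMD_IO;
	sqe->ioprio = IORING_URING_CMD_FIXED_BUF;	/* "use fixed buffer" */
	sqe->buf_index = buf_idx;		/* index into registered buffers */
	/* the passthrough command itself goes into the big-SQE cmd area */
}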


>>>      /* this goes last, obviously */
>>>      IORING_OP_LAST,
>>> diff --git a/io_uring/opdef.c b/io_uring/opdef.c
>>> index 9a0df19306fe..7d5731b84c92 100644
>>> --- a/io_uring/opdef.c
>>> +++ b/io_uring/opdef.c
>>> @@ -472,6 +472,16 @@ const struct io_op_def io_op_defs[] = {
>>>          .issue            = io_uring_cmd,
>>>          .prep_async        = io_uring_cmd_prep_async,
>>>      },
>>> +    [IORING_OP_URING_CMD_FIXED] = {
>>> +        .needs_file        = 1,
>>> +        .plug            = 1,
>>> +        .name            = "URING_CMD_FIXED",
>>> +        .iopoll            = 1,
>>> +        .async_size        = uring_cmd_pdu_size(1),
>>> +        .prep            = io_uring_cmd_prep,
>>> +        .issue            = io_uring_cmd,
>>> +        .prep_async        = io_uring_cmd_prep_async,
>>> +    },
>>>      [IORING_OP_SENDZC_NOTIF] = {
>>>          .name            = "SENDZC_NOTIF",
>>>          .needs_file        = 1,
>>> diff --git a/io_uring/rw.c b/io_uring/rw.c
>>> index 1a4fb8a44b9a..3c7b94bffa62 100644
>>> --- a/io_uring/rw.c
>>> +++ b/io_uring/rw.c
>>> @@ -1005,7 +1005,8 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
>>>          if (READ_ONCE(req->iopoll_completed))
>>>              break;
>>> -        if (req->opcode == IORING_OP_URING_CMD) {
>>> +        if (req->opcode == IORING_OP_URING_CMD ||
>>> +                req->opcode == IORING_OP_URING_CMD_FIXED) {
>>
>> I don't see the changed chunk upstream
> 
> Right, it is on top of iopoll support (plus one more series mentioned in
> the cover letter). Here is the link - https://lore.kernel.org/linux-block/20220807183607.352351-1-joshi.k@samsung.com/
> It would be great if you could review that.
> 

-- 
Pavel Begunkov


