[PATCH for-next 2/4] io_uring: introduce fixed buffer support for io_uring_cmd
Pavel Begunkov
asml.silence at gmail.com
Thu Aug 25 02:34:11 PDT 2022
On 8/22/22 12:33, Kanchan Joshi wrote:
> On Mon, Aug 22, 2022 at 11:58:24AM +0100, Pavel Begunkov wrote:
[...]
>>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>>> index 1463cfecb56b..80ea35d1ed5c 100644
>>> --- a/include/uapi/linux/io_uring.h
>>> +++ b/include/uapi/linux/io_uring.h
>>> @@ -203,6 +203,7 @@ enum io_uring_op {
>>> IORING_OP_SOCKET,
>>> IORING_OP_URING_CMD,
>>> IORING_OP_SENDZC_NOTIF,
>>> + IORING_OP_URING_CMD_FIXED,
>>
>> I don't think it should be another opcode, is there any
>> control flags we can fit it in?
>
> using sqe->rw_flags could be another way.
We also use ->ioprio for io_uring opcode specific flags,
e.g. as in io_sendmsg_prep() for IORING_RECVSEND_POLL_FIRST;
that might be even better.
> But I think that may create a bit of disharmony in user-space.
> Current choice (IORING_OP_URING_CMD_FIXED) is along the same lines as
> IORING_OP_READ/WRITE_FIXED.
And I still believe it was a bad choice; I don't like encoding
independent options/features by linearising toggles into opcodes.
To stay consistent, adding vectored fixed bufs would then need a
4th opcode, e.g. READV_FIXED, which is not great.
> User-space uses new opcode, and sends the
> buffer by filling sqe->buf_index. So must we take a different way?
I do think so.
>>> /* this goes last, obviously */
>>> IORING_OP_LAST,
>>> diff --git a/io_uring/opdef.c b/io_uring/opdef.c
>>> index 9a0df19306fe..7d5731b84c92 100644
>>> --- a/io_uring/opdef.c
>>> +++ b/io_uring/opdef.c
>>> @@ -472,6 +472,16 @@ const struct io_op_def io_op_defs[] = {
>>> .issue = io_uring_cmd,
>>> .prep_async = io_uring_cmd_prep_async,
>>> },
>>> + [IORING_OP_URING_CMD_FIXED] = {
>>> + .needs_file = 1,
>>> + .plug = 1,
>>> + .name = "URING_CMD_FIXED",
>>> + .iopoll = 1,
>>> + .async_size = uring_cmd_pdu_size(1),
>>> + .prep = io_uring_cmd_prep,
>>> + .issue = io_uring_cmd,
>>> + .prep_async = io_uring_cmd_prep_async,
>>> + },
>>> [IORING_OP_SENDZC_NOTIF] = {
>>> .name = "SENDZC_NOTIF",
>>> .needs_file = 1,
>>> diff --git a/io_uring/rw.c b/io_uring/rw.c
>>> index 1a4fb8a44b9a..3c7b94bffa62 100644
>>> --- a/io_uring/rw.c
>>> +++ b/io_uring/rw.c
>>> @@ -1005,7 +1005,8 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
>>> if (READ_ONCE(req->iopoll_completed))
>>> break;
>>> - if (req->opcode == IORING_OP_URING_CMD) {
>>> + if (req->opcode == IORING_OP_URING_CMD ||
>>> + req->opcode == IORING_OP_URING_CMD_FIXED) {
>>
>> I don't see the changed chunk upstream
>
> Right, it is on top of iopoll support (plus one more series mentioned in
> covered letter). Here is the link - https://lore.kernel.org/linux-block/20220807183607.352351-1-joshi.k@samsung.com/
> It would be great if you could review that.
>
--
Pavel Begunkov
More information about the Linux-nvme mailing list