[PATCH v4 1/5] fs,io_uring: add infrastructure for uring-cmd

Jens Axboe axboe at kernel.dk
Tue May 10 07:35:36 PDT 2022


On 5/10/22 8:23 AM, Kanchan Joshi wrote:
> On Thu, May 5, 2022 at 9:47 PM Jens Axboe <axboe at kernel.dk> wrote:
>>
>> On 5/5/22 12:06 AM, Kanchan Joshi wrote:
>>> +static int io_uring_cmd_prep(struct io_kiocb *req,
>>> +                          const struct io_uring_sqe *sqe)
>>> +{
>>> +     struct io_uring_cmd *ioucmd = &req->uring_cmd;
>>> +     struct io_ring_ctx *ctx = req->ctx;
>>> +
>>> +     if (ctx->flags & IORING_SETUP_IOPOLL)
>>> +             return -EOPNOTSUPP;
>>> +     /* do not support uring-cmd without big SQE/CQE */
>>> +     if (!(ctx->flags & IORING_SETUP_SQE128))
>>> +             return -EOPNOTSUPP;
>>> +     if (!(ctx->flags & IORING_SETUP_CQE32))
>>> +             return -EOPNOTSUPP;
>>> +     if (sqe->ioprio || sqe->rw_flags)
>>> +             return -EINVAL;
>>> +     ioucmd->cmd = sqe->cmd;
>>> +     ioucmd->cmd_op = READ_ONCE(sqe->cmd_op);
>>> +     return 0;
>>> +}
>>
>> While looking at the other suggested changes, I noticed a more
>> fundamental issue with the passthrough support. For any other command,
>> SQE contents are stable once prep has been done. The above does do that
>> for the basic items, but this case is special as the lower level command
>> itself resides in the SQE.
>>
>> For cases where the command needs deferral, it's problematic. There are
>> two main cases where this can happen:
>>
>> - The issue attempt yields -EAGAIN (we ran out of requests, etc). If you
>>   look at other commands, if they have data that doesn't fit in the
>>   io_kiocb itself, then they need to allocate room for that data and have
>>   it be persistent
> 
> While we have io-wq retrying for this case, async_data is not allocated.
> We need to do that explicitly inside io_uring_cmd(). Something like this -
> 
> if (ret == -EAGAIN) {
> if (!req_has_async_data(req)) {
> if (io_alloc_async_data(req)) return -ENOMEM;
> io_uring_cmd_prep_async(req);
> }
> return ret;
> }
> 
>> - Deferral is specified by the application, using eg IOSQE_IO_LINK or
>>   IOSQE_ASYNC.
> For this to work, we are missing ".needs_async_setup = 1" for
> IORING_OP_URING_CMD.

Agree on both, the op handler itself should alloc the async_data for
this case and that flag does need to be set.

-- 
Jens Axboe




More information about the Linux-nvme mailing list