[LSF/MM/BPF ATTEND][LSF/MM/BPF Topic] Non-block IO
Jens Axboe
axboe at kernel.dk
Tue Apr 11 19:12:44 PDT 2023
On 4/11/23 5:28?PM, Kanchan Joshi wrote:
> On Wed, Apr 12, 2023 at 4:23?AM Jens Axboe <axboe at kernel.dk> wrote:
>>
>> On 4/11/23 4:48?PM, Kanchan Joshi wrote:
>>>>> 4. Direct NVMe queues - will there be interest in having io_uring
>>>>> managed NVMe queues? Sort of a new ring, for which I/O is destaged from
>>>>> io_uring SQE to NVMe SQE without having to go through intermediate
>>>>> constructs (i.e., bio/request). Hopefully,that can further amp up the
>>>>> efficiency of IO.
>>>>
>>>> This is interesting, and I've pondered something like that before too. I
>>>> think it's worth investigating and hacking up a prototype. I recently
>>>> had one user of IOPOLL assume that setting up a ring with IOPOLL would
>>>> automatically create a polled queue on the driver side and that is what
>>>> would be used for IO. And while that's not how it currently works, it
>>>> definitely does make sense and we could make some things faster like
>>>> that. It would also potentially easier enable cancelation referenced in
>>>> #1 above, if it's restricted to the queue(s) that the ring "owns".
>>>
>>> So I am looking at prototyping it, exclusively for the polled-io case.
>>> And for that, is there already a way to ensure that there are no
>>> concurrent submissions to this ring (set with IORING_SETUP_IOPOLL
>>> flag)?
>>> That will be the case generally (and submissions happen under
>>> uring_lock mutex), but submission may still get punted to io-wq
>>> worker(s) which do not take that mutex.
>>> So the original task and worker may get into doing concurrent submissions.
>>
>> io-wq may indeed get in your way. But I think for something like this,
>> you'd never want to punt to io-wq to begin with. If userspace is managing
>> the queue, then by definition you cannot run out of tags.
>
> Unfortunately we have lifetime differences between io_uring and NVMe.
> NVMe tag remains valid/occupied until completion (we do not have a
> nice sq->head to look at and decide).
> For io_uring, it can be reused much earlier i.e. just after submission.
> So tag shortage is possible.
The sqe cannot be the tag, the tag has to be generated separately. It
doesn't make sense to tie the sqe and tag together, as one is consumed
in order and the other one is not.
>> If there are
>> other conditions for this kind of request that may run into out-of-memory
>> conditions, then the error just needs to be returned.
>
> I see, and IOSQE_ASYNC can also be flagged as an error/not-supported.
Yep!
--
Jens Axboe
More information about the Linux-nvme
mailing list