[LSF/MM/BPF ATTEND][LSF/MM/BPF Topic] Non-block IO

Kanchan Joshi joshiiitr at gmail.com
Tue Apr 11 16:28:41 PDT 2023


On Wed, Apr 12, 2023 at 4:23 AM Jens Axboe <axboe at kernel.dk> wrote:
>
> On 4/11/23 4:48 PM, Kanchan Joshi wrote:
> >>> 4. Direct NVMe queues - will there be interest in having io_uring
> >>> managed NVMe queues?  Sort of a new ring, for which I/O is destaged from
> >>> io_uring SQE to NVMe SQE without having to go through intermediate
> >>> constructs (i.e., bio/request). Hopefully, that can further amp up the
> >>> efficiency of IO.
> >>
> >> This is interesting, and I've pondered something like that before too. I
> >> think it's worth investigating and hacking up a prototype. I recently
> >> had one user of IOPOLL assume that setting up a ring with IOPOLL would
> >> automatically create a polled queue on the driver side and that is what
> >> would be used for IO. And while that's not how it currently works, it
> >> definitely does make sense and we could make some things faster like
> >> that. It would also potentially make it easier to enable the
> >> cancelation referenced in #1 above, if it's restricted to the
> >> queue(s) that the ring "owns".
> >
> > So I am looking at prototyping it, exclusively for the polled-io case.
> > And for that, is there already a way to ensure that there are no
> > concurrent submissions to this ring (set with IORING_SETUP_IOPOLL
> > flag)?
> > That will be the case generally (and submissions happen under
> > uring_lock mutex), but submission may still get punted to io-wq
> > worker(s) which do not take that mutex.
> > So the original task and worker may get into doing concurrent submissions.
>
> io-wq may indeed get in your way. But I think for something like this,
> you'd never want to punt to io-wq to begin with. If userspace is managing
> the queue, then by definition you cannot run out of tags.

Unfortunately, there is a lifetime difference between io_uring and NVMe.
An NVMe tag remains valid/occupied until the command completes (we do not
have a nice sq->head to look at to decide when a slot is free).
An io_uring SQE, on the other hand, can be reused much earlier, i.e. right
after submission.
So a tag shortage is still possible.
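
To make the mismatch concrete, here is a rough toy model (plain userspace C, not kernel or driver code). It assumes a ring and a device queue of the same hypothetical depth QD; all toy_* names are made up for illustration. The SQ slot is handed back as soon as the SQE is consumed, while the tag stays busy until completion, so submission can fail for lack of a tag even though the ring has room.

```c
#include <stdbool.h>
#include <stddef.h>

#define QD 4  /* hypothetical depth, same for the ring and the device queue */

/* Toy model only: SQ slots and device tags are tracked separately. */
struct toy_queue {
    unsigned sq_free;   /* io_uring-style SQ slots available to userspace */
    bool tag_busy[QD];  /* NVMe-style tags, busy until completion */
};

static void toy_init(struct toy_queue *q)
{
    q->sq_free = QD;
    for (size_t i = 0; i < QD; i++)
        q->tag_busy[i] = false;
}

/* Submit one command. Returns the tag used, or -1 if nothing is available. */
static int toy_submit(struct toy_queue *q)
{
    if (q->sq_free == 0)
        return -1;                /* the ring itself is full */
    for (int tag = 0; tag < QD; tag++) {
        if (!q->tag_busy[tag]) {
            q->sq_free--;         /* SQE slot taken by userspace ... */
            q->tag_busy[tag] = true;
            q->sq_free++;         /* ... and free again once the SQE is consumed */
            return tag;
        }
    }
    return -1;                    /* SQ slot available, but no device tag */
}

/* Completion is the only point where a tag becomes reusable. */
static void toy_complete(struct toy_queue *q, int tag)
{
    q->tag_busy[tag] = false;
}
```

With QD commands in flight, sq_free is still QD, yet the next toy_submit() returns -1 until some toy_complete() runs.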

> If there are
> other conditions for this kind of request that may run into out-of-memory
> conditions, then the error just needs to be returned.

I see. And IOSQE_ASYNC can likewise be rejected with an error (not supported). Thanks.
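
Roughly, the check could look like the sketch below. It is only an illustration: toy_check_sqe_flags() is a hypothetical helper, TOY_IOSQE_ASYNC is a local stand-in that mirrors the IOSQE_ASYNC bit from the uapi header, and -EOPNOTSUPP is just one plausible choice of error code.

```c
#include <errno.h>
#include <stdint.h>

/* Local stand-in for IOSQE_ASYNC (bit 4 of sqe->flags in the uapi header). */
#define TOY_IOSQE_ASYNC (1U << 4)

/*
 * Hypothetical flag check for a ring that owns its queue: a request that
 * asks for async (io-wq) execution is refused rather than punted.
 * The errno value is an assumption, not something settled in this thread.
 */
static int toy_check_sqe_flags(uint8_t flags)
{
    if (flags & TOY_IOSQE_ASYNC)
        return -EOPNOTSUPP;
    return 0;
}
```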


