[RFC PATCH 0/9] vhost-nvme: new qemu nvme backend using nvme target

Ming Lin mlin at kernel.org
Mon Nov 23 23:27:54 PST 2015


On Mon, 2015-11-23 at 15:14 +0100, Paolo Bonzini wrote:
> 
> On 23/11/2015 09:17, Ming Lin wrote:
> > On Sat, 2015-11-21 at 14:11 +0100, Paolo Bonzini wrote:
> >>
> >> On 20/11/2015 01:20, Ming Lin wrote:
> >>> One improvement could be to use Google's NVMe vendor extension that
> >>> I sent in another thread, also here:
> >>> https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=nvme-google-ext
> >>>
> >>> Qemu side:
> >>> http://www.minggr.net/cgit/cgit.cgi/qemu/log/?h=vhost-nvme.0
> >>> Kernel side also here:
> >>> https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=vhost-nvme.0
> >>
> >> How much do you get with vhost-nvme plus vendor extension, compared to
> >> 190 MB/s for QEMU?
> > 
> > There is still some bug. I'll update.
> 
> Sure.
> 
> >> Note that in all likelihood, QEMU can actually do better than 190 MB/s,
> >> and gain more parallelism too, by moving the processing of the
> >> ioeventfds to a separate thread.  This is similar to
> >> hw/block/dataplane/virtio-blk.c.
> >>
> >> It's actually pretty easy to do.  Even though
> >> hw/block/dataplane/virtio-blk.c is still using some old APIs, all memory
> >> access in QEMU is now thread-safe.  I have pending patches for 2.6 that
> >> cut that file down to a mere 200 lines of code, NVMe would probably be
> >> about the same.
> > 
> > Is there a git tree for your patches?
> 
> No, not yet.  I'll post them today or tomorrow, will make sure to Cc you.
> 
> > Did you mean some pseudo code like the following?
> > 1. need an iothread for each cq/sq?
> > 2. need an AioContext for each cq/sq?
> > 
> >  hw/block/nvme.c | 32 ++++++++++++++++++++++++++++++--
> >  hw/block/nvme.h |  8 ++++++++
> >  2 files changed, 38 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index f27fd35..fed4827 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -28,6 +28,8 @@
> >  #include "sysemu/sysemu.h"
> >  #include "qapi/visitor.h"
> >  #include "sysemu/block-backend.h"
> > +#include "sysemu/iothread.h"
> > +#include "qom/object_interfaces.h"
> >  
> >  #include "nvme.h"
> >  
> > @@ -558,9 +560,22 @@ static void nvme_init_cq_eventfd(NvmeCQueue *cq)
> >      uint16_t offset = (cq->cqid*2+1) * (4 << NVME_CAP_DSTRD(n->bar.cap));
> >  
> >      event_notifier_init(&cq->notifier, 0);
> > -    event_notifier_set_handler(&cq->notifier, nvme_cq_notifier);
> >      memory_region_add_eventfd(&n->iomem,
> >          0x1000 + offset, 4, false, 0, &cq->notifier);
> > +
> > +    object_initialize(&cq->internal_iothread_obj,
> > +                      sizeof(cq->internal_iothread_obj),
> > +                      TYPE_IOTHREAD);
> > +    user_creatable_complete(OBJECT(&cq->internal_iothread_obj), &error_abort);
> 
> For now, you have to use one iothread for all cq/sq of a single NVMe
> device; multiqueue block layer is planned for 2.7 or 2.8.  Otherwise
> yes, it's very close to just these changes.

Here is the call stack of the iothread for virtio-blk-dataplane:

handle_notify (qemu/hw/block/dataplane/virtio-blk.c:126)
aio_dispatch (qemu/aio-posix.c:329)
aio_poll (qemu/aio-posix.c:474)
iothread_run (qemu/iothread.c:45)
start_thread (pthread_create.c:312)
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)
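
If I follow the virtio-blk-dataplane model, I guess the queue eventfds
would then be attached to the AioContext of that one per-device iothread
instead of the main loop. Something like the sketch below -- just my
understanding so far, the aio_set_event_notifier() arguments may not match
the current API exactly, and n->iothread/n->ctx are placeholder fields I'd
add to NvmeCtrl:

/* sketch: one iothread/AioContext shared by all cq/sq of the device */
static void nvme_init_iothread(NvmeCtrl *n)
{
    object_initialize(&n->internal_iothread_obj,
                      sizeof(n->internal_iothread_obj),
                      TYPE_IOTHREAD);
    user_creatable_complete(OBJECT(&n->internal_iothread_obj), &error_abort);
    n->iothread = &n->internal_iothread_obj;
    n->ctx = iothread_get_aio_context(n->iothread);
}

static void nvme_init_cq_eventfd(NvmeCQueue *cq)
{
    NvmeCtrl *n = cq->ctrl;
    uint16_t offset = (cq->cqid * 2 + 1) * (4 << NVME_CAP_DSTRD(n->bar.cap));

    event_notifier_init(&cq->notifier, 0);
    memory_region_add_eventfd(&n->iomem,
        0x1000 + offset, 4, false, 0, &cq->notifier);
    /* run nvme_dev_notify (below) in the iothread's AioContext,
     * not in the main loop */
    aio_set_event_notifier(n->ctx, &cq->notifier, nvme_dev_notify);
}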

I think I'll have an "nvme_dev_notify" handler similar to "handle_notify":

static void nvme_dev_notify(EventNotifier *e)
{
    ....
}

But then how can I tell whether a notification is for a cq or an sq?
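
One option I can think of (just a sketch, not tested): drop the single
nvme_dev_notify and register one handler per queue type, then recover the
queue itself with container_of(), since the notifier is embedded in
NvmeCQueue/NvmeSQueue. This assumes an sq->notifier set up by a matching
nvme_init_sq_eventfd(), and reuses the existing nvme_post_cqes() and
nvme_process_sq() helpers:

/* sketch: per-queue-type handlers instead of a single nvme_dev_notify */
static void nvme_cq_notify(EventNotifier *e)
{
    NvmeCQueue *cq = container_of(e, NvmeCQueue, notifier);

    event_notifier_test_and_clear(e);
    nvme_post_cqes(cq);
}

static void nvme_sq_notify(EventNotifier *e)
{
    NvmeSQueue *sq = container_of(e, NvmeSQueue, notifier);

    event_notifier_test_and_clear(e);
    nvme_process_sq(sq);
}

Is that the right direction, or is it better to keep a single handler and
store a type flag next to the notifier?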



