[RFC PATCH 0/2] virtio nvme

Nicholas A. Bellinger nab at linux-iscsi.org
Wed Sep 16 23:10:41 PDT 2015


Hi Ming & Co,

On Thu, 2015-09-10 at 10:28 -0700, Ming Lin wrote:
> On Thu, 2015-09-10 at 15:38 +0100, Stefan Hajnoczi wrote:
> > On Thu, Sep 10, 2015 at 6:48 AM, Ming Lin <mlin at kernel.org> wrote:
> > > These 2 patches added virtio-nvme to kernel and qemu,
> > > basically modified from virtio-blk and nvme code.
> > >
> > > As title said, request for your comments.

<SNIP>

> > 
> > At first glance it seems like the virtio_nvme guest driver is just
> > another block driver like virtio_blk, so I'm not clear why a
> > virtio-nvme device makes sense.
> 
> I think the future "LIO NVMe target" only speaks NVMe protocol.
> 
> Nick(CCed), could you correct me if I'm wrong?
> 
> For SCSI stack, we have:
> virtio-scsi(guest)
> tcm_vhost(or vhost_scsi, host)
> LIO-scsi-target
> 
> For NVMe stack, we'll have similar components:
> virtio-nvme(guest)
> vhost_nvme(host)
> LIO-NVMe-target
> 

I think it's more interesting to consider a 'vhost style' driver that
can be used with unmodified nvme host OS drivers.

Dr. Hannes (CC'ed) did something like this for megasas a few years back
using specialized QEMU emulation + an eventfd-based LIO fabric driver,
and got it working with Linux + MSFT guests.

Doing something similar for nvme would (potentially) be on par with
current virtio-scsi+vhost-scsi small-block performance for scsi-mq
guests, without the extra burden of a new command-set-specific virtio
driver.
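
For what it's worth, here is a minimal sketch of how such a 'vhost
style' kick path could be wired up with KVM's ioeventfd mechanism, so
that an unmodified guest nvme driver's SQ doorbell writes land directly
on a host-side fd instead of bouncing through QEMU.  The doorbell
offset math follows the NVMe spec (SQ y tail doorbell at 0x1000 +
(2*y) * (4 << CAP.DSTRD)); handing the resulting eventfd to a
"vhost_nvme" kernel driver is purely an assumption here, since no such
driver exists yet.

/*
 * Hypothetical userspace setup: route the guest's NVMe SQ doorbell
 * writes straight to a host-side driver via KVM_IOEVENTFD, so an
 * unmodified guest nvme driver "kicks" the backend without exiting
 * to QEMU.  vm_fd is the KVM VM fd; bar0_gpa is the guest-physical
 * address of the emulated controller's BAR0.
 */
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int wire_sq_doorbell(int vm_fd, uint64_t bar0_gpa,
                            uint16_t qid, uint32_t dstrd)
{
	int efd = eventfd(0, EFD_NONBLOCK);
	if (efd < 0)
		return -1;

	/* SQ y Tail Doorbell lives at BAR0 + 0x1000 + (2*y) * (4 << DSTRD) */
	struct kvm_ioeventfd kick = {
		.addr = bar0_gpa + 0x1000 + (2 * qid) * (4u << dstrd),
		.len  = 4,
		.fd   = efd,
		/* no datamatch: any doorbell value triggers the eventfd */
	};

	if (ioctl(vm_fd, KVM_IOEVENTFD, &kick) < 0)
		return -1;

	/* efd would then be handed to the (hypothetical) vhost_nvme
	 * kernel driver, which services the queue when it fires.  */
	return efd;
}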

> > 
> > > Now there is a lot of duplicated code between linux/nvme-core.c and qemu/nvme.c.
> > > The ideal result is to have a multi-level NVMe stack (similar to SCSI).
> > > So we can re-use the nvme code, for example
> > >
> > >                         .-------------------------.
> > >                         | NVMe device register    |
> > >   Upper level           | NVMe protocol process   |
> > >                         |                         |
> > >                         '-------------------------'
> > >
> > >
> > >
> > >               .-----------.    .-----------.    .------------------.
> > >  Lower level  |   PCIe    |    | VIRTIO    |    |NVMe over Fabrics |
> > >               |           |    |           |    |initiator         |
> > >               '-----------'    '-----------'    '------------------'
> > 
> > You mentioned LIO and SCSI.  How will NVMe over Fabrics be integrated
> > into LIO?  If it is mapped to SCSI then using virtio_scsi in the guest
> > and tcm_vhost should work.
> 
> I think it's not mapped to SCSI.
> 
> Nick, would you share more here?
> 

(Adding Dave M. CC')

So NVMe target code needs to function in at least two different modes:

- Direct mapping of the hw queues provided by the nvme backend driver
  to the hw queues provided by the nvme fabric driver.

- Decoding of the NVMe command set for basic Read/Write/Flush I/O for
  submission to existing backend drivers (e.g. iblock, fileio, rd_mcp);
  a decode sketch follows below.
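
As a rough sketch of what the decode step in the second mode looks
like, it's essentially a switch on the I/O opcode from the submission
queue entry before handing off to a backend.  The nvme_cmd_* opcodes
and struct nvme_command are the kernel's existing NVMe protocol
definitions (linux/nvme.h in current trees); backend_rw(),
backend_flush() and struct nvme_target_ctx are made-up placeholders
for whatever the iblock/fileio/rd_mcp glue ends up looking like.

/*
 * Rough decode of NVMe Read/Write/Flush before submission to an
 * existing LIO backend.  struct nvme_command and the nvme_cmd_*
 * opcodes are the kernel's NVMe protocol definitions; backend_rw(),
 * backend_flush() and struct nvme_target_ctx are hypothetical
 * placeholders for the backend glue.
 */
#include <linux/kernel.h>
#include <linux/nvme.h>
#include <linux/types.h>

struct nvme_target_ctx;		/* placeholder backend handle */

int backend_rw(struct nvme_target_ctx *ctx, bool is_write, u64 slba, u32 nlb);
int backend_flush(struct nvme_target_ctx *ctx);

static int nvmet_decode_io(struct nvme_target_ctx *ctx,
			   struct nvme_command *cmd)
{
	switch (cmd->rw.opcode) {
	case nvme_cmd_write:
	case nvme_cmd_read:
		/* slba/length are little-endian in the SQE; NLB is
		 * zero-based, hence the +1. */
		return backend_rw(ctx, cmd->rw.opcode == nvme_cmd_write,
				  le64_to_cpu(cmd->rw.slba),
				  le16_to_cpu(cmd->rw.length) + 1);
	case nvme_cmd_flush:
		return backend_flush(ctx);
	default:
		/* everything else gets Invalid Command Opcode status */
		return NVME_SC_INVALID_OPCODE;
	}
}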

In the former case, it's safe to assume anywhere from a very small
amount of code to no code at all is involved for fast-path operation.

For more involved logic like PR, ALUA, and EXTENDED_COPY, I think both
modes will still most likely handle some aspects of this in software,
rather than entirely behind a backend nvme host hw interface.

--nab
