[RFC PATCH 0/2] virtio nvme
Nicholas A. Bellinger
nab at linux-iscsi.org
Sat Sep 26 22:01:56 PDT 2015
On Wed, 2015-09-23 at 15:58 -0700, Ming Lin wrote:
> On Fri, 2015-09-18 at 14:09 -0700, Nicholas A. Bellinger wrote:
> > On Fri, 2015-09-18 at 11:12 -0700, Ming Lin wrote:
> > > On Thu, 2015-09-17 at 17:55 -0700, Nicholas A. Bellinger wrote:
<SNIP>
> > IBLOCK + FILEIO + RD_MCP don't speak SCSI; they simply process I/Os with
> > LBA + length based on SGL memory, or pass along a FLUSH with LBA +
> > length.
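To make the point above concrete, a backend that never sees SCSI only needs (LBA, block count, SGL). The sketch below is a hypothetical RAM-backed backend loosely in the spirit of RD_MCP. The names (struct sg_seg, struct ramdisk, ramdisk_rw, ramdisk_flush) are made up for illustration and are not the LIO target core API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scatter-gather segment: one contiguous buffer. */
struct sg_seg {
    uint8_t *buf;
    size_t   len;
};

enum io_dir { IO_READ, IO_WRITE };

/* Hypothetical RAM-backed device, loosely in the spirit of RD_MCP. */
struct ramdisk {
    uint8_t *data;
    uint64_t nr_blocks;
    uint32_t block_size;
};

/*
 * Execute one I/O described only by (LBA, block count, SGL).
 * Returns 0 on success, -1 if the LBA range is out of bounds or the
 * SGL is shorter than nr_blocks * block_size bytes.
 */
int ramdisk_rw(struct ramdisk *rd, enum io_dir dir, uint64_t lba,
               uint32_t nr_blocks, const struct sg_seg *sgl, size_t nents)
{
    assert(rd->block_size > 0);
    if (lba > rd->nr_blocks || nr_blocks > rd->nr_blocks - lba)
        return -1;

    size_t total = (size_t)nr_blocks * rd->block_size;
    size_t avail = 0;
    for (size_t i = 0; i < nents; i++)
        avail += sgl[i].len;
    if (avail < total)
        return -1;                      /* reject before touching any data */

    uint8_t *disk = rd->data + (size_t)lba * rd->block_size;
    size_t done = 0;
    for (size_t i = 0; i < nents && done < total; i++) {
        size_t n = sgl[i].len < total - done ? sgl[i].len : total - done;
        if (dir == IO_READ)
            memcpy(sgl[i].buf, disk + done, n);
        else
            memcpy(disk + done, sgl[i].buf, n);
        done += n;
    }
    return 0;
}

/* FLUSH: nothing to persist for RAM; a FILEIO/IBLOCK-style backend would sync. */
int ramdisk_flush(struct ramdisk *rd, uint64_t lba, uint32_t nr_blocks)
{
    if (lba > rd->nr_blocks || nr_blocks > rd->nr_blocks - lba)
        return -1;
    return 0;
}
```

For example, ramdisk_rw(&rd, IO_WRITE, 1, 2, sgl, 2) writes two 512-byte blocks starting at LBA 1 from a two-entry SGL; nothing in the call path needs to know what protocol the request arrived in.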
> >
> > So once the 'tcm_eventfd_nvme' driver on the KVM host receives an nvme
> > host hardware frame via eventfd, it would decode the frame and send along
> > the Read/Write/Flush when exposing existing (non-nvme-native) backend
> > drivers.
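A minimal sketch of that decode step, assuming the frame is a raw 64-byte NVMe submission queue entry in little-endian layout: opcode in CDW0 bits 7:0 and the command identifier in bits 31:16, NSID in bytes 4-7, starting LBA in CDW10/CDW11, and the zero-based block count (NLB) in CDW12 bits 15:0. The struct backend_io and decode_sqe() names are hypothetical and not taken from any existing tcm_eventfd_nvme code. Note that an NVMe Flush carries only the NSID, so the backend FLUSH would have to cover the whole namespace.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* NVM command set opcodes. */
#define NVME_CMD_FLUSH 0x00
#define NVME_CMD_WRITE 0x01
#define NVME_CMD_READ  0x02

enum backend_op { OP_FLUSH, OP_WRITE, OP_READ, OP_UNSUPPORTED };

/* Hypothetical decoded form handed to a non-nvme-native backend. */
struct backend_io {
    enum backend_op op;
    uint16_t cid;       /* command identifier, echoed back in the completion */
    uint32_t nsid;
    uint64_t slba;      /* starting LBA */
    uint32_t nr_blocks; /* 1-based count (NLB field + 1) */
};

/* Load a little-endian 32-bit value regardless of host byte order. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode a raw 64-byte submission queue entry into (op, LBA, length). */
struct backend_io decode_sqe(const uint8_t sqe[64])
{
    struct backend_io io;
    memset(&io, 0, sizeof io);

    uint32_t cdw0 = le32(sqe + 0);
    io.cid  = (uint16_t)(cdw0 >> 16);
    io.nsid = le32(sqe + 4);

    switch (cdw0 & 0xff) {
    case NVME_CMD_FLUSH:
        io.op = OP_FLUSH;           /* no LBA range in an NVMe Flush */
        return io;
    case NVME_CMD_WRITE:
        io.op = OP_WRITE;
        break;
    case NVME_CMD_READ:
        io.op = OP_READ;
        break;
    default:
        io.op = OP_UNSUPPORTED;
        return io;
    }
    io.slba = (uint64_t)le32(sqe + 40) | (uint64_t)le32(sqe + 44) << 32;
    io.nr_blocks = (le32(sqe + 48) & 0xffff) + 1;  /* NLB is zero-based */
    return io;
}
```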
>
> I read up on the vhost architecture:
> http://blog.vmsplice.net/2011/09/qemu-internals-vhost-architecture.html
>
> The nice thing is it is not tied to KVM in any way.
>
Yes.
There are assumptions vhost currently makes about the guest using virtio
queues, however, and at least for an initial vhost_nvme prototype it's
probably easier to avoid hacking up drivers/vhost/* (for now).
(Adding MST to CC)
> For SCSI, there is "virtio-scsi" in the guest kernel and "vhost-scsi" in
> the host kernel.
>
> For NVMe, there is no "virtio-nvme" in the guest kernel (just the unmodified
> NVMe driver), but I'll do a similar thing in QEMU with the vhost
> infrastructure. And there is "vhost_nvme" in the host kernel.
>
> For the "virtqueue" implementation in qemu-nvme, I'll possibly just
> use/copy drivers/virtio/virtio_ring.c, same as what
> linux/tools/virtio/virtio_test.c does.
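For reference, a stripped-down model of the split virtqueue that virtio_ring.c implements: the driver places a buffer in the descriptor table and publishes its index in the avail ring; the device consumes it and posts (id, len) to the used ring. The struct layouts follow the virtio split-ring format, but this sketch is single-threaded and omits descriptor chaining, free-list recycling, the event index fields, and the memory barriers a real implementation needs. vq_add_buf(), vq_device_poll() and vq_get_used() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define QSZ 8  /* ring size; a power of two so idx % QSZ survives u16 wrap */

/* Split-ring layout per the virtio spec (event-index fields omitted). */
struct vring_desc { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct vring_avail { uint16_t flags; uint16_t idx; uint16_t ring[QSZ]; };
struct vring_used_elem { uint32_t id; uint32_t len; };
struct vring_used { uint16_t flags; uint16_t idx; struct vring_used_elem ring[QSZ]; };

struct vq {
    struct vring_desc desc[QSZ];
    struct vring_avail avail;
    struct vring_used used;
    uint16_t free_head;      /* driver: next unused descriptor (never recycled here) */
    uint16_t last_avail_idx; /* device: next avail entry to consume */
    uint16_t last_used_idx;  /* driver: next used entry to reap */
};

/* Driver side: expose one buffer and publish it in the avail ring. */
int vq_add_buf(struct vq *q, void *buf, uint32_t len)
{
    if (q->free_head >= QSZ)
        return -1;                          /* out of descriptors */
    uint16_t head = q->free_head++;
    q->desc[head] = (struct vring_desc){ (uint64_t)(uintptr_t)buf, len, 0, 0 };
    q->avail.ring[q->avail.idx % QSZ] = head;
    /* A real driver needs a write barrier here before bumping idx. */
    q->avail.idx++;
    return head;
}

/* Device side: consume one avail entry and post a completion to used. */
int vq_device_poll(struct vq *q, uint32_t written)
{
    if (q->last_avail_idx == q->avail.idx)
        return -1;                          /* nothing pending */
    uint16_t head = q->avail.ring[q->last_avail_idx % QSZ];
    q->last_avail_idx++;
    q->used.ring[q->used.idx % QSZ] = (struct vring_used_elem){ head, written };
    q->used.idx++;
    return head;
}

/* Driver side: reap one completion; returns the descriptor id or -1. */
int vq_get_used(struct vq *q, uint32_t *len)
{
    if (q->last_used_idx == q->used.idx)
        return -1;
    struct vring_used_elem e = q->used.ring[q->last_used_idx % QSZ];
    q->last_used_idx++;
    *len = e.len;
    return (int)e.id;
}
```

After two vq_add_buf() calls and two vq_device_poll() calls, vq_get_used() returns descriptor ids 0 and 1 in order, each with the length the device reported.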
>
> A more detailed graph is below. What do you think?
>
> .-----------------------------------------.           .------------------------.
> | Guest(Linux, Windows, FreeBSD, Solaris) | NVMe      | qemu                    |
> | unmodified NVMe driver                  | command   | NVMe device emulation   |
> |                                         | ------->  | vhost + virtqueue       |
> '-----------------------------------------'           '------------------------'
>                                                            |       |       ^
>                                                passthrough |       |       | kick/notify
>                                               NVMe command |       |       | via eventfd
> userspace                                    via virtqueue |       |       |
>                                                            v       v       |
> ----------------------------------------------------------------------------------
This should read something like:
Passthrough of nvme hardware frames via QEMU PCI-e struct vhost_mem into
a custom vhost_nvme kernel driver ioctl using struct file + struct
eventfd_ctx primitives.
E.g., QEMU user-space does not perform the nvme command decode before
passing the emulated nvme hardware frame up to the host kernel driver.
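The kick/notify half of this can be sketched in user space with eventfd(2): the notifier write()s an 8-byte counter increment, and the waiter's read() returns the accumulated count and resets it to zero. A host kernel driver would instead hold a struct eventfd_ctx for the same fd handed over by QEMU (for comparison, vhost receives such fds via the VHOST_SET_VRING_KICK and VHOST_SET_VRING_CALL ioctls). The vq_kick()/vq_wait() wrappers here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Kick/notify: add 1 to the eventfd counter, waking any waiter. */
int vq_kick(int efd)
{
    uint64_t one = 1;
    return write(efd, &one, sizeof one) == (ssize_t)sizeof one ? 0 : -1;
}

/*
 * Wait/consume: read() returns the accumulated counter and resets it to
 * zero (no EFD_SEMAPHORE), so several kicks coalesce into one wakeup.
 * Returns -1 on error, including EAGAIN on a non-blocking fd.
 */
int64_t vq_wait(int efd)
{
    uint64_t count = 0;
    if (read(efd, &count, sizeof count) != (ssize_t)sizeof count)
        return -1;
    return (int64_t)count;
}
```

With an fd from eventfd(0, EFD_NONBLOCK), three vq_kick() calls followed by one vq_wait() return 3, which is why a backend can batch several queued frames per wakeup.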
>           .----------------------------------------------------------------------.
> kernel    | LIO frontend driver                                                  |
>           |   - vhost_nvme                                                       |
>           '----------------------------------------------------------------------'
>                    |  translate                                      ^
>                    |  (NVMe command)                                 |
>                    |       to                                        |
>                    v  (LBA, length)                                  |
vhost_nvme performs the host-kernel-level decode of user-space-provided
nvme hardware frames into nvme command + LBA + length + SGL buffer for
target backend driver submission.
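A sketch of the data-pointer half of that decode, assuming the frame uses PRPs rather than NVMe SGLs and that the byte length is (NLB + 1) * block size: PRP1 may start at an offset within a page, PRP2 is the second page when the transfer spans exactly two pages, and otherwise PRP2 points to a PRP list of page-aligned entries. prp_to_sgl() and struct gpa_seg are hypothetical; chained PRP lists are not handled, and the resulting guest-physical addresses would still need translating through the guest memory map (e.g. the vhost_mem regions) before a backend could use them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SGL segment: guest-physical address + length. */
struct gpa_seg { uint64_t gpa; uint32_t len; };

/*
 * Translate an NVMe PRP1/PRP2 pair into SGL segments covering 'len' bytes.
 * prp_list is the already-mapped PRP list that PRP2 points to when the
 * transfer spans more than two pages (chained lists are not handled).
 * Returns the number of segments written, or -1 on error.
 */
int prp_to_sgl(uint64_t prp1, uint64_t prp2, const uint64_t *prp_list,
               size_t list_len, uint32_t len, uint32_t page_size,
               struct gpa_seg *out, size_t max_segs)
{
    assert(page_size && (page_size & (page_size - 1)) == 0);
    if (max_segs < 1)
        return -1;

    /* PRP1 covers from its offset to the end of its page. */
    uint32_t first = page_size - (uint32_t)(prp1 & (page_size - 1));
    if (first > len)
        first = len;
    size_t n = 0;
    out[n++] = (struct gpa_seg){ prp1, first };
    len -= first;
    if (len == 0)
        return (int)n;

    if (len <= page_size) {             /* exactly one more page: PRP2 is it */
        if (max_segs < 2)
            return -1;
        out[n++] = (struct gpa_seg){ prp2, len };
        return (int)n;
    }

    /* Otherwise PRP2 points to a list of page-aligned entries. */
    for (size_t i = 0; len > 0; i++) {
        if (i >= list_len || n >= max_segs)
            return -1;
        uint32_t chunk = len < page_size ? len : page_size;
        out[n++] = (struct gpa_seg){ prp_list[i], chunk };
        len -= chunk;
    }
    return (int)n;
}
```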
>           .----------------------------------------------------------------------.
>           | LIO backend driver                                                   |
>           |   - fileio (/mnt/xxx.file)                                           |
>           |   - iblock (/dev/sda1, /dev/nvme0n1, ...)                            |
>           '----------------------------------------------------------------------'
>                    |                                                 ^
>                    |  submit_bio()                                   |
>                    v                                                 |
>           .----------------------------------------------------------------------.
>           | block layer                                                          |
>           |                                                                      |
>           '----------------------------------------------------------------------'
For this part, HCH mentioned he is currently working on some code to
pass native NVMe commands + SGL memory via blk-mq struct request into
struct nvme_dev and/or struct nvme_queue.
>                    |                                                 ^
>                    |                                                 |
>                    v                                                 |
>           .----------------------------------------------------------------------.
>           | block device driver                                                  |
>           |                                                                      |
>           '----------------------------------------------------------------------'
>                 |                 |                |                   |
>                 |                 |                |                   |
>                 v                 v                v                   v
>           .------------.    .-----------.    .------------.    .---------------.
>           |    SATA    |    |   SCSI    |    |    NVMe    |    |      ....     |
>           '------------'    '-----------'    '------------'    '---------------'
>
>
Looks fine.
Btw, after chatting with Dr. Hannes this week at SDC, here are his
original rts-megasas -v6 patches from Feb 2013.
Note they are standalone patches that require a sufficiently old
LIO + QEMU to actually build + function.
https://github.com/Datera/rts-megasas/blob/master/rts_megasas-qemu-v6.patch
https://github.com/Datera/rts-megasas/blob/master/rts_megasas-fabric-v6.patch
For grokking purposes, they demonstrate the principal design of a host
kernel level driver, along with the megasas firmware interface (MFI)
specific emulation magic that makes up the bulk of the code.
Take a look.
--nab