[RFC] generic NVMe target and NVMe loop driver

Ming Lin mlin at kernel.org
Mon Nov 16 21:53:59 PST 2015


On Mon, 2015-11-16 at 00:29 -0800, Nicholas A. Bellinger wrote:
> On Mon, 2015-11-16 at 00:08 -0800, Ming Lin wrote:
> > On Sun, 2015-11-15 at 23:30 -0800, Nicholas A. Bellinger wrote:
> > > On Sat, 2015-11-07 at 18:00 +0100, Christoph Hellwig wrote:
> > > > This series continues the NVMe host driver split and also starts adding a
> > > > consumer for it.  The consumer is mostly interesting for developers at this
> > > > point as it's simply a 'loopback' NVMe device that ties the split NVMe
> > > > driver frontend into the new generic NVMe target subsystem.
> > > 
> > > Very excited to see this code posted.  8-)
> > > 
> > > > This was developed for our NVMe over Fabrics prototype, but will also be
> > > > useful for other work like Ming's virtio-nvme or even an implementation of
> > > > traditional PCIe NVMe using vhost.
> > > > 
> > > 
> > > Wrt vhost-nvme, the WIP code (Dr. Hannes + Dave CC'ed) I'm currently
> > > hacking on is here:
> > > 
> > > https://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/log/?h=vhost-nvme-wip
> > > 
> > > Note it's still a week or two away (using rts-megasas as a reference)
> > > from actually functioning across a modest number of queue resources, but
> > > should at least give interested folks an idea of how things look so far.
> > 
> > Hi Nic,
> > 
> > FYI,
> > 
> > I have done the vhost-nvme patches (based on our previous discussion)
> > on top of the NVMe target.
> > 
> > I'll post kernel & qemu patches early this week.
> > 
> 
> Great.  Looking forward to seeing the prototype code.
> 
> > But the tests I have done so far didn't show competitive performance
> > compared with vhost-scsi. Maybe because the MMIO doorbell writes are
> > slow: each one traps to the hypervisor.
> > 
> 
> Any interesting hot-spots that show up in the perf output..?

To ease development, I use nested KVM:
"vm_host" runs on bare metal and "vm_guest" runs inside "vm_host".

I just integrated Google's extension into vhost-nvme:
https://github.com/rlnelson-git/linux-nvme.git
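
For anyone unfamiliar with it: as I understand it, the extension replaces
the per-command MMIO doorbell write (which traps to the hypervisor) with a
shadow doorbell buffer in shared memory that the host side can poll,
falling back to MMIO only when the host asks for an event. A rough sketch
of the idea, with made-up names (this is not the actual patch):

struct my_sq {
	u16 *shadow_db;		/* shared page: tail written by the guest */
	u16 *event_idx;		/* shared page: threshold set by the host */
	void __iomem *mmio_db;	/* real SQ tail doorbell register */
};

/* Wraparound-safe "did we cross the event index?" check,
 * the same trick as virtio's vring_need_event(). */
static bool db_need_event(u16 event_idx, u16 new_idx, u16 old_idx)
{
	return (u16)(new_idx - event_idx - 1) < (u16)(new_idx - old_idx);
}

static void sq_ring_doorbell(struct my_sq *sq, u16 new_tail)
{
	u16 old_tail = *sq->shadow_db;

	*sq->shadow_db = new_tail;	/* fast path: plain memory store */
	mb();				/* order the store before reading event_idx */

	if (db_need_event(*sq->event_idx, new_tail, old_tail))
		writel(new_tail, sq->mmio_db);	/* slow path: VM exit */
}

So on the fast path a doorbell costs a memory store instead of a VM exit,
which matches the big jump in the fio numbers below.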

It's amazing: performance improves a lot.
I use a 256M /dev/ram0 on vm_host as the backend.

fio 4k read throughput:
qemu-nvme: ~20 MB/s to ~30 MB/s
qemu-vhost-nvme + google ext: ~80 MB/s to ~200 MB/s (not very stable though)

(BTW, I'm still waiting for my employer's approval to send out the patches)

   PerfTop:    1039 irqs/sec  kernel:99.8%  exact:  0.0% [4000Hz cpu-clock],  (all, 4 CPUs)
---------------------------------------------------------------------------------------------
    36.93%  [kernel]       [k] _raw_spin_unlock_irq          
    20.98%  [kernel]       [k] vmx_handle_external_intr      
    10.10%  [kernel]       [k] _raw_spin_unlock_irqrestore   
     4.95%  [kernel]       [k] __mutex_unlock_slowpath       
     4.41%  [kernel]       [k] lock_acquire                  
     4.15%  [kernel]       [k] lock_is_held                  
     2.30%  [kernel]       [k] mutex_lock_nested             
     1.68%  [kernel]       [k] lock_release                  
     1.14%  [kernel]       [k] put_compound_page             
     0.93%  [kernel]       [k] debug_lockdep_rcu_enabled     
     0.66%  [kernel]       [k] check_preemption_disabled     
     0.64%  [kernel]       [k] __schedule                    
     0.62%  [kernel]       [k] lock_acquired                 
     0.54%  [kernel]       [k] rcu_lockdep_current_cpu_online
     0.54%  [kernel]       [k] preempt_count_sub             
     0.54%  [kernel]       [k] preempt_count_add             
     0.46%  [kernel]       [k] find_vma                      
     0.45%  [kernel]       [k] vmcs_writel                   
     0.40%  [kernel]       [k] ___might_sleep                
     0.38%  [kernel]       [k] rcu_note_context_switch       
     0.37%  [kernel]       [k] rcu_read_lock_sched_held      
     0.32%  [kernel]       [k] __rcu_is_watching             
     0.32%  [kernel]       [k] follow_trans_huge_pmd         
     0.31%  [kernel]       [k] debug_smp_processor_id        
     0.22%  [kernel]       [k] follow_page_mask              
     0.18%  [kernel]       [k] __get_user_pages              
     0.16%  [kernel]       [k] vmx_read_guest_seg_ar         
     0.16%  [kernel]       [k] nvmet_vhost_rw                
     0.15%  [kernel]       [k] kthread_should_stop           
     0.14%  [kernel]       [k] schedule                      
     0.14%  [kernel]       [k] rcu_is_watching               
     0.12%  [kernel]       [k] nvmet_vhost_sq_thread         
     0.11%  [kernel]       [k] get_parent_ip                 
     0.11%  [kernel]       [k] _raw_spin_lock_irqsave   
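
(Note this kernel has lockdep enabled, which is why lock_acquire and the
debug_* symbols dominate the profile.)

nvmet_vhost_sq_thread above is the host-side kthread that consumes the
submission queue. A hypothetical sketch of the shape of such a polling
loop (made-up names, not the actual WIP code):

struct my_host_sq {
	u16 *shadow_db;		/* tail published by the guest */
	u16 head;		/* next SQ entry to consume */
	u16 depth;		/* queue depth */
};

static void handle_one_cmd(struct my_host_sq *sq, u16 idx)
{
	/* look up the SQE at idx and issue the backend I/O;
	 * this is where something like nvmet_vhost_rw runs */
}

static int sq_poll_thread(void *data)
{
	struct my_host_sq *sq = data;

	while (!kthread_should_stop()) {
		u16 tail = READ_ONCE(*sq->shadow_db);

		while (sq->head != tail) {
			handle_one_cmd(sq, sq->head);
			sq->head = (sq->head + 1) % sq->depth;
		}
		cond_resched();		/* busy-poll, but let others run */
	}
	return 0;
}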