[RFC PATCH 5/5] nvme-vfio: Add a document for the NVMe device

Dong, Eddie eddie.dong at intel.com
Tue Dec 6 10:00:27 PST 2022



> -----Original Message-----
> From: Christoph Hellwig <hch at lst.de>
> Sent: Tuesday, December 6, 2022 7:36 AM
> To: Jason Gunthorpe <jgg at ziepe.ca>
> Cc: Christoph Hellwig <hch at lst.de>; Rao, Lei <Lei.Rao at intel.com>;
> kbusch at kernel.org; axboe at fb.com; kch at nvidia.com; sagi at grimberg.me;
> alex.williamson at redhat.com; cohuck at redhat.com; yishaih at nvidia.com;
> shameerali.kolothum.thodi at huawei.com; Tian, Kevin <kevin.tian at intel.com>;
> mjrosato at linux.ibm.com; linux-kernel at vger.kernel.org; linux-
> nvme at lists.infradead.org; kvm at vger.kernel.org; Dong, Eddie
> <eddie.dong at intel.com>; Li, Yadong <yadong.li at intel.com>; Liu, Yi L
> <yi.l.liu at intel.com>; Wilk, Konrad <konrad.wilk at oracle.com>;
> stephen at eideticom.com; Yuan, Hang <hang.yuan at intel.com>
> Subject: Re: [RFC PATCH 5/5] nvme-vfio: Add a document for the NVMe device
> 
> On Tue, Dec 06, 2022 at 11:28:12AM -0400, Jason Gunthorpe wrote:
> > I'm interested as well. My mental model goes as far as mlx5 and
> > HiSilicon, so if NVMe prevents the VFs from being contained units, it
> > is a really big deviation from VFIO's migration design.
> 
> In NVMe the controller (which maps to a PCIe physical or virtual
> function) is unfortunately not very self-contained. A lot of state is
> subsystem-wide, where the subsystem is, roughly speaking, the container for
> all controllers that share storage. That is the right thing to do for, say,
> dual-ported SSDs that are used for clustering or multi-pathing, but for
> tenant isolation it is about as wrong as it gets.


The NVMe spec is general, but implementation details (such as internal state)
may be vendor-specific. If the migration happens between two identical NVMe
devices (same vendor/device with the same firmware version), migration of
subsystem-wide state can be covered naturally, right?

> 
> There is nothing in the NVMe spec that prohibits you from implementing
> multiple subsystems for multiple functions of a PCIe device, but if you do that
> there is absolutely no support in the spec to manage shared resources or any
> other interaction between them.

In the IPU/DPU area, using multiple VFs with SR-IOV seems to be widely adopted.

For VFs, the use of shared resources can be viewed as implementation-specific,
and saving/loading the state of a VF can rely on the hardware/firmware itself.
Migration of NVMe devices across vendors/devices is another story: it may
be useful, but it brings additional challenges.


More information about the Linux-nvme mailing list