[PATCH V2 0/2] nvme: Support for fused NVME_IOCTL_SUBMIT_IO
Clay Mayers
Clay.Mayers at kioxia.com
Tue Jan 26 16:14:16 EST 2021
> From: Chaitanya Kulkarni <Chaitanya.Kulkarni at wdc.com>
> Sent: Tuesday, January 26, 2021 11:01 AM
>
> On 1/26/21 10:17 AM, Clay Mayers wrote:
> >>
> >> On 1/25/21 12:03, clay.mayers at kioxia.com wrote:
> >>> Local pci device fused support is also necessary for NVMeOF targets
> >>> to support fused operation.
> >> Please explain the use case and the application of the NVMeOF fuse
> >> command feature.
> > NVMeOF devices are used to create disaggregated storage systems where
> > compute and storage are connected over a fabric. Fused compare/write
> > can be used to arbitrate shared access to NVMeOF devices w/o a central
> > authority.
> >
> > A specific example of how fused compare/write is used is the clustered
> > file system VMFS. It uses the SCSI version of compare/write to manage
> > metadata on shared SAN systems. File system metadata is updated
> > using locks stored on the storage media. Those locks are grabbed
> > using fused compare/write operations as an atomic test & set. VMFS
> > originally used device reserve, which is a coarser-grained locking
> > mechanism, but it doesn't scale as well as an atomic test & set.
> If I understand correctly, VMFS is an out-of-tree filesystem, is it?
I seem to have misunderstood your request for a use case. As a patch series,
this is not about NVMeOF; it is about pci support for the fused command.
NVMeOF is the use case for pci fused support.
But how strong a use case is NVMeOF? I offered clustered file systems,
and the public example of VMware's VMFS, to illustrate the usefulness.
Here VMware is the target and Linux is the host serving up storage over
NVMeOF. That requires fused support in both the target/host and pci. At
a past company I worked for, we used SPDK to get this functionality for
disaggregated storage. That's the right fit for some solutions, but not all.
Our actual goal is to have something close to direct device access
without resorting to SPDK. We think io_uring is the correct solution.
Jens, just before his winter PTO, tweeted about adding ioctl support to
io_uring; we hope to extend that to support fused operations as well.
Exposing it through an ioctl makes the pci patch useful now. The one
example I have is nvme-cli, where this was requested on github:
https://github.com/linux-nvme/nvme-cli/issues/318
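
To make the VMFS-style usage quoted above concrete, here is a rough
userspace sketch of an atomic test & set built from a fused
compare/write pair going through NVME_IOCTL_SUBMIT_IO. It is only
illustrative: the FUSE flag and status values come from the NVMe spec,
but the pairing mechanics (two back-to-back ioctl calls here) and names
like grab_lock() are my assumptions, not necessarily the uAPI this
series defines.

/*
 * Illustrative sketch only.  Assumes the patched driver accepts the
 * spec-defined FUSE bits in nvme_user_io.flags and places the two
 * halves adjacently in the submission queue.  Depending on how the
 * patch implements pairing, the first ioctl may not return until the
 * second half has been submitted, so a real caller might need to
 * issue the halves from two threads.  Assumes a 512B LBA format.
 */
#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/nvme_ioctl.h>

#define NVME_CMD_FUSE_FIRST	(1 << 0)	/* CDW0 FUSE = 01b (spec) */
#define NVME_CMD_FUSE_SECOND	(1 << 1)	/* CDW0 FUSE = 10b (spec) */
#define NVME_SC_COMPARE_FAILED	0x285		/* SCT 2h / SC 85h (spec) */

/* One 512B logical block holding the on-disk lock; owner 0 == free. */
struct disk_lock {
	uint64_t owner;
	uint8_t  pad[504];
};

/*
 * Issue compare as the first fused half and write as the second; the
 * controller executes the write only if the compare succeeds.  The
 * nvme ioctls return a negative errno on submission failure and the
 * positive NVMe status code on a command error.
 */
static int fused_compare_write(int fd, uint64_t slba,
			       struct disk_lock *expect,
			       struct disk_lock *update)
{
	struct nvme_user_io cmp = {
		.opcode	 = 0x05,		/* nvme_cmd_compare */
		.flags	 = NVME_CMD_FUSE_FIRST,
		.nblocks = 0,			/* 0's based: one block */
		.addr	 = (uint64_t)(uintptr_t)expect,
		.slba	 = slba,
	};
	struct nvme_user_io wr = {
		.opcode	 = 0x01,		/* nvme_cmd_write */
		.flags	 = NVME_CMD_FUSE_SECOND,
		.nblocks = 0,
		.addr	 = (uint64_t)(uintptr_t)update,
		.slba	 = slba,
	};
	int ret;

	ret = ioctl(fd, NVME_IOCTL_SUBMIT_IO, &cmp);
	if (ret)
		return ret;			/* compare failed or error */
	return ioctl(fd, NVME_IOCTL_SUBMIT_IO, &wr);
}

/* Spin until we transition the lock record from free to owned. */
static int grab_lock(int fd, uint64_t lock_lba, uint64_t my_id)
{
	struct disk_lock expect = { .owner = 0 };
	struct disk_lock update = { .owner = my_id };
	int ret;

	while ((ret = fused_compare_write(fd, lock_lba,
					  &expect, &update)) != 0) {
		if (ret != NVME_SC_COMPARE_FAILED)
			return ret;		/* real I/O error */
		usleep(1000);			/* lost the race; retry */
	}
	return 0;
}

Dropping the lock is then just a plain write of a zeroed record; the
fuse is only needed on the acquire side, where the test and the set
must be atomic across hosts.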
I thought this was better than folding an nvme change in with an io_uring
patch series. I'm trying to find the balance between a small, isolated unit
of change and something compelling.
> Can you please explain the setup in detail? What kind of interface is the
> file-system using to issue the command?
> Based on your description it looks like the target is connected to the
> vmware-based system, and the host is a vmware-based host and not the linux
> host which is present in this series.
No - the idea is to be standards-based and use NVMeOF for target and host
data exchange. In one example, the target would be running vSphere. The
host, as a Linux machine, would expose its attached devices with NVMeOF.
vSphere would expect fused command support from the Linux machine.
> Also, what are the other applications, or is this the only application?
The application is disaggregated storage on NVMeOF, both consuming it
and publishing it. I don't have any specific set of applications to offer.