NVM Express 1.2 - Controller Memory Buffer Functionality.

Stephen Bates Stephen.Bates at pmcs.com
Mon Dec 8 10:44:03 PST 2014


Keith

Fun distractions can be a good thing ;-). Thanks for making that update to QEMU and for sending on your initial driver changes. I cloned your version of the QEMU tree and have it up and running on a local server. Are you OK with my adding some flexibility to the size of the CMB (for testing purposes)?

Also, would you mind sending me an example of how you invoke QEMU when testing NVMe? There seem to be a lot of QEMU options. And is there any open-source code for regression testing the NVMe driver? I would hate to make some proposed changes only to find I have broken something simple that a basic regression test would have caught.

Cheers

Stephen 

-----Original Message-----
From: Linux-nvme [mailto:linux-nvme-bounces at lists.infradead.org] On Behalf Of Keith Busch
Sent: Friday, December 5, 2014 4:29 PM
To: Matias Bjørling
Cc: linux-nvme at lists.infradead.org
Subject: Re: NVM Express 1.2 - Controller Memory Buffer Functionality.

I'm probably going to get yelled at for doing this instead of what I'm supposed to be doing, but sometimes fun distractions are fun!

The QEMU part of the CMB is applied in my tree, along with a few fixes for other merges I messed up. This is the CMB feature:

http://git.infradead.org/users/kbusch/qemu-nvme.git/commitdiff/aee710c5ce4acb11583b85bc7f1c6ba8bea155d5

I was a bit lazy with it, using an exclusive BAR for controller memory, fixed at 128M. I'm also led to believe I'm violating proper MemoryRegion usage by reading "private" values, but I don't see how else to do it!
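For the curious, the QEMU side boils down to roughly the following. This is only a sketch against the QEMU API of the day; the field names (ctrl_mem, cmbuf) are illustrative, and the linked commit is the real thing:

static void nvme_init_cmb(NvmeCtrl *n)
{
	/* Back the CMB with plain host RAM and expose it as its own
	 * 64-bit prefetchable BAR (BAR 2, matching the driver hack below). */
	memory_region_init_ram(&n->ctrl_mem, OBJECT(n), "nvme-cmb",
			       128 * 1024 * 1024);
	pci_register_bar(&n->parent_obj, 2,
			 PCI_BASE_ADDRESS_SPACE_MEMORY |
			 PCI_BASE_ADDRESS_MEM_TYPE_64 |
			 PCI_BASE_ADDRESS_MEM_PREFETCH,
			 &n->ctrl_mem);
	/* Host-side pointer into the CMB backing RAM. */
	n->cmbuf = memory_region_get_ram_ptr(&n->ctrl_mem);
}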

Here are example QEMU parameters to set up your device with a CMB:

   -drive file=<nvme.img>,if=none,id=foo -device nvme,drive=foo,serial=baz,cmb=1
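
A full invocation might then look something like the below (memory size, image paths, etc. are just placeholders; the -drive/-device pair above is the only CMB-relevant part):

   qemu-system-x86_64 -m 4096 -smp 4 \
       -hda guest.img \
       -drive file=nvme.img,if=none,id=foo \
       -device nvme,drive=foo,serial=baz,cmb=1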

I did have to write some driver bits to test (copied below), but again, I was lazy and didn't do it the "right" way. Everything's hard-coded to match the hard-coded values on the controller side. The only CMB use below is allocating the Admin SQ and CQ out of the CMB. This is definitely going to be slower on QEMU, so don't even try to do performance comparisons. :)

---
diff -ur /drivers/block/nvme-core.c /drivers/block/nvme-core.c
--- /drivers/block/nvme-core.c	2014-12-05 15:28:53.662943237 -0700
+++ /drivers/block/nvme-core.c	2014-12-05 15:41:15.760944823 -0700
@@ -1154,10 +1154,12 @@
  	}
  	spin_unlock_irq(&nvmeq->q_lock);

-	dma_free_coherent(nvmeq->q_dmadev, CQ_SIZE(nvmeq->q_depth),
+	if (nvmeq->qid || !nvmeq->dev->ctrl_mem) {
+		dma_free_coherent(nvmeq->q_dmadev, CQ_SIZE(nvmeq->q_depth),
  				(void *)nvmeq->cqes, nvmeq->cq_dma_addr);
-	dma_free_coherent(nvmeq->q_dmadev, SQ_SIZE(nvmeq->q_depth),
+		dma_free_coherent(nvmeq->q_dmadev, SQ_SIZE(nvmeq->q_depth),
  					nvmeq->sq_cmds, nvmeq->sq_dma_addr);
+	}
  	kfree(nvmeq);
  }

@@ -1209,16 +1211,23 @@
  	if (!nvmeq)
  		return NULL;

-	nvmeq->cqes = dma_alloc_coherent(dmadev, CQ_SIZE(depth),
-					&nvmeq->cq_dma_addr, GFP_KERNEL);
-	if (!nvmeq->cqes)
-		goto free_nvmeq;
-	memset((void *)nvmeq->cqes, 0, CQ_SIZE(depth));
+	if (qid || !dev->ctrl_mem) {
+		nvmeq->cqes = dma_alloc_coherent(dmadev, CQ_SIZE(depth),
+				&nvmeq->cq_dma_addr, GFP_KERNEL);
+		if (!nvmeq->cqes)
+			goto free_nvmeq;

-	nvmeq->sq_cmds = dma_alloc_coherent(dmadev, SQ_SIZE(depth),
+		nvmeq->sq_cmds = dma_alloc_coherent(dmadev, SQ_SIZE(depth),
  					&nvmeq->sq_dma_addr, GFP_KERNEL);
-	if (!nvmeq->sq_cmds)
-		goto free_cqdma;
+		if (!nvmeq->sq_cmds)
+			goto free_cqdma;
+	} else {
+		nvmeq->sq_dma_addr = pci_resource_start(dev->pci_dev, 2);
+		nvmeq->sq_cmds = dev->ctrl_mem;
+		nvmeq->cq_dma_addr = pci_resource_start(dev->pci_dev, 2) + 0x1000;
+		nvmeq->cqes = dev->ctrl_mem + 0x1000;
+	}
+	memset((void *)nvmeq->cqes, 0, CQ_SIZE(depth));

  	nvmeq->q_dmadev = dmadev;
  	nvmeq->dev = dev;
@@ -2085,6 +2094,8 @@
  	dev->db_stride = NVME_CAP_STRIDE(readq(&dev->bar->cap));
  	dev->dbs = ((void __iomem *)dev->bar) + 4096;

+	if (readl(&dev->bar->cmbsz))
+		dev->ctrl_mem = ioremap(pci_resource_start(pdev, 2), 0x8000000);
  	return 0;

   disable:
diff -ur /include/linux/nvme.h /include/linux/nvme.h
--- /include/linux/nvme.h	2014-01-14 11:05:25.000000000 -0700
+++ /include/linux/nvme.h	2014-12-05 10:35:10.059748463 -0700
@@ -36,6 +36,8 @@
  	__u32			aqa;	/* Admin Queue Attributes */
  	__u64			asq;	/* Admin SQ Base Address */
  	__u64			acq;	/* Admin CQ Base Address */
+	__u32			cmbloc;	/* Controller memory buffer location */
+	__u32			cmbsz;	/* Controller memory buffer size */
  };

  #define NVME_CAP_MQES(cap)	((cap) & 0xffff)
@@ -84,6 +86,7 @@
  	u32 ctrl_config;
  	struct msix_entry *entry;
  	struct nvme_bar __iomem *bar;
+	volatile void __iomem *ctrl_mem;
  	struct list_head namespaces;
  	struct kref kref;
--
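
Doing the mapping the "right" way would mean decoding CMBSZ and CMBLOC instead of assuming BAR 2 and 128M. Something like the below sketch (untested; field layouts straight from the NVMe 1.2 spec, section 3.1):

/*
 * CMBSZ: bits 4:0 are support flags (SQS, CQS, LISTS, RDS, WDS),
 * bits 11:8 SZU (size units, 4KB * 16^SZU), bits 31:12 SZ (size in
 * SZU units).  CMBLOC: bits 2:0 BIR (which BAR), bits 31:12 OFST
 * (offset into that BAR, in SZU units).
 */
static void nvme_map_cmb(struct nvme_dev *dev)
{
	u32 cmbsz = readl(&dev->bar->cmbsz);
	u32 cmbloc = readl(&dev->bar->cmbloc);
	u64 szu, size, offset;
	int bar;

	if (!cmbsz)
		return;	/* no controller memory buffer */

	szu = 1ULL << (12 + 4 * ((cmbsz >> 8) & 0xf));
	size = szu * (cmbsz >> 12);
	bar = cmbloc & 0x7;
	offset = szu * (cmbloc >> 12);

	dev->ctrl_mem = ioremap(pci_resource_start(dev->pci_dev, bar) + offset,
				size);
}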

On Fri, 5 Dec 2014, Matias Bjørling wrote:
> Hi Stephen,
>
> The tree is here:
>
>  http://git.infradead.org/users/kbusch/qemu-nvme.git
>
> Cheers,
> Matias
>
> On 12/05/2014 10:02 AM, Stephen Bates wrote:
>> Keith
>> 
>> " I often implement h/w features on a virtual device if real h/w is 
>> not available. If you're interested, I'll add CMB to my QEMU tree 
>> sometime in the next week."
>> 
>> That would be great. Can you send a link to that tree?
>> 
>> Cheers
>> 
>> Stephen
>> 
>> -----Original Message-----
>> From: Keith Busch [mailto:keith.busch at intel.com]
>> Sent: Friday, December 5, 2014 8:31 AM
>> To: Stephen Bates
>> Cc: Keith Busch; linux-nvme at lists.infradead.org
>> Subject: Re: NVM Express 1.2 - Controller Memory Buffer Functionality.
>> 
>> On Thu, 4 Dec 2014, Stephen Bates wrote:
>>> Keith
>>> 
>>> Ah, very much a case of "be careful what you ask for" ;-). OK, I will
>>> start to look at this soon. One issue I can foresee is a lack of
>>> 1.2-compliant drives to do testing on. Does anyone have any ideas how
>>> best to handle that?
>> 
>> I often implement h/w features on a virtual device if real h/w is not 
>> available. If you're interested, I'll add CMB to my QEMU tree 
>> sometime in the next week.
>> 
>>> Cheers
>>> Stephen
>> 


