NVM Express 1.2 - Controller Memory Buffer Functionality.

Keith Busch keith.busch at intel.com
Fri Dec 5 15:28:59 PST 2014


I'm probably going to get yelled at for doing this instead of what I'm
supposed to be doing, but sometimes fun distractions are fun!

The QEMU part of CMB is applied in my tree, along with a few fixes
for other merges I messed up. This is the CMB feature:

http://git.infradead.org/users/kbusch/qemu-nvme.git/commitdiff/aee710c5ce4acb11583b85bc7f1c6ba8bea155d5

I was a bit lazy with it, using an exclusive BAR for controller memory
fixed at 128M. I'm also led to believe I'm violating proper MemoryRegion
usage by reading "private" values, but I don't see how else to do it!
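
Roughly, the device side boils down to something like the sketch below.
This is not the actual commit: NVME_CMB_BIR, NVME_CMB_SIZE, and the
ctrl_mem field are illustrative names, and the MemoryRegion API has
shifted a bit between QEMU versions, so see the commit above for the
real code.

#include "qemu/osdep.h"
#include "qapi/error.h"
#include "hw/pci/pci.h"

/* Sketch only: back the CMB with a RAM MemoryRegion exposed as its own
 * 64-bit BAR. NVME_CMB_BIR, NVME_CMB_SIZE, and n->ctrl_mem are made-up
 * names for illustration. */
#define NVME_CMB_BIR   2                    /* BAR index holding the CMB */
#define NVME_CMB_SIZE  (128 * 1024 * 1024)  /* fixed at 128M, as above */

static void nvme_init_cmb(NvmeCtrl *n)
{
    memory_region_init_ram(&n->ctrl_mem, OBJECT(n), "nvme-cmb",
                           NVME_CMB_SIZE, &error_fatal);
    pci_register_bar(&n->parent_obj, NVME_CMB_BIR,
                     PCI_BASE_ADDRESS_SPACE_MEMORY |
                     PCI_BASE_ADDRESS_MEM_TYPE_64,
                     &n->ctrl_mem);
    /* The host-side pointer used to service queue accesses would come
     * from memory_region_get_ram_ptr(&n->ctrl_mem). */
}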

Here are example QEMU parameters to set up your device with a CMB:

   -drive file=<nvme.img>,if=none,id=foo -device nvme,drive=foo,serial=baz,cmb=1

I did have to write some driver bits to test (copied below), but again,
I was lazy and didn't do it the "right" way. Everything's hard-coded to
match the hard-coded values on the controller side. The only CMB use below
is allocating the Admin SQ and CQ out of the CMB. This is definitely going
to be slower on QEMU, so don't even try to do performance comparisons. :)

---
diff -ur a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
--- a/drivers/block/nvme-core.c	2014-12-05 15:28:53.662943237 -0700
+++ b/drivers/block/nvme-core.c	2014-12-05 15:41:15.760944823 -0700
@@ -1154,10 +1154,12 @@
  	}
  	spin_unlock_irq(&nvmeq->q_lock);

-	dma_free_coherent(nvmeq->q_dmadev, CQ_SIZE(nvmeq->q_depth),
+	if (nvmeq->qid || !nvmeq->dev->ctrl_mem) {
+		dma_free_coherent(nvmeq->q_dmadev, CQ_SIZE(nvmeq->q_depth),
  				(void *)nvmeq->cqes, nvmeq->cq_dma_addr);
-	dma_free_coherent(nvmeq->q_dmadev, SQ_SIZE(nvmeq->q_depth),
+		dma_free_coherent(nvmeq->q_dmadev, SQ_SIZE(nvmeq->q_depth),
  					nvmeq->sq_cmds, nvmeq->sq_dma_addr);
+	}
  	kfree(nvmeq);
  }

@@ -1209,16 +1211,23 @@
  	if (!nvmeq)
  		return NULL;

-	nvmeq->cqes = dma_alloc_coherent(dmadev, CQ_SIZE(depth),
-					&nvmeq->cq_dma_addr, GFP_KERNEL);
-	if (!nvmeq->cqes)
-		goto free_nvmeq;
-	memset((void *)nvmeq->cqes, 0, CQ_SIZE(depth));
+	if (qid || !dev->ctrl_mem) {
+		nvmeq->cqes = dma_alloc_coherent(dmadev, CQ_SIZE(depth),
+				&nvmeq->cq_dma_addr, GFP_KERNEL);
+		if (!nvmeq->cqes)
+			goto free_nvmeq;

-	nvmeq->sq_cmds = dma_alloc_coherent(dmadev, SQ_SIZE(depth),
+		nvmeq->sq_cmds = dma_alloc_coherent(dmadev, SQ_SIZE(depth),
  					&nvmeq->sq_dma_addr, GFP_KERNEL);
-	if (!nvmeq->sq_cmds)
-		goto free_cqdma;
+		if (!nvmeq->sq_cmds)
+			goto free_cqdma;
+	} else {
+		nvmeq->sq_dma_addr = pci_resource_start(dev->pci_dev, 2);
+		nvmeq->sq_cmds = dev->ctrl_mem;
+		nvmeq->cq_dma_addr = pci_resource_start(dev->pci_dev, 2) + 0x1000;
+		nvmeq->cqes = dev->ctrl_mem + 0x1000;
+	}
+	memset((void *)nvmeq->cqes, 0, CQ_SIZE(depth));

  	nvmeq->q_dmadev = dmadev;
  	nvmeq->dev = dev;
@@ -2085,6 +2094,8 @@
  	dev->db_stride = NVME_CAP_STRIDE(readq(&dev->bar->cap));
  	dev->dbs = ((void __iomem *)dev->bar) + 4096;

+	if (readl(&dev->bar->cmbsz))
+		dev->ctrl_mem = ioremap(pci_resource_start(pdev, 2), 0x8000000);
  	return 0;

   disable:
diff -ur a/include/linux/nvme.h b/include/linux/nvme.h
--- a/include/linux/nvme.h	2014-01-14 11:05:25.000000000 -0700
+++ b/include/linux/nvme.h	2014-12-05 10:35:10.059748463 -0700
@@ -36,6 +36,8 @@
  	__u32			aqa;	/* Admin Queue Attributes */
  	__u64			asq;	/* Admin SQ Base Address */
  	__u64			acq;	/* Admin CQ Base Address */
+	__u32			cmbloc;	/* Controller memory buffer location */
+	__u32			cmbsz;	/* Controller memory buffer size */
  };

  #define NVME_CAP_MQES(cap)	((cap) & 0xffff)
@@ -84,6 +86,7 @@
  	u32 ctrl_config;
  	struct msix_entry *entry;
  	struct nvme_bar __iomem *bar;
+	volatile void __iomem *ctrl_mem;
  	struct list_head namespaces;
  	struct kref kref;
--
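
If someone wants to do this the "right" way, the BAR index and size
shouldn't be hard-coded: NVMe 1.2 reports them through CMBLOC (BIR in
bits 2:0, offset in bits 31:12) and CMBSZ (size units in bits 11:8,
size in bits 31:12). A rough, untested sketch of that decode against
the struct above:

/* Rough sketch, untested: derive the CMB mapping from CMBLOC/CMBSZ
 * (NVMe 1.2) instead of hard-coding BAR 2 and a 128M size. */
static void __iomem *nvme_map_cmb(struct nvme_dev *dev)
{
	u32 cmbloc = readl(&dev->bar->cmbloc);
	u32 cmbsz = readl(&dev->bar->cmbsz);
	u64 unit, size, offset;
	int bar;

	if (!cmbsz)
		return NULL;	/* controller has no CMB */

	/* CMBSZ.SZU: 0 = 4K units, 1 = 64K, 2 = 1M, ... (x16 per step) */
	unit = 1ULL << (12 + 4 * ((cmbsz >> 8) & 0xf));
	size = ((u64)(cmbsz >> 12)) * unit;	/* CMBSZ.SZ */
	offset = ((u64)(cmbloc >> 12)) * unit;	/* CMBLOC.OFST */
	bar = cmbloc & 0x7;			/* CMBLOC.BIR */

	return ioremap(pci_resource_start(dev->pci_dev, bar) + offset, size);
}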

On Fri, 5 Dec 2014, Matias Bjørling wrote:
> Hi Stephen,
>
> The tree is here:
>
>  http://git.infradead.org/users/kbusch/qemu-nvme.git
>
> Cheers,
> Matias
>
> On 12/05/2014 10:02 AM, Stephen Bates wrote:
>> Keith
>> 
>> " I often implement h/w features on a virtual device if real h/w is not 
>> available. If you're interested, I'll add CMB to my QEMU tree sometime in 
>> the next week."
>> 
>> That would be great. Can you send a link to that tree?
>> 
>> Cheers
>> 
>> Stephen
>> 
>> -----Original Message-----
>> From: Keith Busch [mailto:keith.busch at intel.com]
>> Sent: Friday, December 5, 2014 8:31 AM
>> To: Stephen Bates
>> Cc: Keith Busch; linux-nvme at lists.infradead.org
>> Subject: Re: NVM Express 1.2 - Controller Memory Buffer Functionality.
>> 
>> On Thu, 4 Dec 2014, Stephen Bates wrote:
>>> Keith
>>> 
>>> Ah, very much a case of "be careful what you ask for" ;-). OK I will start 
>>> to look at this soon. One issue I can foresee is a lack of 1.2-compliant
>>> drives to do testing on. Does anyone have any ideas how best to handle 
>>> that?
>> 
>> I often implement h/w features on a virtual device if real h/w is not 
>> available. If you're interested, I'll add CMB to my QEMU tree sometime in 
>> the next week.
>> 
>>> Cheers
>>> Stephen
>> 

