[PATCH] NVMe: Reduce divide operations

Matthew Wilcox willy at linux.intel.com
Fri Nov 21 05:50:00 PST 2014


On Thu, Nov 20, 2014 at 02:13:24PM -0800, Sam Bradshaw wrote:
> There are several expensive divide operations in the submit and 
> completion paths that can be converted to less expensive arithmetic 
> and logical operations.  Profiling shows significant drops in time 
> spent in nvme_alloc_iod() under common workloads as a result of this 
> change.

OK ... but I think you've taken this a bit far.

>  static int nvme_npages(unsigned size, struct nvme_dev *dev)
>  {
> -	unsigned nprps = DIV_ROUND_UP(size + dev->page_size, dev->page_size);
> -	return DIV_ROUND_UP(8 * nprps, dev->page_size - 8);
> +	unsigned page_size = (1 << dev->page_shift);
> +	unsigned nprps = (size >> dev->page_shift) + 1;
> +
> +	if (size & (page_size - 1))
> +		nprps++;
> +	if ((nprps << 3) < (page_size - 8))
> +		return 1;
> +	return DIV_ROUND_UP(nprps << 3, page_size - 8);
>  }

I don't think there's a compiler in the world that doesn't optimise
'x * 8' into 'x << 3' if the latter is cheaper.

Also, I think your nprps is now slightly larger than it used to be.
Shouldn't it look like this?

	unsigned nprps = size >> dev->page_shift;
	if (size & (page_size - 1))
		nprps++;

Let's imagine we're sending a misaligned 16k I/O.  We need 5 PRP entries,
but the first one goes in the command, so we need to allocate enough
space for 4 entries.  16k / 4k is 4, but your code says to add 1.

We can actually do slightly better than the original code in the case
where we're sending a 2MB I/O.  The code in the driver today thinks we
need a second page, but the last entry on the PRP page will be a PRP
Entry, not a PRP List Entry.  So I think this is the right calculation:

 static int nvme_npages(unsigned size, struct nvme_dev *dev)
 {
-	unsigned nprps = DIV_ROUND_UP(size + dev->page_size, dev->page_size);
-	return DIV_ROUND_UP(8 * nprps, dev->page_size - 8);
+	unsigned page_size = (1 << dev->page_shift);
+	unsigned nprps = size >> dev->page_shift;
+
+	if (size & (page_size - 1))
+		nprps++;
+	if ((nprps * 8) <= page_size)
+		return 1;
+	return DIV_ROUND_UP(nprps * 8, page_size - 8);
 }

(it still overestimates for I/Os on 2MB boundaries that are larger than
2MB, but I'm OK with that.  If somebody wants to come up with a neater
calculation, feel free).

We could further optimise it by knowing that we need 0 pages if the I/O
fits in two device pages (<= 8k with a 4k page size):

+	unsigned nprps,	page_size = (1 << dev->page_shift);
+
+	if (size <= page_size * 2)
+		return 0;
+	nprps = size >> dev->page_shift;
+	if (size & (page_size - 1))
...

but I'm not sure that it's worth saving those 8 bytes.



