[PATCH] NVMe: Use CMB for the SQ if available
Matthew Wilcox
willy at linux.intel.com
Mon Jun 22 07:48:05 PDT 2015
On Fri, Jun 19, 2015 at 10:47:04PM +0000, Sam Bradshaw (sbradshaw) wrote:
> > @@ -376,7 +394,12 @@ static int __nvme_submit_cmd(struct nvme_queue
> > *nvmeq, struct nvme_command *cmd) {
> > u16 tail = nvmeq->sq_tail;
> >
> > - memcpy(&nvmeq->sq_cmds[tail], cmd, sizeof(*cmd));
> > + if (nvmeq->cmb_mapped)
> > + memcpy_toio(&nvmeq->sq_cmds[tail], cmd,
> > + sizeof(*cmd));
> > + else
> > + memcpy(&nvmeq->sq_cmds[tail], cmd, sizeof(*cmd));
> > +
> > if (++tail == nvmeq->q_depth)
> > tail = 0;
> > writel(tail, nvmeq->q_db);
>
> I think a store fence is necessary between memcpy_toio() and the doorbell ring.
> This applies elsewhere in the patch as well.
>
> For example, we've seen rare cases where Haswells do not emit the whole SQE out
> of the write combine buffers before the doorbell write traverses PCIe. Other
> architectures may have a similar need.
That isn't supposed to happen. A write to an uncached area is supposed
to flush the WC buffers. See section 11.3 in the Intel SDM volume 3:
    Write Combining (WC) — System memory locations are not cached
    (as with uncacheable memory) and coherency is not enforced by
    the processor's bus coherency protocol. Speculative reads are
    allowed. Writes may be delayed and combined in the write
    combining buffer (WC buffer) to reduce memory accesses. If the
    WC buffer is partially filled, the writes may be delayed until
    the next occurrence of a serializing event; such as an SFENCE
    or MFENCE instruction, CPUID execution, a read or write to
    uncached memory, an interrupt occurrence, or a LOCK instruction
    execution.
Of course, any CPU may have errata, but I'd like something a little
stronger than the assertion above before we put in an explicit fence.
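For reference, the change being debated is a write barrier between the WC copy
of the SQE and the UC doorbell write. The sketch below shows that ordering in a
self-contained userspace form: memcpy_toio(), wmb(), and writel() are stubbed
(wmb() maps to a full fence here, which is sfence-or-stronger on x86), and the
queue layout is a simplified stand-in for struct nvme_queue, not the driver's
actual definitions.

```c
/* Userspace sketch of the proposed submission path: copy the SQE into
 * the (write-combined) CMB slot, fence, then ring the (uncached)
 * doorbell.  Kernel primitives are stubbed so this compiles and runs
 * outside the kernel; the struct layout is hypothetical. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SQ_DEPTH 4
#define CMD_SIZE 64                       /* an NVMe SQE is 64 bytes */

static char sq_cmds[SQ_DEPTH][CMD_SIZE];  /* stands in for the CMB-mapped SQ */
static uint32_t doorbell;                 /* stands in for nvmeq->q_db */
static uint16_t sq_tail;

/* Stubs for the kernel I/O helpers used in the patch. */
static void memcpy_toio(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
}

static void wmb(void)
{
	__sync_synchronize();  /* full barrier; flushes WC buffers on x86 */
}

static void writel(uint32_t val, uint32_t *reg)
{
	*reg = val;
}

static void submit_cmd(const char *cmd)
{
	uint16_t tail = sq_tail;

	memcpy_toio(sq_cmds[tail], cmd, CMD_SIZE);
	wmb();          /* the explicit fence Sam is arguing for */
	if (++tail == SQ_DEPTH)
		tail = 0;
	writel(tail, &doorbell);
	sq_tail = tail;
}
```

Matthew's point is that the writel() itself, being a store to uncached memory,
should already act as the serializing event per the SDM text quoted above, which
would make the wmb() redundant on x86 absent an erratum; other architectures
would still need the barrier semantics that writel() provides there.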
More information about the Linux-nvme
mailing list