DMA_ATTR_WEAK_ORDERING definition, was Re: [PATCH] nvme: set DMA_ATTR_WEAK_ORDERING attribute on dma buffers

chris hyser chris.hyser at oracle.com
Mon Jul 10 09:17:07 PDT 2017


On 7/5/2017 3:07 PM, Christoph Hellwig wrote:
> On Tue, Jun 27, 2017 at 04:46:31PM -0400, chris hyser wrote:
>> I put this in for SPARC. In our case the host bridge/RC itself follows very
>> strict ordering unless the relaxed order bit is set in the TLP. This works
>> great for devices that actually allow the driver to enable it. We however
>> also have to support an infiniband card that does not support enabling this
>> in the HW and thus in the TLP but is actually fine with relaxed order for
>> the data buffers (ie the streaming I/O vs the coherent control buffers). In
>> fact w/o relaxed order the performance is absolutely atrocious ... w/
>> exceeds x86. This flag enables the driver to signal to us when we map the
>> buffer in the IOMMU to enable the relaxed order attribute for our HW.
>
> We'll really need to start writing down our semantics.  As I said

Fair enough. This will get documented.

> given how our streaming dma mappings (dma_map_single/page/sg) are
> defined I can't think of any way how relaxed vs strict ordering would
> matter for them, so just enabling it by default seems like the right
> thing in this case, instead of having to patch every PCIe driver people

People get really squeamish about turning on relaxed order even for the drivers/devices we actually know something
about and test, let alone for everything. I do agree with your interpretation of the streaming DMA semantics. That does
not mean every driver writer did or does. :-)

> might ever use on sparc to work around your bridge.

To be clear, this isn't really a host bridge issue. Technically, no driver that runs on SPARC needs this enabled, not
even a performance-critical one, if the device HW is done correctly. Again, we have a necessary, performance-critical
device that does not correctly signal that it is fine with relaxed order for non-control DMA. This feature of the host
bridge exists to enable a workaround for just these cases (deemed necessary based on past history). Said differently,
lots of devices are made for the commodity market, where shortcuts like this apparently do not affect performance, but
they are critical to performance on large systems.
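
To make the intended behavior concrete, here is a rough userspace sketch of the decision described above. It is not the
actual SPARC IOMMU code: every name in it (the sketch_ prefix, the struct, the attribute value) is made up for
illustration, and it only models the rule that DMA stays strictly ordered unless either the device sets the relaxed
order bit in the TLP or the driver asked for weak ordering when the buffer was mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's DMA_ATTR_WEAK_ORDERING flag;
 * the value here is arbitrary and only meaningful inside this sketch. */
#define SKETCH_ATTR_WEAK_ORDERING (1UL << 1)

/* Hypothetical model of one IOMMU mapping entry. */
struct sketch_iommu_entry {
    bool relaxed_order; /* bridge may relax ordering for DMA through this mapping */
};

/* Map time: the driver passes its attributes; the mapping stays strict
 * unless weak ordering was requested for this buffer. */
static void sketch_map(struct sketch_iommu_entry *e, unsigned long attrs)
{
    assert(e != NULL);
    e->relaxed_order = (attrs & SKETCH_ATTR_WEAK_ORDERING) != 0;
}

/* Transaction time: ordering may be relaxed if the device set the relaxed
 * order bit in the TLP, or if the mapping itself was marked relaxed. */
static bool sketch_may_reorder(const struct sketch_iommu_entry *e, bool tlp_ro_bit)
{
    assert(e != NULL);
    return tlp_ro_bit || e->relaxed_order;
}
```

In this model, a device that never sets the TLP bit still gets relaxed ordering on its streaming data buffers once the
driver maps them with the weak ordering attribute, while its control buffers, mapped without it, stay strictly ordered.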

-chrish


