dma_alloc_coherent versus streaming DMA, neither works satisfactory
Mike Looijmans
mike.looijmans at topic.nl
Thu May 7 07:08:54 PDT 2015
On 07-05-15 15:31, Mike Looijmans wrote:
> On 07-05-15 15:21, Daniel Drake wrote:
>> On Thu, May 7, 2015 at 5:18 AM, Mike Looijmans <mike.looijmans at topic.nl> wrote:
>>> I reverted all my patches and workarounds. Indeed, the kernel needs a
>>> "coherent" version of the dma_mmap routine, as the current version will map
>>> it as non-cacheable, resulting in a big performance hit (and nullifying the
>>> whole idea behind it).
>>>
>>> I'll test it further on my 'hardware' and cook up a patch that correctly
>>> maps the coherent pages.
>>
>> Sorry that I have only read this thread briefly, but I wonder if this
>> is what you are looking for:
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-February/325489.html
>
> It's related, but targets another use case. This one does the same in case the
> driver requested non-consistent memory.
>
> My use case was that I have hardware implemented coherency (through ACP) so
> the CPU's and device's view on memory is already consistent, regardless of the
> status of the cache.
>
> The patches are complementary, not overlapping.
>
> Thanks for the link though, it's something I was also looking into, as I don't
> always need coherency.
I read the rest of that thread; apparently the patch was never merged.
The patch for "non-consistent" is a BUG FIX, not a feature request. I
was already wondering why my driver had to kmalloc pages to get proper caching
on them.
From https://www.kernel.org/doc/Documentation/DMA-attributes.txt:
"""
DMA_ATTR_NON_CONSISTENT ... lets the platform to choose to return either
consistent or non-consistent memory as it sees fit. By using this API,
you are guaranteeing to the platform that you have all the correct and
necessary sync points for this memory in the driver.
"""
The current ARM implementation *always* returns non-cacheable memory, even
if the driver promises to do all the right things.
If the intention had been that every implementation could get away with just
ignoring the flag, the flag would not exist. So the implementation
should do the best it can here, and the patch shows that it's just a simple
one-liner to make it implement the flag as intended.
As for use cases, IIO is a candidate for this too, as it has explicit
interfaces to move buffers to/from userspace without having to remap them over
and over again. My use case here is Dyplo, which uses a similar interface. If
you do something as simple as "for (i=0;i<size;++i) sum += ((char*)buffer)[i];"
in userspace on such a buffer, performance will collapse to about 1/20th
because the memory is non-cacheable. The whole point of IIO is to avoid
having to copy these buffers around.
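
For anyone who wants to reproduce the measurement, the read loop above can be
written out roughly as follows. This is only a sketch: sum_bytes() and
time_sum() are names made up for this mail, the buffer is assumed to be
whatever pointer the driver's mmap() handed to userspace, and I use unsigned
char so the sum does not depend on the signedness of plain char.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: sum every byte of a buffer, the way a userspace
 * consumer might walk an mmap'ed DMA buffer. The pointer is converted
 * to a byte pointer first and only then indexed. */
uint64_t sum_bytes(const void *buffer, size_t size)
{
    const unsigned char *p = buffer;
    uint64_t sum = 0;
    size_t i;

    assert(buffer != NULL || size == 0);
    for (i = 0; i < size; ++i)
        sum += p[i];
    return sum;
}

/* Hypothetical helper: run sum_bytes() once, store the result in *out
 * and return the CPU time spent in seconds, as reported by clock(). */
double time_sum(const void *buffer, size_t size, uint64_t *out)
{
    clock_t start = clock();

    *out = sum_bytes(buffer, size);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_sum() once on the mmap'ed buffer and once on an ordinary
malloc'ed buffer of the same size should make the difference between cached
and non-cached mappings visible. The actual ratio will depend on the platform.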
(I'd rather add this plea to that thread, but I'd have to figure out how
to reply to a thread from the web...)
Mike.
Kind regards,
Mike Looijmans
System Expert
TOPIC Embedded Products
Eindhovenseweg 32-C, NL-5683 KH Best
Postbus 440, NL-5680 AK Best
Telefoon: +31 (0) 499 33 69 79
Telefax: +31 (0) 499 33 69 70
E-mail: mike.looijmans at topicproducts.com
Website: www.topicproducts.com