[PATCH 06/10] soc/qbman: Add ARM equivalent for flush_dcache_range()

Robin Murphy robin.murphy at arm.com
Wed Feb 1 05:03:44 PST 2017


On 30/01/17 19:04, Roy Pledge wrote:
> On 1/30/2017 10:31 AM, Robin Murphy wrote:
>> On 28/01/17 02:34, Scott Wood wrote:
>>> On Fri, 2017-01-27 at 17:41 +0100, Arnd Bergmann wrote:
>>>> On Thu, Jan 26, 2017 at 6:08 AM, Scott Wood <scott.wood at nxp.com> wrote:
>>>>> On 01/25/2017 03:20 PM, Arnd Bergmann wrote:
>>>>>> On Monday, January 23, 2017 7:24:59 PM CET Roy Pledge wrote:
>>>>>> If this is normal RAM, you should be able to just write zeroes, and then
>>>>>> do a dma_map_single() for initialization.
>>>>> The DMA API on PPC currently has no way of knowing that the device
>>>>> accesses this memory incoherently.
>>>> Ah, this is because PPC doesn't use the 'dma-coherent' property on devices
>>>> but just assumes that coherency is a global property of the platform, right?
>>> Right.
>>>
>>>>> If that were somehow fixed, we still couldn't use dma_map_single() as it
>>>>> doesn't accept virtual addresses that come from memremap() or similar
>>>>> dynamic mappings.  We'd have to convert the physical address to an array
>>>>> of struct pages and pass each one to dma_map_page().
>>>> Sorry for my ignorance, but where does the memory come from to start
>>>> with? Is this in the normal linearly mapped RAM, in a carveout outside
>>>> of the linear mapping but the same memory, or in some on-chip buffer?
>>> It's RAM that comes from the device tree reserved memory mechanism
>>> (drivers/of/of_reserved_mem.c).  On a 32-bit kernel it is not guaranteed (or
>>> likely) to be lowmem.
>> Wouldn't dma_declare_coherent_memory() be the appropriate tool for that
>> job, then (modulo the PPC issue)? On ARM that should result in
>> dma_alloc_coherent() giving back a non-cacheable mapping if the device
>> is non-coherent, wherein a dma_wmb() after writing the data from the CPU
>> side should be enough to ensure it is published to the device.
> I think there is some confusion here (and it may very well be mine).
> 
> My understanding is that the dma_declare_coherent_memory() API sets up a
> region of memory that will be managed by dma_alloc_coherent() and
> friends.  This is useful if the driver needs to manage a region of on
> device memory but that isn't what this specific region is used for.

It's a bit more general than that - dma_alloc_coherent() can essentially
be considered "give me some memory for this device to use". We already
have use-cases where such buffers are only ever accessed by the device
(e.g. some display controllers, and future NVMe devices), hence
DMA_ATTR_NO_KERNEL_MAPPING on ARM to save the vmalloc space.

A DMA allocation also inherently guarantees appropriate alignment,
regardless of whether you're using a per-device reservation or just
regular CMA, and will also zero the underlying memory (and for a
non-coherent device perform whatever cache maintenance is necessary, if
the clearing isn't already done via a non-cacheable mapping).

All you need to do in the driver is allocate your buffer and hand the
resulting address off to the device at probe (after optionally checking
for a reservation in DT and declaring it), then free it at remove, which
also ends up far more self-documenting (IMO) than a bunch of open-coded
remapping and #ifdef'ed architecture-private cache shenanigans.
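
To make that concrete, here is a minimal sketch of the probe/remove flow being described (the struct, size, and register programming are illustrative, not the actual QBMan driver; the API calls are the standard kernel interfaces discussed in this thread):

```c
/* Hedged sketch of the allocate-at-probe / free-at-remove pattern.
 * struct qbman_priv, SZ_4M, and the register write are hypothetical;
 * the DMA and reserved-memory calls are the real kernel APIs. */
#include <linux/dma-mapping.h>
#include <linux/of_reserved_mem.h>
#include <linux/platform_device.h>
#include <linux/sizes.h>

struct qbman_priv {
	void *fqd_virt;      /* CPU-side address (may go unused) */
	dma_addr_t fqd_dma;  /* bus address handed to the device */
	size_t fqd_size;
};

static int qbman_probe(struct platform_device *pdev)
{
	struct qbman_priv *priv;
	int ret;

	priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
	if (!priv)
		return -ENOMEM;

	/* Optionally pick up a per-device reservation from the DT node;
	 * if present, dma_alloc_coherent() will allocate from it. */
	ret = of_reserved_mem_device_init(&pdev->dev);
	if (ret && ret != -ENODEV)
		return ret;

	priv->fqd_size = SZ_4M;	/* illustrative size */
	priv->fqd_virt = dma_alloc_coherent(&pdev->dev, priv->fqd_size,
					    &priv->fqd_dma, GFP_KERNEL);
	if (!priv->fqd_virt)
		return -ENOMEM;

	/* The buffer comes back zeroed, with any cache maintenance for a
	 * non-coherent device already done - just program the device, e.g.:
	 * writel(lower_32_bits(priv->fqd_dma), base + FQD_BAR); */

	platform_set_drvdata(pdev, priv);
	return 0;
}

static int qbman_remove(struct platform_device *pdev)
{
	struct qbman_priv *priv = platform_get_drvdata(pdev);

	dma_free_coherent(&pdev->dev, priv->fqd_size,
			  priv->fqd_virt, priv->fqd_dma);
	of_reserved_mem_device_release(&pdev->dev);
	return 0;
}
```

No memremap(), no flush_dcache_range(), no architecture #ifdefs - the DMA layer does the right thing per platform.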

> The memory being initialized here is a big chunk (possibly multiple
> megabytes) of RAM that only the QBMan device will access, with one
> exception: at initialization the device expects software to zero the
> memory before the device is brought up. Since the CPUs will never touch
> this region again, the device uses non-coherent/non-shareable accesses
> for performance reasons; QBMan is the only user, so there is no need to
> maintain coherency with core-side caches.
> 
> We used the of_reserved_mem mechanism as that seemed the most reliable
> way to guarantee a properly aligned contiguous allocation, and it's been
> working well.  The downside is that the contents of that memory are
> undefined, so we had to map it, zero it, and flush the cache in order to
> get the RAM into the desired state and make sure we don't get hit by a
> random castout in the future.
> 
> Would it make sense to add an option in the of_reserved_mem system to
> fill the memory?  I haven't looked at the feasibility of that but it
> seems like a generic solution that could be useful to others.  We could
> add the fill value to the device tree so you could initialize to any
> pattern.

In short, said generic solution is right there already, only the PPC
arch code might need tweaking to accommodate it :)
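
For reference, the existing binding looks roughly like this (node names, addresses, and sizes here are made up for illustration; `shared-dma-pool`, `no-map`, and `memory-region` are the real reserved-memory binding properties):

```dts
/* Illustrative DT fragment: a per-device reserved pool that
 * dma_alloc_coherent() can draw from via memory-region. */
reserved-memory {
	#address-cells = <2>;
	#size-cells = <2>;
	ranges;

	qbman_fqd: qbman-fqd@0 {
		compatible = "shared-dma-pool";
		size = <0 0x400000>;		/* 4 MiB */
		alignment = <0 0x400000>;	/* suitably aligned */
		no-map;
	};
};

qbman: qbman@1880000 {
	/* ... */
	memory-region = <&qbman_fqd>;
};
```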

Robin.

> 
> - Roy
> 
>>
>> Robin.
>>
>>>>> And even if we did all that, there would still be other manual cache
>>>>> manipulation left in this driver, to deal with its cacheable register
>>>>> interface.
>>>> I thought we had concluded that "cacheable register" is something
>>>> that cannot work reliably on ARM at all when this came up before.
>>>> Any updates on that?
>>> I'm not familiar with the details there...  My understanding is that the
>>> hardware people at NXP are convinced that it can work on these specific chips
>>> due to implementation details.
>>>
>>> -Scott
>>>
>>>
>>>
>>
> 



