[Linaro-mm-sig] [RFC] ARM DMA mapping TODO, v1

Fri Apr 29 12:32:09 EDT 2011

On Fri, 29 Apr 2011 08:59:58 +0100
Russell King - ARM Linux <linux at arm.linux.org.uk> wrote:

> On Fri, Apr 29, 2011 at 07:50:12AM +0200, Thomas Hellstrom wrote:
> > However, we should be able to construct a completely generic api around  
> > these operations, and for architectures that don't support them we need  
> > to determine
> >
> > a)  Whether we want to support them anyway (IIRC the problem with PPC is  
> > that the linear kernel map has huge tlb entries that are very  
> > inefficient to break up?)
> 
> That same issue applies to ARM too - you'd need to stop the entire
> machine, rewrite all processes page tables, flush tlbs, and only
> then restart.  Otherwise there's the possibility of ending up with
> conflicting types of TLB entries, and I'm not sure what the effect
> of having two matching TLB entries for the same address would be.

Right, I don't think anyone wants to see this sort of thing happen with
any frequency.  So either a large, uncached region can be set up at boot
time for allocations, or infrequent, large requests and conversions can
be made on demand, with memory being freed back to the main, coherent
pool under pressure.
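A minimal C sketch of the two options described here, under stated assumptions. A fixed boot-time carve-out is tried first. Next come chunks that were already converted. Only after both are exhausted is a whole new chunk converted on demand, and fully idle chunks go back to the coherent pool under pressure. The names (struct uc_pool, uc_alloc_page, uc_free_page, uc_shrink), the pool and chunk sizes, and the page-handle encoding are all hypothetical. The expensive stop-machine and TLB-flush conversion is only counted, not performed.

```c
/*
 * Hypothetical sketch, not a real kernel API: an uncached page pool with a
 * boot-time carve-out plus on-demand conversion in large chunks.
 * Attribute changes are simulated by a counter only.
 */
#include <assert.h>
#include <stdbool.h>

#define BOOT_POOL_PAGES 64  /* assumed size of the boot-time carve-out */
#define CHUNK_PAGES     16  /* assumed granularity of on-demand conversion */
#define MAX_CHUNKS       8  /* assumed cap on converted chunks */

struct uc_pool {
	bool boot_used[BOOT_POOL_PAGES];             /* carve-out usage */
	bool chunk_live[MAX_CHUNKS];                 /* chunk currently uncached? */
	bool chunk_page_used[MAX_CHUNKS][CHUNK_PAGES];
	unsigned conversions;  /* stands in for stop machine + remap + TLB flush */
};

/* Returns a page handle, or -1 if every source is exhausted. */
static int uc_alloc_page(struct uc_pool *p)
{
	/* 1. Cheap path: carve-out attributes were set once at boot. */
	for (int i = 0; i < BOOT_POOL_PAGES; i++)
		if (!p->boot_used[i]) {
			p->boot_used[i] = true;
			return i;
		}
	/* 2. Reuse a chunk that has already been converted to uncached. */
	for (int c = 0; c < MAX_CHUNKS; c++) {
		if (!p->chunk_live[c])
			continue;
		for (int s = 0; s < CHUNK_PAGES; s++)
			if (!p->chunk_page_used[c][s]) {
				p->chunk_page_used[c][s] = true;
				return BOOT_POOL_PAGES + c * CHUNK_PAGES + s;
			}
	}
	/* 3. Expensive, infrequent path: convert a whole new chunk at once. */
	for (int c = 0; c < MAX_CHUNKS; c++)
		if (!p->chunk_live[c]) {
			p->chunk_live[c] = true;
			p->conversions++;
			p->chunk_page_used[c][0] = true;
			return BOOT_POOL_PAGES + c * CHUNK_PAGES;
		}
	return -1;
}

static void uc_free_page(struct uc_pool *p, int h)
{
	assert(h >= 0 && h < BOOT_POOL_PAGES + MAX_CHUNKS * CHUNK_PAGES);
	if (h < BOOT_POOL_PAGES) {
		p->boot_used[h] = false;
		return;
	}
	h -= BOOT_POOL_PAGES;
	p->chunk_page_used[h / CHUNK_PAGES][h % CHUNK_PAGES] = false;
}

/* Under memory pressure, hand fully idle chunks back to the coherent pool. */
static unsigned uc_shrink(struct uc_pool *p)
{
	unsigned released = 0;
	for (int c = 0; c < MAX_CHUNKS; c++) {
		if (!p->chunk_live[c])
			continue;
		bool idle = true;
		for (int s = 0; s < CHUNK_PAGES; s++)
			if (p->chunk_page_used[c][s])
				idle = false;
		if (idle) {
			p->chunk_live[c] = false;
			p->conversions++;  /* converting back is a remap too */
			released++;
		}
	}
	return released;
}
```

Converting in large chunks rather than page by page is the point of this design: each conversion is assumed to be a rare, machine-wide event, so its cost is amortized over up to CHUNK_PAGES allocations.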

> > b)  Whether they are needed at all on the particular architecture. The  
> > Intel x86 spec is (according to AMD) supposed to forbid conflicting  
> > caching attributes, but the Intel graphics guys use them for GEM. PPC  
> > appears not to need it.
> 
> Some versions of the architecture manual say that having multiple
> mappings with differing attributes is unpredictable.

Yes, there's a bit of abuse going on there.  We've received a guarantee
that if the CPU speculates a line into the cache, then as long as it's
not modified through the cacheable mapping, the CPU won't write it back
to memory; it'll discard the line as needed instead (IIRC, AMD CPUs will
actually write back clean lines, so GEM wouldn't work the same way
there).

But even with GEM, there is a large performance penalty the first time
a new buffer object is allocated.  Even though we don't have to change
mappings by stopping the machine etc., we still have to flush everything
relating to the object out of the CPU caches (since some lines may be
dirty), and then flush the memory controller buffers before accessing
the object through the uncached mapping.  So, at least currently, we're
all in the same boat when it comes to new object allocations: they will
be expensive unless you already have some uncached mappings you can
reuse.
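To illustrate the point about reusing existing uncached mappings, here is a minimal C sketch. Its names (struct bo, struct bo_cache, bo_alloc, bo_release, bo_cache_destroy) are hypothetical and are not the GEM or libdrm API. Released buffer objects are parked on a free list instead of being torn down, and only a freshly created object is charged the CPU cache flush and the memory-controller flush. The flushes are merely counted; no hardware is touched.

```c
/*
 * Hypothetical sketch: reuse buffer objects that already have an uncached
 * mapping so the one-time flush cost is paid only for fresh objects.
 */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct bo {
	size_t size;
	struct bo *next;  /* free-list link while parked for reuse */
};

struct bo_cache {
	struct bo *free_list;
	unsigned cpu_flushes;  /* write back / invalidate the object's lines */
	unsigned mc_flushes;   /* flush memory controller buffers */
};

static struct bo *bo_alloc(struct bo_cache *c, size_t size)
{
	/* Prefer a parked object that is already uncached and big enough. */
	for (struct bo **pp = &c->free_list; *pp; pp = &(*pp)->next) {
		if ((*pp)->size >= size) {
			struct bo *b = *pp;
			*pp = b->next;  /* unlink from the free list */
			b->next = NULL;
			return b;       /* no flushes: mapping is already uncached */
		}
	}
	/* Fresh object: pay the one-time cost before uncached access. */
	struct bo *b = malloc(sizeof(*b));
	if (!b)
		return NULL;
	b->size = size;
	b->next = NULL;
	c->cpu_flushes++;
	c->mc_flushes++;
	return b;
}

static void bo_release(struct bo_cache *c, struct bo *b)
{
	assert(b != NULL);
	/* Keep the uncached mapping; just park the object for reuse. */
	b->next = c->free_list;
	c->free_list = b;
}

static void bo_cache_destroy(struct bo_cache *c)
{
	while (c->free_list) {
		struct bo *b = c->free_list;
		c->free_list = b->next;
		free(b);
	}
}
```

A reused object may be larger than the request; a real cache would probably bucket objects by size and trim the free list under memory pressure, both of which this sketch leaves out.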

-- 
Jesse Barnes, Intel Open Source Technology Center