[PATCH] POWERPC: MTD: Add cached map support to physmap_of MTD driver

Wed Dec 17 16:01:54 EST 2008

On Tue, 16 Dec 2008, Benjamin Herrenschmidt wrote:
> On Mon, 2008-12-15 at 17:11 -0800, Trent Piepho wrote:
>> Shame, as it provides a huge speed up.  I suppose an alternative would be
>> to map the chip twice at different physical addresses, by just configuring
>> the chip select to be twice the size it should be, and giving them
>> different cacheability.
>
> Nice trick. That would probably work.

Thinking about it more, this is probably the way to do it.  Mapping the
same address twice appeared to work for me, but it looks like it's a bad
thing to do.  Too bad I didn't have time to finish this.

Creating two copies of the flash chip will take twice the physical address
space, but the virtual address space used is the same as mapping the chip
twice.  Since kernel virtual address space <= physical address space, there
really shouldn't be a problem with that.

Probably do something like this to the dts:

  localbus {
-	ranges = <0x1 0x0 0xe8000000 0x08000000>;
+	ranges = <0x1 0x0 0xe0000000 0x10000000>;  /* CS size x2 */
-       nor@1,0 {
+       nor@1,0x08000000 {
 		compatible = "cfi-flash";
-		reg = <0x1 0x0 0x08000000>;
+		reg = <0x1 0x08000000 0x08000000>;
+		cached-alias = <&cached_nor>;
         };
+	cached_nor: nor@1,0 {
+		compatible = "alias";
+		reg = <0x1 0 0x08000000>;
+	};
  }

Since physmap_of is an openfirmware driver, it won't be a problem to have
it look for "cached-alias" to get the range to map as cached.  The MTD
layer only supports one "map->phys" address, but I don't think this address
is used for anything on powerpc.
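
To make the address arithmetic in the dts change above concrete, here is a
rough userspace sketch.  The constants come from the ranges/reg values in
the diff; the function and macro names are made up for illustration and are
not part of physmap_of or MTD.  It assumes the chip only decodes its own
address lines, so both halves of the doubled chip select hit the same cell.

```c
#include <assert.h>
#include <stdint.h>

/* From the proposed dts: localbus CS1 now starts at 0xe0000000 and is
 * twice the chip size.  Names are hypothetical. */
#define CS_BASE   0xe0000000u	/* start of the doubled chip select */
#define CHIP_SIZE 0x08000000u	/* real size of the NOR chip */

/* Cached alias: lower half of the window (localbus offset 0). */
uint32_t cached_phys(uint32_t flash_off)
{
	assert(flash_off < CHIP_SIZE);
	return CS_BASE + flash_off;
}

/* Uncached window: upper half (localbus offset CHIP_SIZE). */
uint32_t uncached_phys(uint32_t flash_off)
{
	assert(flash_off < CHIP_SIZE);
	return CS_BASE + CHIP_SIZE + flash_off;
}

/* What the chip sees: the extra top address bit is simply not decoded. */
uint32_t chip_addr(uint32_t phys)
{
	return (phys - CS_BASE) & (CHIP_SIZE - 1);
}
```

Note that uncached_phys(0) comes out as 0xe8000000, the same address the
chip had before the change, so only the cached alias uses new physical
address space.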

>> Or changing the mapping for writes and then changing it back.  It wouldn't
>> be necessary to change the whole thing, just the page being written to.
>
> Right though changing mappings can be expensive. It might be worth
> looking at using fixmap for that tho, which is the fastest way to setup
> and tear down mappings, especially since we can (though we don't today)
> implement a bypass on those to directly load the TLB.

The MTD layer appears to program flash one word at a time, so writing to
flash would mean changing maps on a per-word basis.  Of course, flash
writes are slow too, so maybe the relative cost is not that much.  This
would take more modifications to MTD than the doubled chip select method.

> The problem gets worsened by the fact that cores that support
> speculative loads and prefetch will potentially bring anything mapped
> into the cache even if it's not directly accessed.

This is really the whole point of mapping it cached.  Since the cpu can
prefetch data, it's able to use more efficient back-to-back reads or page
burst mode to read a whole cache line at once.  The latter can more than
triple the read rate.
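
As a rough model of why a line fill is faster than word-at-a-time reads,
assume each single read pays the full random access time, while in page
mode only the first word does and the rest pay a shorter page access time.
The functions and parameter names below are just an illustration, and it
assumes the whole cache line falls within one flash page; real numbers
depend on the chip and the localbus timing setup.

```c
#include <assert.h>

/* Every word pays the full random access time. */
unsigned single_read_ns(unsigned t_acc_ns, unsigned n_words)
{
	return t_acc_ns * n_words;
}

/* Page mode: first word pays t_acc, the remaining words pay t_page. */
unsigned burst_read_ns(unsigned t_acc_ns, unsigned t_page_ns,
		       unsigned n_words)
{
	assert(n_words > 0);
	return t_acc_ns + (n_words - 1) * t_page_ns;
}
```

With made-up figures of 90 ns random access, 25 ns page access and 16 words
per cache line, that is 1440 ns against 465 ns.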