Unaligned flush_dcache_range in axs101.c
Alexey Brodkin
Alexey.Brodkin at synopsys.com
Thu May 26 04:39:05 PDT 2016
Hi Marek,
On Fri, 2016-04-15 at 15:49 +0200, Marek Vasut wrote:
> On 04/15/2016 03:00 PM, Alexey Brodkin wrote:
> > Cache management functions should be implemented per arch or platform and so
> > that they match requirements of underlying hardware. If hardware may only work on
> > whole cache line then cache routines must do exactly this.
> Is ARC designed this way ?
>
> >
> > As an extra options there could be a check for input arguments in those functions
> > that will warn a user if he or she passes unaligned start and/or end addresses.
> Right, ARMv7 does it already, ARM926 too, others need fixing.
>
> >
> > Now speaking about data layout your example is very correct and we saw a lot of
> > such real use-cases - developer of drivers should design data layout such that
> > cache management doesn't lead to data corruption.
> Right.
>
> >
> > But again the code in question has nothing to do with cache line.
> > I only need to writeback 1 word and don't care what really happens with
> > all remaining words in the same cache line. And my implementation of
> > flush_dcache_range() will do everything perfectly correct - it will
> > flush exactly 1 line that contains my word I modified.
> And this will lead to silent corruption which is hard to track down and
> debug. I would rather have a check-and-refuse behavior than
> silently-fix-and-permit.
Even though that discussion is still not finished, a closely related issue actually hit me recently. So it looks like there's more to discuss here, probably with a wider audience.
So here's what happens: our newer ARC HS38 cores may have a so-called IO Coherency (IOC)
block. This hardware block makes sure data exchanged between CPU cores
(the CPU may be single- or multi-core) and DMA masters stays coherent.
That means if an IOC exists and is enabled in the core we:
1) Don't need to do manual cache management when sending data to peripherals and
getting data back. The IOC does all the work for us.
2) Must not do manual cache management when sending data to peripherals and
getting data back. Simply because our manual actions will:
   a) Interfere with what the IOC already does
   b) Introduce extra performance overhead, which is exactly what we wanted to
   get rid of by introducing the IOC in the first place.
So with peripherals everything is fine now: I was simply disabling all cache
management routines if an IOC gets detected on boot.
But back to your initial complaint.
In the code you were referring to, I wanted to modify the reset vector of the slave core.
While we were living without the IOC, that was all fine: my code wrote back
(or, as we used to call it within ARC, "flushed") the modified reset vector
contents from the L1 data cache to L2 (which is a combined data and instruction cache
in the case of ARC), and once the slave core was unhalted it read the reset vector
contents via the instruction fetch hardware and executed exactly what I wanted.
Now, with the IOC enabled and, as I wrote above, cache operations disabled, the new
reset vector value gets stuck in the L1 data cache of the master core. The slave cores
never see the proper reset vector value and effectively go into the wild on unhalt.
The same actually applies to U-Boot relocation and happens even on single-core setups.
We copy U-Boot's image to a new location in memory, and unless we explicitly flush the
L1 data cache to the underlying L2 or DDR (if no L2 exists), some of it may not make its
way back to the CPU when it starts fetching instructions from that new location.
So, in summary, I would say we may want to introduce two families of cache handling routines:
[1] DMA related
[2] Non-DMA related
Any thoughts? I'm especially interested in whether something similar happens on other
arches and how people deal with it.
-Alexey