L1 & L2 cache flush sequence on CortexA5 MPcore w.r.t low power modes

Shilimkar, Santosh santosh.shilimkar at ti.com
Thu May 17 03:30:58 EDT 2012


On Thu, May 17, 2012 at 10:31 AM, Murali N <nalajala.murali at gmail.com> wrote:
> On Tue, May 15, 2012 at 11:47 PM, Will Deacon <will.deacon at arm.com> wrote:
>> Hi Russell,
>>
>> On Tue, May 15, 2012 at 05:36:18PM +0100, Russell King - ARM Linux wrote:
>>> I repeat: what happens in this situation on A9:
>>>
>>>       - clean cache
>>>       - cache speculatively prefetches data from another core
>>
>> If this prefetching occurs then either:
>>
>>        (a) The line is clean (no problem)
>>
>>        (b) Another core has written some data and we end up (speculatively)
>>            loading dirty lines
>>
>> Case (b) is only a problem if we actually commit to using the data later on.
>>
>>>       - clear SCTLR.C
>>>       - _this_ core accesses the address associated with that prefetched
>>>         data
>>
>> Yes. At this point it is cpu-specific whether or not we hit our dirty lines
>> from above. On A9, we will get the stale data from memory. However, this is
>> exactly the same situation we would find ourselves in if we tried to access
>> dirty data held in any cache with our SCTLR.C bit cleared. We're no longer
>> coherent at this stage, so need to avoid accessing shared data.
>>
>>> _That_ is a data corruption issue - as soon as SCTLR.C is cleared, the CPU's
>>> view of data in memory _changes_, and is only restored to what it should
>>> be when the dirty cache lines are finally flushed out of the cache.  And
>>> then, hey presto, the data magically changes again.
>>
>> Well we still can't see dirty data in any of the other L1 caches, so our view
>> of memory is going to be constantly out of date. The tricky bit is ensuring
>> that we don't rely on data being written by anybody else (and if we write
>> data ourselves, we need to make sure it's suitably aligned so as not to get
>> clobbered by evictions from the other caches).
>>
>> Will
>
> Would the sequence below still cause any problems?
>
> 1. L1 clean & invalidate
> 2. L2 clean & invalidate
> 3. Disable L2
> 4. L1 clean & invalidate
> 5. Disable "C" bit
> 6. WFI
>
This is wrong if the code path is shared between CPU power-down
and CPU cluster power-down. Given the corner cases Russell pointed
out, the sequence that has worked for me without any issues so far
on OMAP is as follows:

- L1 clean & invalidate
- Disable "C" bit
- ISB
- L1 clean & invalidate
- Disable SMP bit
- ISB
- Check for cluster state
  if cluster == OFF
    - L2 clean & invalidate
- ISB
- DSB
- WFI
- NOP (to avoid speculative aborts, if any)
- NOP
- NOP
- NOP

The number of NOPs depends on the pipeline depth.
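
As a rough illustration only (this is not the actual OMAP code), the ordering
of the sequence above can be modelled in C. Every name here
(cpu_power_down_sequence, step, trace, print_trace, cluster_state) is
hypothetical, and each step merely records its label. On real hardware each
step would be a CP15 operation, a barrier instruction, or a platform-specific
L2 controller call, and the trailing NOPs would follow the WFI.

```c
#include <stdio.h>

/* Hypothetical host-side model of the power-down ordering.
 * Nothing here touches real hardware; steps are only recorded. */

enum cluster_state { CLUSTER_ON, CLUSTER_OFF };

#define MAX_STEPS 16
const char *trace[MAX_STEPS];   /* labels of the steps, in order */
int trace_len;                  /* number of valid entries in trace */

/* Record one step of the sequence. */
void step(const char *name)
{
    if (trace_len < MAX_STEPS)
        trace[trace_len++] = name;
}

/* Record the sequence for one CPU and return the number of steps.
 * The L2 maintenance is done only when the whole cluster goes off. */
int cpu_power_down_sequence(enum cluster_state cluster)
{
    trace_len = 0;
    step("L1 clean+invalidate");
    step("clear SCTLR.C");
    step("ISB");
    step("L1 clean+invalidate");   /* second pass, after caching is off */
    step("clear SMP bit");
    step("ISB");
    if (cluster == CLUSTER_OFF)
        step("L2 clean+invalidate");
    step("ISB");
    step("DSB");
    step("WFI");                   /* NOP padding after WFI is not modelled */
    return trace_len;
}

/* Print the recorded steps, one per line. */
void print_trace(void)
{
    for (int i = 0; i < trace_len; i++)
        printf("%2d. %s\n", i + 1, trace[i]);
}
```

Calling cpu_power_down_sequence(CLUSTER_OFF) followed by print_trace() lists
the steps for a cluster power-down; with CLUSTER_ON the L2 step is skipped,
which is the point of keeping the code path common for both cases.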

Hope this helps.

Regards
Santosh


