[PATCH v3 03/15] ARM: mcpm: introduce helpers for platform coherency exit/setup
Santosh Shilimkar
santosh.shilimkar at ti.com
Fri Feb 1 00:10:03 EST 2013
On Thursday 31 January 2013 10:46 PM, Nicolas Pitre wrote:
> On Thu, 31 Jan 2013, Santosh Shilimkar wrote:
>
>> On Tuesday 29 January 2013 01:20 PM, Nicolas Pitre wrote:
>>> From: Dave Martin <dave.martin at linaro.org>
>>>
>>> This provides helper methods to coordinate between CPUs coming down
>>> and CPUs going up, as well as documentation on the used algorithms,
>>> so that cluster teardown and setup operations are not done for a
>>> cluster simultaneously.
>>>
>>> For use in the power_down() implementation:
>>> * __mcpm_cpu_going_down(unsigned int cluster, unsigned int cpu)
>>> * __mcpm_outbound_enter_critical(unsigned int cluster)
>>> * __mcpm_outbound_leave_critical(unsigned int cluster)
>>> * __mcpm_cpu_down(unsigned int cluster, unsigned int cpu)
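For reference, here is a minimal sketch of how a platform's power_down()
might sequence these four helpers. The last-man election and the
teardown routines are hypothetical placeholders, and I am assuming
__mcpm_outbound_enter_critical() reports whether this CPU won the right
to perform the cluster-wide teardown:

    static void my_pm_power_down(void)      /* hypothetical backend */
    {
        unsigned int mpidr = read_cpuid_mpidr();
        unsigned int cpu = mpidr & 0xff;
        unsigned int cluster = (mpidr >> 8) & 0xff;

        __mcpm_cpu_going_down(cluster, cpu);

        if (last_man_standing(cluster) &&    /* hypothetical helper */
            __mcpm_outbound_enter_critical(cluster)) {
                /* last CPU: tear down cluster-wide state */
                my_cluster_teardown(cluster);       /* hypothetical */
                __mcpm_outbound_leave_critical(cluster);
        } else {
                /* tear down only this CPU's state */
                my_cpu_teardown(cluster, cpu);      /* hypothetical */
        }

        __mcpm_cpu_down(cluster, cpu);
        /* from here the CPU may be powered off at any time */
    }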
>>>
>>> The power_up_setup() helper should do platform-specific setup in
>>> preparation for turning the CPU on, such as invalidating local caches
>>> or entering coherency. It must be assembler for now, since it must
>>> run before the MMU can be switched on. It is passed the affinity level
>>> which should be initialized.
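A registration sketch to make that concrete, assuming the pointer is
handed over via mcpm_sync_init() as added by this patch, and with
my_power_up_setup being a hypothetical routine living in a .S file:

    /* implemented in assembly; called with r0 = affinity level */
    extern void my_power_up_setup(unsigned int affinity_level);

    static int __init my_pm_init(void)
    {
        /* ... platform-specific bring-up ... */
        return mcpm_sync_init(my_power_up_setup);
    }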
>>>
>>> Because the mcpm_sync_struct content is looked up and modified
>>> with the cache enabled or disabled depending on the code path, it is
>>> crucial to always ensure proper cache maintenance to update main memory
>>> right away. Therefore, any cached write must be followed by a cache
>>> clean operation and any cached read must be preceded by a cache
>>> invalidate operation (actually a cache flush i.e. clean+invalidate to
>>> avoid discarding possible concurrent writes) on the accessed memory.
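The write side of this rule is the simpler half; a minimal sketch of
the counterpart helper, assuming outer_clean_range() degrades to a
no-op when no outer cache is configured:

    static void __sync_range_w(volatile void *p, size_t size)
    {
        char *_p = (char *)p;

        /* Push this CPU's dirty lines out of the inner cache... */
        __cpuc_clean_dcache_area(_p, size);
        /* ...and then out of the outer cache into main memory. */
        outer_clean_range(__pa(_p), __pa(_p + size));
    }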
>>>
>>> Also, in order to prevent a cached writer from interfering with an
>>> adjacent non-cached writer, we ensure each state variable is placed
>>> on a separate cache line.
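If I read the series correctly, this is achieved by padding every
state field to a writeback granule, roughly:

    struct mcpm_sync_struct {
        /* individual CPU states, one cache line each */
        struct {
            s8 cpu __aligned(__CACHE_WRITEBACK_GRANULE);
        } cpus[MAX_CPUS_PER_CLUSTER];

        /* cluster state, on its own cache line */
        s8 cluster __aligned(__CACHE_WRITEBACK_GRANULE);

        /* inbound-side state, on its own cache line */
        s8 inbound __aligned(__CACHE_WRITEBACK_GRANULE);
    };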
>>>
>>> Thanks to Nicolas Pitre and Achin Gupta for the help with this
>>> patch.
>>>
>>> Signed-off-by: Dave Martin <dave.martin at linaro.org>
>>> ---
>> [..]
>>
>>> diff --git a/arch/arm/common/mcpm_entry.c b/arch/arm/common/mcpm_entry.c
>>> index c8c0e2113e..2b83121966 100644
>>> --- a/arch/arm/common/mcpm_entry.c
>>> +++ b/arch/arm/common/mcpm_entry.c
>>> @@ -18,6 +18,7 @@
>>> #include <asm/proc-fns.h>
>>> #include <asm/cacheflush.h>
>>> #include <asm/idmap.h>
>>> +#include <asm/cputype.h>
>>>
>>> extern volatile unsigned long
>>> mcpm_entry_vectors[MAX_NR_CLUSTERS][MAX_CPUS_PER_CLUSTER];
>>>
>> [...]
>>
>>> +/*
>>> + * Ensure preceding writes to *p by other CPUs are visible to
>>> + * subsequent reads by this CPU. We must be careful not to
>>> + * discard data simultaneously written by another CPU, hence the
>>> + * usage of flush rather than invalidate operations.
>>> + */
>>> +static void __sync_range_r(volatile void *p, size_t size)
>>> +{
>>> + char *_p = (char *)p;
>>> +
>>> +#ifdef CONFIG_OUTER_CACHE
>>> + if (outer_cache.flush_range) {
>>> +
>> You don't need the above #ifdef. In the case where there is no outer
>> cache, the function pointer is NULL anyway.
>
> We do need the #ifdef, because if CONFIG_OUTER_CACHE is not selected
> then the outer_cache structure simply doesn't exist.
>
You are right. An #ifdef in the middle of the code looks a bit ugly,
hence I was thinking of avoiding it.
>>> + /*
>>> + * Ensure dirty data migrated from other CPUs into our cache
>>> + * are cleaned out safely before the outer cache is cleaned:
>>> + */
>>> + __cpuc_clean_dcache_area(_p, size);
>>> +
>>> + /* Clean and invalidate stale data for *p from outer ... */
>>> + outer_flush_range(__pa(_p), __pa(_p + size));
>>> + }
>>> +#endif
>>> +
>>> + /* ... and inner cache: */
>>> + __cpuc_flush_dcache_area(_p, size);
>> This final inner cache flush seems unnecessary, no?
>> Maybe you can rearrange the code like below, unless
>> you want to invalidate any speculative fetches done during the
>> outer_flush_range():
>>
>> __cpuc_clean_dcache_area(_p, size);
>> if (outer_cache.flush_range)
>> outer_flush_range(__pa(_p), __pa(_p + size));
>
> As you said, the code is sequenced that way to get rid of a potential
> speculative fetch that could happen right before L2 is flushed.
>
Thanks for clarifying it. It makes sense.
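To summarize the agreed ordering for the archives (annotations mine):

    /* 1. Migrate dirty lines for *p out of the inner cache first,
     *    so the outer flush below operates on up-to-date data. */
    __cpuc_clean_dcache_area(_p, size);

    /* 2. Clean and invalidate *p in the outer cache. */
    outer_flush_range(__pa(_p), __pa(_p + size));

    /* 3. Flush the inner cache last: it may have speculatively
     *    refetched stale data while step 2 was still in progress. */
    __cpuc_flush_dcache_area(_p, size);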
Regards,
Santosh