parallel load of modules on an ARM multicore

Catalin Marinas catalin.marinas at arm.com
Tue Jun 21 11:50:30 EDT 2011


On Mon, Jun 20, 2011 at 03:43:27PM +0200, EXTERNAL Waechtler Peter (Fa. TCP, CM-AI/PJ-CF31) wrote:
> I'm getting unexpected results from loading several modules - some
> of them in parallel - on an ARM11 MPcore system.
> 
> The system does not use "single sequential" modprobe commands - instead it
> starts several shells doing insmod sequences like this:
> 
> shell A:
> insmod a
> insmod ab
> insmod abc
> 
> shell B:
> insmod b
> insmod bc
> insmod bcd
> 
> shell C:
> insmod c
> insmod cd
> insmod cde
> 
> This is done to crash^H^H^H^H^Hboot faster ;)
> 
> While one insmod operation is protected via the module_mutex - I'm wondering
> what happens with the instruction cache invalidation.
> AFAICT the flush_icache_range only invalidates the ICache on the running cpu.
> The module_mutex is unlocked after _loading_ the module, do_mod_ctors() and
> do_one_initcall() are called without that lock - can they run on a different cpu?
> It's an preemptible system (SMP PREEMPT armv6l).
> 
> Wouldn't it be required to flush the icache on _all_ cpus?

I theory, you would need to flush both the I and D cache on all the CPUs
but that's not easily possible (well, I don't think there are many users
of flush_icache_range(), so we could make this do an
smp_call_function().

I think what happens (I'm looking at a 3.0-rc3 kernel) is that the
module code is copied into RAM and the process could migrate to another
CPU before flush_icache_range() gets called (this function is called
outside the mutex_lock regions). We therefore don't flush the data out
of the D-cache that was allocated during copy_from_user().

You can try the patch below (I haven't tested it for some time), it may
fix the issue.

8<-----------------------------------------------

ARM: Allow lazy cache flushing on ARM11MPCore

From: Catalin Marinas <catalin.marinas at arm.com>

The ARM11MPCore doesn't broadcast the cache maintenance operations in
hardware, therefore the flush_dcache_page() currently performs the cache
flushing non-lazily. But since not all drivers call this function after
writing to a page cache page, the kernel needs a different approach like
using read-for-ownership on the CPU flushing the cache to force the
dirty cache lines migration from other CPUs. This way the cache flushing
operation can be done lazily via __sync_icache_dcache().

Since we cannot force read-for-ownership on the I-cache, the patch
invalidates the whole I-cache when a thread migrates to another CPU.

Signed-off-by: Catalin Marinas <catalin.marinas at arm.com>
---
 arch/arm/include/asm/mmu_context.h |    3 ++-
 arch/arm/mm/cache-v6.S             |    8 ++++++++
 arch/arm/mm/flush.c                |    3 +--
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/mmu_context.h b/arch/arm/include/asm/mmu_context.h
index 71605d9..d116a23 100644
--- a/arch/arm/include/asm/mmu_context.h
+++ b/arch/arm/include/asm/mmu_context.h
@@ -114,7 +114,8 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next,
 #ifdef CONFIG_SMP
 	/* check for possible thread migration */
 	if (!cpumask_empty(mm_cpumask(next)) &&
-	    !cpumask_test_cpu(cpu, mm_cpumask(next)))
+	    (cache_ops_need_broadcast() ||
+	     !cpumask_test_cpu(cpu, mm_cpumask(next))))
 		__flush_icache_all();
 #endif
 	if (!cpumask_test_and_set_cpu(cpu, mm_cpumask(next)) || prev != next) {
diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
index 73b4a8b..bdc1cc1 100644
--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -133,6 +133,10 @@ ENTRY(v6_coherent_user_range)
 #ifdef HARVARD_CACHE
 	bic	r0, r0, #CACHE_LINE_SIZE - 1
 1:
+#ifdef CONFIG_SMP
+	/* no cache maintenance broadcasting on ARM11MPCore */
+ USER(	ldr	r2, [r0]		)	@ read for ownership
+#endif
  USER(	mcr	p15, 0, r0, c7, c10, 1	)	@ clean D line
 	add	r0, r0, #CACHE_LINE_SIZE
 2:
@@ -178,6 +182,10 @@ ENTRY(v6_flush_kern_dcache_area)
 	add	r1, r0, r1
 	bic	r0, r0, #D_CACHE_LINE_SIZE - 1
 1:
+#ifdef CONFIG_SMP
+	/* no cache maintenance broadcasting on ARM11MPCore */
+	ldr	r2, [r0]			@ read for ownership
+#endif
 #ifdef HARVARD_CACHE
 	mcr	p15, 0, r0, c7, c14, 1		@ clean & invalidate D line
 #else
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 1a8d4aa..72f9333 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -291,8 +291,7 @@ void flush_dcache_page(struct page *page)
 
 	mapping = page_mapping(page);
 
-	if (!cache_ops_need_broadcast() &&
-	    mapping && !mapping_mapped(mapping))
+	if (mapping && !mapping_mapped(mapping))
 		clear_bit(PG_dcache_clean, &page->flags);
 	else {
 		__flush_dcache_page(mapping, page);

-- 
Catalin



More information about the linux-arm-kernel mailing list