[PATCH 3/4] ARM: mm: kill unused TLB_CAN_READ_FROM_L1_CACHE and use ALT_SMP instead

Gregory CLEMENT gregory.clement at free-electrons.com
Wed May 15 09:18:53 EDT 2013


Hi Will,

On 03/25/2013 07:19 PM, Will Deacon wrote:
> Many ARMv7 cores have hardware page table walkers that can read the L1
> cache. This is discoverable from the ID_MMFR3 register, although this
> can be expensive to access from the low-level set_pte functions and is a
> pain to cache, particularly with multi-cluster systems.
> 
> A useful observation is that the multi-processing extensions for ARMv7
> require coherent table walks, meaning that we can make use of ALT_SMP
> patching in proc-v7-* to patch away the cache flush safely for these
> cores.

I encountered a regression with 3.10-rc1 on the Armada 370 based boards.
With 3.10-rc1 they hang during the self-test of the XOR engine, which
consists mainly of DMA transfers. If I revert this patch, they no longer
hang. I found this by bisecting; it was not at all obvious to me that this
patch could have caused this regression.
The issue appears both in SMP and in UP. However, I think that the PJ4B-v7
used in the Armada 370 is not MP capable.

I investigated a bit. In UP, if I remove the line:
	ALT_UP(W(nop))

at the beginning of the cpu_v7_dcache_clean_area function located in
arch/arm/mm/proc-v7.S

then the kernel boots again. This is not surprising, because in that case
the generated code is the same as before this patch was applied.

Now, I don't really understand why a W(nop) would cause this issue.

In SMP mode the kernel hangs even with this line removed, but in this case
I am not sure what exactly happens or how the .alt.smp.init section is
used.
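
For reference, here is a rough model of the ALT_SMP()/ALT_UP() mechanism as
it is usually described: ALT_SMP() emits the SMP instruction in place,
ALT_UP() records the address of that instruction together with a same-sized
UP replacement in .alt.smp.init, and early boot code overwrites the SMP
instruction when the kernel finds itself running on a UP system. The sketch
below is only a simplified C simulation of that idea, not the kernel's code:
the struct, the function name and the is_smp flag are made up for
illustration, and the encodings are the ARM-mode (not Thumb-2) ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ARM-mode encodings, used here for illustration only. */
#define INSN_MOV_PC_LR  0xe1a0f00eu  /* mov pc, lr */
#define INSN_NOP        0xe320f000u  /* nop */

/*
 * Hypothetical model of one .alt.smp.init record: where the SMP
 * instruction sits in the text, and the UP instruction that should
 * replace it. Both instructions must have the same size.
 */
struct alt_smp_entry {
	size_t   offset;   /* index into the simulated text, in 32-bit words */
	uint32_t up_insn;  /* replacement instruction for UP systems */
};

/*
 * Hypothetical stand-in for the boot-time fixup: on an SMP system the
 * text is left alone; on a UP system every recorded SMP instruction is
 * overwritten with its UP alternative.
 */
void fixup_smp_on_up(uint32_t *text, size_t text_words,
		     const struct alt_smp_entry *tbl, size_t n, int is_smp)
{
	size_t i;

	if (is_smp)
		return;

	for (i = 0; i < n; i++) {
		assert(tbl[i].offset < text_words);
		text[tbl[i].offset] = tbl[i].up_insn;
	}
}
```

In this model, cpu_v7_dcache_clean_area starts with "mov pc, lr" in the
image, and a UP boot is expected to turn that word into a nop so that
execution falls through to the cache clean loop.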

I don't know if it is relevant, but I tested with these two versions of gcc:
gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
and
gcc version 4.7.3 (Ubuntu/Linaro 4.7.3-1ubuntu1)

I hope you will find an explanation and a solution for this bug, because
currently the only solution I have is to revert this patch.

Thanks,
Gregory
> 
> Reported-by: Albin Tonnerre <Albin.Tonnerre at arm.com>
> Signed-off-by: Will Deacon <will.deacon at arm.com>
> ---
>  arch/arm/include/asm/tlbflush.h | 2 +-
>  arch/arm/mm/proc-v6.S           | 2 --
>  arch/arm/mm/proc-v7-2level.S    | 3 ++-
>  arch/arm/mm/proc-v7-3level.S    | 3 ++-
>  arch/arm/mm/proc-v7.S           | 4 ++--
>  5 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
> index c7cdb59..42d155e 100644
> --- a/arch/arm/include/asm/tlbflush.h
> +++ b/arch/arm/include/asm/tlbflush.h
> @@ -169,7 +169,7 @@
>  # define v6wbi_always_flags	(-1UL)
>  #endif
>  
> -#define v7wbi_tlb_flags_smp	(TLB_WB | TLB_DCLEAN | TLB_BARRIER | \
> +#define v7wbi_tlb_flags_smp	(TLB_WB | TLB_BARRIER | \
>  				 TLB_V6_U_FULL | TLB_V6_U_PAGE | \
>  				 TLB_V6_U_ASID | TLB_V6_BP | \
>  				 TLB_V7_UIS_FULL | TLB_V7_UIS_PAGE | \
> diff --git a/arch/arm/mm/proc-v6.S b/arch/arm/mm/proc-v6.S
> index bcaaa8d..a286d47 100644
> --- a/arch/arm/mm/proc-v6.S
> +++ b/arch/arm/mm/proc-v6.S
> @@ -80,12 +80,10 @@ ENTRY(cpu_v6_do_idle)
>  	mov	pc, lr
>  
>  ENTRY(cpu_v6_dcache_clean_area)
> -#ifndef TLB_CAN_READ_FROM_L1_CACHE
>  1:	mcr	p15, 0, r0, c7, c10, 1		@ clean D entry
>  	add	r0, r0, #D_CACHE_LINE_SIZE
>  	subs	r1, r1, #D_CACHE_LINE_SIZE
>  	bhi	1b
> -#endif
>  	mov	pc, lr
>  
>  /*
> diff --git a/arch/arm/mm/proc-v7-2level.S b/arch/arm/mm/proc-v7-2level.S
> index 78f520b..9704097 100644
> --- a/arch/arm/mm/proc-v7-2level.S
> +++ b/arch/arm/mm/proc-v7-2level.S
> @@ -110,7 +110,8 @@ ENTRY(cpu_v7_set_pte_ext)
>   ARM(	str	r3, [r0, #2048]! )
>   THUMB(	add	r0, r0, #2048 )
>   THUMB(	str	r3, [r0] )
> -	mcr	p15, 0, r0, c7, c10, 1		@ flush_pte
> +	ALT_SMP(mov	pc,lr)
> +	ALT_UP (mcr	p15, 0, r0, c7, c10, 1)		@ flush_pte
>  #endif
>  	mov	pc, lr
>  ENDPROC(cpu_v7_set_pte_ext)
> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
> index 6ffd78c..363027e 100644
> --- a/arch/arm/mm/proc-v7-3level.S
> +++ b/arch/arm/mm/proc-v7-3level.S
> @@ -73,7 +73,8 @@ ENTRY(cpu_v7_set_pte_ext)
>  	tst	r3, #1 << (55 - 32)		@ L_PTE_DIRTY
>  	orreq	r2, #L_PTE_RDONLY
>  1:	strd	r2, r3, [r0]
> -	mcr	p15, 0, r0, c7, c10, 1		@ flush_pte
> +	ALT_SMP(mov	pc, lr)
> +	ALT_UP (mcr	p15, 0, r0, c7, c10, 1)		@ flush_pte
>  #endif
>  	mov	pc, lr
>  ENDPROC(cpu_v7_set_pte_ext)
> diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
> index 3a3c015..37716b0 100644
> --- a/arch/arm/mm/proc-v7.S
> +++ b/arch/arm/mm/proc-v7.S
> @@ -75,14 +75,14 @@ ENTRY(cpu_v7_do_idle)
>  ENDPROC(cpu_v7_do_idle)
>  
>  ENTRY(cpu_v7_dcache_clean_area)
> -#ifndef TLB_CAN_READ_FROM_L1_CACHE
> +	ALT_SMP(mov	pc, lr)			@ MP extensions imply L1 PTW
> +	ALT_UP(W(nop))
>  	dcache_line_size r2, r3
>  1:	mcr	p15, 0, r0, c7, c10, 1		@ clean D entry
>  	add	r0, r0, r2
>  	subs	r1, r1, r2
>  	bhi	1b
>  	dsb
> -#endif
>  	mov	pc, lr
>  ENDPROC(cpu_v7_dcache_clean_area)
>  
> 


-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com


