[PATCH] ARM: Runtime patch udiv/sdiv instructions into __aeabi_{u}idiv()

Arnd Bergmann arnd at arndb.de
Fri Dec 11 01:24:21 PST 2015


On Friday 11 December 2015 01:26:25 Nicolas Pitre wrote:

> This is my counter proposal to Stephen's version. Those patches and the 
> discussion that followed are available here:
> 
>   http://news.gmane.org/group/gmane.linux.kbuild.devel/thread=14007
> 
> I think what I propose here is much simpler and less intrusive.  Going
> to the full call site patching provides between moderate to no 
> additional performance benefits depending on the CPU core.  That should 
> probably be compared with this solution with benchmark results to 
> determine if it is worth it.

Looks really nice, as it's much simpler than Stephen's approach

> Of course the ultimate performance will come from having gcc emit those 
> div instructions directly, but that would make the kernel binary 
> incompatible with some systems in a multi-arch config. Hence it has to 
> be a separate config option.

Well, what we found is that we already have the same problem today
when enabling LPAE makes the kernel incompatible with platforms that
can be enabled. The affected platforms are basically the same, except
for the earlier PJ4 and Krait variants that have some idiv support
but no LPAE.

> diff --git a/arch/arm/include/asm/cputype.h b/arch/arm/include/asm/cputype.h
> index 85e374f873..48c77d422a 100644
> --- a/arch/arm/include/asm/cputype.h
> +++ b/arch/arm/include/asm/cputype.h
> @@ -250,8 +250,14 @@ static inline int cpu_is_pj4(void)
>  
>  	return 0;
>  }
> +
> +static inline bool __attribute_const__ cpu_is_pj4_nomp(void)
> +{
> +	return read_cpuid_part() == 0x56005810;
> +}
>  #else
> -#define cpu_is_pj4()	0
> +#define cpu_is_pj4()		0
> +#define cpu_is_pj4_nomp()	0
>  #endif

It would be nice to use a symbolic constant for the above, and maybe define
all three known cpuid numbers for PJ4 variants.

I've looked at it some more and I now have doubts about this code:

#ifdef CONFIG_CPU_PJ4B
        .type   __v7_pj4b_proc_info, #object
__v7_pj4b_proc_info:
        .long   0x560f5800
        .long   0xff0fff00
        __v7_proc __v7_pj4b_proc_info, __v7_pj4b_setup, proc_fns = pj4b_processor_functions
        .size   __v7_pj4b_proc_info, . - __v7_pj4b_proc_info
#endif


Can someone have a look and tell me that I'm wrong when I read this
as matching both PJ4 and PJ4B (and PJ4B-MP)?

Either I'm misreading this, or we do the wrong thing in configurations
that include both PJ4B (berlin, mvebu) and PJ4 (MMP2/dove).

> +#ifdef CONFIG_ARM_PATCH_IDIV
> +
> +/* "sdiv r0, r0, r1" or "mrc p6, 1, r0, CR0, CR1, 4" if we're on pj4 w/o MP */
> +static inline u32 __attribute_const__ sdiv_instruction(void)
> +{
> +	if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) {
> +		u32 insn = __opcode_thumb32_compose(0xfb90, 0xf0f1);
> +		if (cpu_is_pj4_nomp())
> +			insn = __opcode_thumb32_compose(0xee30, 0x0691);
> +		return __opcode_to_mem_thumb32(insn);
> +	}

Strictly speaking, we can ignore the thumb instruction for pj4, as all
variants also support the normal T2 opcode for udiv/sdiv.

> diff --git a/arch/arm/lib/lib1funcs.S b/arch/arm/lib/lib1funcs.S
> index af2267f6a5..9397b2e532 100644
> --- a/arch/arm/lib/lib1funcs.S
> +++ b/arch/arm/lib/lib1funcs.S
> @@ -205,6 +205,10 @@ Boston, MA 02111-1307, USA.  */
>  .endm
>  
>  
> +#ifdef CONFIG_ARM_PATCH_IDIV
> +	.align	3
> +#endif
> +
>  ENTRY(__udivsi3)
>  ENTRY(__aeabi_uidiv)
>  UNWIND(.fnstart)

I'd probably just leave out the #ifdef and always align this, but it's not
important.

	Arnd




More information about the linux-arm-kernel mailing list