[PATCH] ARM: Runtime patch udiv/sdiv instructions into __aeabi_{u}idiv()
Nicolas Pitre
nicolas.pitre at linaro.org
Fri Dec 11 09:10:29 PST 2015
On Fri, 11 Dec 2015, Arnd Bergmann wrote:
> On Friday 11 December 2015 01:26:25 Nicolas Pitre wrote:
>
> > This is my counter proposal to Stephen's version. Those patches and the
> > discussion that followed are available here:
> >
> > http://news.gmane.org/group/gmane.linux.kbuild.devel/thread=14007
> >
> > I think what I propose here is much simpler and less intrusive. Going
> > to the full call site patching provides between moderate to no
> > additional performance benefits depending on the CPU core. That should
> > probably be compared with this solution with benchmark results to
> > determine if it is worth it.
>
> Looks really nice, as it's much simpler than Stephen's approach
>
> > Of course the ultimate performance will come from having gcc emit those
> > div instructions directly, but that would make the kernel binary
> > incompatible with some systems in a multi-arch config. Hence it has to
> > be a separate config option.
>
> Well, what we found is that we already have the same problem today
> when enabling LPAE makes the kernel incompatible with platforms that
> can be enabled. The affected platforms are basically the same, except
> for the earlier PJ4 and Krait variants that have some idiv support
> but no LPAE.
Still, making native div support depend on CONFIG_LPAE would be strange.
A separate config symbol would mmake it self explanatory.
> > diff --git a/arch/arm/include/asm/cputype.h b/arch/arm/include/asm/cputype.h
> > index 85e374f873..48c77d422a 100644
> > --- a/arch/arm/include/asm/cputype.h
> > +++ b/arch/arm/include/asm/cputype.h
> > @@ -250,8 +250,14 @@ static inline int cpu_is_pj4(void)
> >
> > return 0;
> > }
> > +
> > +static inline bool __attribute_const__ cpu_is_pj4_nomp(void)
> > +{
> > + return read_cpuid_part() == 0x56005810;
> > +}
> > #else
> > -#define cpu_is_pj4() 0
> > +#define cpu_is_pj4() 0
> > +#define cpu_is_pj4_nomp() 0
> > #endif
>
> It would be nice to use a symbolic constant for the above, and maybe define
> all three known cpuid numbers for PJ4 variants.
>
> I've looked at it some more and I now have doubts about this code:
>
> #ifdef CONFIG_CPU_PJ4B
> .type __v7_pj4b_proc_info, #object
> __v7_pj4b_proc_info:
> .long 0x560f5800
> .long 0xff0fff00
> __v7_proc __v7_pj4b_proc_info, __v7_pj4b_setup, proc_fns = pj4b_processor_functions
> .size __v7_pj4b_proc_info, . - __v7_pj4b_proc_info
> #endif
>
>
> Can someone have a look and tell me that I'm wrong when I read this
> as matching both PJ4 and PJ4B (and PJ4B-MP)?
>
> Either I'm misreading this, or we do the wrong thing in configurations
> that include both PJ4B (berlin, mvebu) and PJ4 (MMP2/dove).
I don't have the relevant documentation to validate it. And I'd prefer
if this was sorted out in a separate patch. Maybe I should just drop
the PJ4 variants from this patch for now.
> > +#ifdef CONFIG_ARM_PATCH_IDIV
> > +
> > +/* "sdiv r0, r0, r1" or "mrc p6, 1, r0, CR0, CR1, 4" if we're on pj4 w/o MP */
> > +static inline u32 __attribute_const__ sdiv_instruction(void)
> > +{
> > + if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) {
> > + u32 insn = __opcode_thumb32_compose(0xfb90, 0xf0f1);
> > + if (cpu_is_pj4_nomp())
> > + insn = __opcode_thumb32_compose(0xee30, 0x0691);
> > + return __opcode_to_mem_thumb32(insn);
> > + }
>
> Strictly speaking, we can ignore the thumb instruction for pj4, as all
> variants also support the normal T2 opcode for udiv/sdiv.
OK it is gone.
Now... The code bails out if ARM mode and HWCAP_IDIVA is not set so
effectively the ARM mode on PJ4 is currently not used either. So I'm
leaning towards having aving another patch to sort PJ4 out.
> > diff --git a/arch/arm/lib/lib1funcs.S b/arch/arm/lib/lib1funcs.S
> > index af2267f6a5..9397b2e532 100644
> > --- a/arch/arm/lib/lib1funcs.S
> > +++ b/arch/arm/lib/lib1funcs.S
> > @@ -205,6 +205,10 @@ Boston, MA 02111-1307, USA. */
> > .endm
> >
> >
> > +#ifdef CONFIG_ARM_PATCH_IDIV
> > + .align 3
> > +#endif
> > +
> > ENTRY(__udivsi3)
> > ENTRY(__aeabi_uidiv)
> > UNWIND(.fnstart)
>
> I'd probably just leave out the #ifdef and always align this, but it's not
> important.
This serves as "documentation". The alignment is important when
CONFIG_ARM_PATCH_IDIV is selected and not otherwise.
Nicolas
More information about the linux-arm-kernel
mailing list