ARM11 MPCore: Adding nop to __delay() doubles the BogoMIPS/lpj
Shilimkar, Santosh
santosh.shilimkar at ti.com
Fri Jan 29 00:08:06 EST 2010
> -----Original Message-----
> From: linux-arm-kernel-bounces at lists.infradead.org [mailto:linux-arm-kernel-
> bounces at lists.infradead.org] On Behalf Of Catalin Marinas
> Sent: Thursday, January 28, 2010 6:33 PM
> To: Dirk Behme
> Cc: linux-arm-kernel at lists.infradead.org
> Subject: Re: ARM11 MPCore: Adding nop to __delay() doubles the BogoMIPS/lpj
>
> On Wed, 2010-01-27 at 16:45 +0000, Dirk Behme wrote:
> > On a 400MHz ARM11 MPCore system (NEC NaviEngine based) with kernel
> > 2.6.32 we found that BogoMIPS/loops per jiffies ~doubles (see below
> > [1]) by adding a nop to __delay():
> >
> > --- a/arch/arm/lib/delay.S
> > +++ b/arch/arm/lib/delay.S
> > @@ -41,6 +41,9 @@ ENTRY(__const_udelay) @ 0 <= r0 <= 0x
> > @ Delay routine
> > ENTRY(__delay)
> > +#if defined(CONFIG_CPU_V6) && defined(CONFIG_SMP)
> > + nop
> > +#endif
> > subs r0, r0, #1
> > #if 0
> > movls pc, lr
> >
> > Any ideas what might happen here?
>
> Branch (mis-)prediction? Alignment?
>
> It doesn't really matter, bogomips should not be used as some form of
> performance checking.
>
> BTW, local timers give a more accurate estimate of the CPU frequency
> (they are counting at half this frequency).
Last time I experimented with this, the data I got for the A9 suggested
that "loop prediction" makes this faster on hardware that supports fast
loop mode. It is a feature of the Cortex-A9 pipeline that lets it spot
short loops like "BHI {pc}-4 ; 0x100" and just store them in the pipeline
queue, to be dispatched from there rather than being fetched from the
I-cache every time.
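
For reference, a rough sketch of why BogoMIPS and lpj move together: as far
as I recall, the calibration code prints BogoMIPS as a fixed function of lpj
and HZ, so anything that halves the cycles per iteration of the __delay()
loop doubles both. The function names and the lpj values below are
hypothetical and only for illustration; they are not measurements from any
board.

```c
#include <stdio.h>

/*
 * Integer part of the BogoMIPS value, following the formula the kernel
 * uses when it prints the calibration result: lpj / (500000 / HZ).
 * 'hz' stands in for CONFIG_HZ. Function name is hypothetical.
 */
unsigned long bogomips_int(unsigned long lpj, unsigned long hz)
{
    return lpj / (500000UL / hz);
}

/*
 * Two-digit fractional part: (lpj / (5000 / HZ)) % 100.
 * Function name is hypothetical.
 */
unsigned long bogomips_frac(unsigned long lpj, unsigned long hz)
{
    return (lpj / (5000UL / hz)) % 100UL;
}

/* Print the value in the same "N.NN BogoMIPS (lpj=N)" style as the boot log. */
void print_bogomips(unsigned long lpj, unsigned long hz)
{
    printf("%lu.%02lu BogoMIPS (lpj=%lu)\n",
           bogomips_int(lpj, hz), bogomips_frac(lpj, hz), lpj);
}
```

With HZ=100, calling print_bogomips() with an assumed lpj of 1000000 and then
2000000 shows the BogoMIPS figure doubling along with lpj, without saying
anything about how fast real code runs.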
Regards,
Santosh