udelay() broken for SMP cores?

Wed Apr 21 16:57:45 EDT 2010

On Wed, Apr 21, 2010 at 09:47:18PM +0100, Jamie Lokier wrote:
> Russell King - ARM Linux wrote:
> > You don't understand the issue.  On older ARMs, the single 32-bit
> > multiply is not cheap; it shows up as having a significant time
> > expense for very short delays - and that _does_ matter.
> > 
> > Consider system performance where you're driving a bus using udelay()
> > to provide 1us timings, but udelay ends up taking 10us instead every
> > time because of the calculation for number of loops for a 1us timing.
> 
> Hence nested loop.  You don't multiply.  No calculation.

Ok, since you seem to have a clear idea how to convert this into a double
nested loop, try converting it:

                                                @ 0 <= r0 <= 0x7fffff06
                ldr     r2, .LC0                @ address of loops_per_jiffy
                ldr     r2, [r2]                @ max = 0x01ffffff
                mov     r0, r0, lsr #14         @ max = 0x0001ffff
                mov     r2, r2, lsr #10         @ max = 0x00007fff
                mul     r0, r2, r0              @ max = 2^32-1
                movs    r0, r0, lsr #6
                moveq   pc, lr
1:              subs    r0, r0, #1
                bhi     1b
                mov     pc, lr

into two loops without losing the precision - note that the multiply
is part of a 'dividing by multiply+shift' technique.
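
For anyone attempting the exercise, the arithmetic in the sequence above
can be modelled roughly in C as follows. This is only a sketch: the names
delay_loops() and delay_loops_exact() are made up for illustration, and
the 64-bit version is just a reference to compare against, not something
the ARM code does. The three shifts (14 + 10 + 6) add up to 30 bits, so
the result approximates (r0 * loops_per_jiffy) >> 30 with a single 32-bit
multiply.

```c
#include <stdint.h>

/*
 * Hypothetical C model of the quoted instruction sequence.
 * r0  : pre-scaled delay value, 0 <= r0 <= 0x7fffff06
 * lpj : loops_per_jiffy, at most 0x01ffffff
 * Returns the number of iterations the 'subs/bhi' loop would run.
 */
static uint32_t delay_loops(uint32_t r0, uint32_t lpj)
{
    uint32_t a = r0 >> 14;      /* mov r0, r0, lsr #14  (max 0x0001ffff) */
    uint32_t b = lpj >> 10;     /* mov r2, r2, lsr #10  (max 0x00007fff) */
    uint32_t prod = a * b;      /* mul r0, r2, r0       (fits in 32 bits) */
    return prod >> 6;           /* movs r0, r0, lsr #6 */
}

/*
 * Reference only: the same quantity computed with a 64-bit product,
 * i.e. without dropping the low bits of either operand first.
 */
static uint32_t delay_loops_exact(uint32_t r0, uint32_t lpj)
{
    return (uint32_t)(((uint64_t)r0 * lpj) >> 30);
}
```

Because the operands are truncated before the multiply, delay_loops() can
come out slightly below delay_loops_exact() but never above it. Any
rewrite into two loops would have to preserve at least that much
precision.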