[PATCH] optimize ktime_divns for constant divisors
Nicolas Pitre
nicolas.pitre at linaro.org
Wed Dec 3 23:23:37 PST 2014
On Wed, 3 Dec 2014, Arnd Bergmann wrote:
> On Wednesday 03 December 2014 14:43:06 Nicolas Pitre wrote:
> > At least on ARM, do_div() is optimized to turn constant divisors into
> > an inline multiplication by the reciprocal value at compile time.
> > However this optimization is missed entirely whenever ktime_divns() is
> > used and the slow out-of-line division code is used all the time.
> >
> > Let ktime_divns() use do_div() inline whenever the divisor is constant
> > and small enough. This will make things like ktime_to_us() and
> > ktime_to_ms() much faster.
> >
> > Signed-off-by: Nicolas Pitre <nico at linaro.org>
>
> Very cool. I've been thinking about doing something similar for the
> general case but couldn't get the math to work.
>
> Can you think of an architecture-independent way to implement
> ktime_to_sec, ktime_to_ms, and ktime_to_us efficiently based on what
> you did for the ARM do_div implementation?
Sure. gcc generates rather shitty code on ARM compared to the output
from my do_div() implementation. But here it is:
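u64 ktime_to_us(ktime_t kt)
{
	u64 ns = ktime_to_ns(kt);
	u32 x_lo, x_hi, y_lo, y_hi;
	u64 res, carry;

	x_hi = ns >> 32;
	x_lo = ns;
	/* y = floor(2^73 / 1000), split into 32-bit halves */
	y_hi = 0x83126e97;
	y_lo = 0x8d4fdf3b;

	/* low partial product, plus the low word of the bias y */
	res = (u64)x_lo * y_lo;
	carry = (u64)(u32)res + y_lo;
	res = (res >> 32) + (carry >> 32);

	/* cross products, plus the high word of the bias y */
	res += (u64)x_lo * y_hi + y_hi;
	carry = (u64)(u32)res + (u64)x_hi * y_lo;
	res = (res >> 32) + (carry >> 32);

	/* high partial product, then drop the remaining 9 fraction bits */
	res += (u64)x_hi * y_hi;
	return res >> 9;
}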
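
For reference, the y_hi:y_lo pair matches floor(2^(64 + shift) / divisor) split into two 32-bit words. A minimal userspace sketch to recompute it, assuming a compiler with the unsigned __int128 extension (gcc, clang). The name reciprocal_64() is hypothetical and not a kernel function:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the kernel: returns
 * floor(2^(64 + shift) / divisor) as a 64-bit value.
 * The caller must pick shift so that the quotient fits in 64 bits
 * (true for 1000 with shift 9). Relies on unsigned __int128.
 */
static uint64_t reciprocal_64(uint32_t divisor, unsigned int shift)
{
	unsigned __int128 one = 1;

	return (uint64_t)((one << (64 + shift)) / divisor);
}
```

reciprocal_64(1000, 9) yields 0x83126e978d4fdf3b, i.e. the y_hi and y_lo used above.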
For ktime_to_ms() the constants would be as follows:

	y_hi = 0x8637bd05;
	y_lo = 0xaf6c69b5;
	final shift = 19

For ktime_to_sec() that would be:

	y_hi = 0x89705f41;
	y_lo = 0x36b4a597;
	final shift = 29
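
Since the three variants differ only in the constants and the final shift, they could share one routine. What follows is a rough userspace sketch, not a proposed patch: div_by_reciprocal() and the ns_to_*() wrappers are hypothetical names, and it assumes the input is below 2^63 (a non-negative ktime value). Note that it adds the whole constant as rounding bias (y_lo in the low word and y_hi in the next one), which is my reading of what an exact result requires.

```c
#include <stdint.h>

/*
 * Hypothetical helper: floor(ns / d) computed as
 * ((ns + 1) * y) >> (64 + shift), where y = y_hi:y_lo is
 * floor(2^(64 + shift) / d). Only 32x32->64 multiplies are used.
 * Assumes ns < 2^63.
 */
static uint64_t div_by_reciprocal(uint64_t ns, uint32_t y_hi, uint32_t y_lo,
				  unsigned int shift)
{
	uint32_t x_lo = (uint32_t)ns;
	uint32_t x_hi = (uint32_t)(ns >> 32);
	uint64_t res, carry;

	/* low partial product plus low bias word; cannot overflow 64 bits */
	res = (uint64_t)x_lo * y_lo + y_lo;
	res >>= 32;

	/* cross products plus high bias word, carry tracked separately */
	res += (uint64_t)x_lo * y_hi + y_hi;
	carry = (uint64_t)(uint32_t)res + (uint64_t)x_hi * y_lo;
	res = (res >> 32) + (carry >> 32);

	/* high partial product, then drop the remaining fraction bits */
	res += (uint64_t)x_hi * y_hi;
	return res >> shift;
}

/* Hypothetical wrappers using the constants quoted in this mail. */
static uint64_t ns_to_us(uint64_t ns)
{
	return div_by_reciprocal(ns, 0x83126e97, 0x8d4fdf3b, 9);
}

static uint64_t ns_to_ms(uint64_t ns)
{
	return div_by_reciprocal(ns, 0x8637bd05, 0xaf6c69b5, 19);
}

static uint64_t ns_to_sec(uint64_t ns)
{
	return div_by_reciprocal(ns, 0x89705f41, 0x36b4a597, 29);
}
```

Comparing these against plain 64-bit division on a range of inputs, including exact multiples of the divisor, is a cheap way to check the constants.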
Nicolas