[PATCH] ARM: sched_clock: improve mult/shift accuracy with high frequency clocks
johnstul at us.ibm.com
Mon Jan 3 14:47:29 EST 2011
On Mon, 2011-01-03 at 00:37 +0000, Russell King - ARM Linux wrote:
> On Wed, Dec 29, 2010 at 08:31:58PM -0500, Nicolas Pitre wrote:
> > On Tue, 21 Dec 2010, Russell King - ARM Linux wrote:
> > > On Mon, Dec 20, 2010 at 08:21:41PM -0500, Nicolas Pitre wrote:
> > > >
> > > > The minsec argument to clocks_calc_mult_shift() [ which appears to be
> > > > misnamed by the way ] is used to clamp the magnitude of the mult factor
> > > > so that the given range won't overflow a 64 bit result. The mult factor
> > > > is itself already limited to a maximum of 32 bits. Given that the largest
> > > > clock tick range we may have is also 32 bits, there is no way our usage of
> > > > any computed mult factor would ever overflow 64 bits.
> > > >
> > > > The choice of 60 seconds here for minsec is rather arbitrary. If the
> > > > mult factor wasn't already limited to 32 bits, this value of 60 would
> > > > create overflows for clocks which take more than 60 seconds to wrap.
> > >
> > > 60 seconds was arbitary, chosen from the selection of clock rates which
> > > I had available at the time (the highest of which was 24MHz).
> > Fair enough. It is just not universally optimal given the different
> > clock ranges this code now covers.
> > > > And for clocks which take less than 60 seconds to wrap then we do lose
> > > > precision as the mult factor is made smaller (fewer bits) to be adjusted
> > > > to a range which is larger than what we actually use. This currently
> > > > affects clocks faster than 71MHz.
> > > >
> > > > We could use minsec = 0 instead of 60 and a good 32-bit mult factor would
> > > > be computed in all cases. But let's be formal and provide the actual
> > > > range to clocks_calc_mult_shift(). None of the cyc_to_fixed_sched_clock()
> > > > users are affected as they all are using clocks < 71MHz.
> > >
> > > Maybe __clocksource_updatefreq_scale() needs a similar fix too, instead
> > > of assuming 5 seconds? It also has access to the mask and rate.
> > There is a comment to that effect right above the
> > clocks_calc_mult_shift() indicating that this is a known issue in that
> > case already.
> > And looking at that code, it appears that there is a balance to be made
> > between cs->max_idle_ns and cs->mult, the former currently being
> > determined by the later. But if cs->mult is maximized to 32 bits, that
> > leaves only 31 bits for cs->max_idle_ns which corresponds to approx 2
> > seconds only.
> > So this is not clear to me what would be the optimal mult/shift values
> > in the __clocksource_updatefreq_scale() context, while this is rather
> > obvious in the init_sched_clock() context. Hence this patch.
> As clocksources are about precision (I believe it has been stated so in
> the past) it seems that fudging the shift and multiplier to get an
> extended period out of the clock is not the correct approach.
> Maybe Thomas or John can shed some light on this?
Right, so there's conflicting goals of long idle times and fine-grained
time accuracy that pull the clocksource mult/shift pair calculation in
Originally, the advice was pick a mult/shift pair that allows for a few
seconds of time to accrue before we overflow. However this was a little
vague and as the number of clocksources grew, and the desired idle times
were increasing, I started to see that the lack of consistency was
going to give us troubles.
So the clocksource_register_hz/khz interfaces try to consolidate that
decision into one place, utilizing the MAX_UPDATE_LENGTH value
(currently 5 seconds) as the max idle time. Clocksource driver writers
don't have to worry about making the correct decision, the kernel should
do it for you.
Now, at some point I expect folks to get grumpy about the max idle time
being limited to 5 seconds, and we'll have to either push it out for
everyone (at the cost of less accurate NTP steering), or make it a
architecture specific config option.
Now, for sched_clock, there are a different set of expectations with
regards to accuracy and expected idle times, and we'll probably need a
similar consolidation effort to make sure the mult/shift calculations
are correct and the resulting limits are taken into account by the
scheduler when going into NOHZ mode.
More information about the linux-arm-kernel