[PATCH resend] timers: Fix excessive granularity of new timers after a nohz idle

Tue Aug 22 01:06:30 PDT 2017

On Tue, 22 Aug 2017 09:45:46 +0200 (CEST)
Thomas Gleixner <tglx at linutronix.de> wrote:

> On Tue, 22 Aug 2017, Nicholas Piggin wrote:
> > I would have preferred to get comments from the timer maintainers, but
> > they've been busy or away for the past copule of weeks. Perhaps you
> > would consider carrying it until then?  
> 
> Yes, I was on vacation, but that patch never hit my LKML inbox ....

Hmm okay. Well that's fine, we're just getting done testing it here.

> 
> > diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> > index 8f5d1bf18854..dd7be9fe6839 100644
> > --- a/kernel/time/timer.c
> > +++ b/kernel/time/timer.c
> > @@ -203,6 +203,7 @@ struct timer_base {
> >  	bool			migration_enabled;
> >  	bool			nohz_active;
> >  	bool			is_idle;
> > +	bool			was_idle; /* was it idle since last run/fwded */  
> 
> Please do not use tail comments. They are a horrible habit and disturb the
> reading flow.

Sure.

> I'm not fond of that name either. Something like forward_clk
> or a similar self explaining name would be appreciated.

Well I actually had that initially (must_forward). But on the other
hand that's less explanatory about the state of the timer base I thought.
Anyway I could go either way and you're going to be looking at this code
more than me in future :)

> 
> >  	/*
> > @@ -938,6 +945,13 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
> >  	 * same array bucket then just return:
> >  	 */
> >  	if (timer_pending(timer)) {
> > +		/*
> > +		 * The downside of this optimization is that it can result in
> > +		 * larger granularity than you would get from adding a new
> > +		 * timer with this expiry. Would a timer flag for networking
> > +		 * be appropriate, then we can try to keep expiry of general
> > +		 * timers within ~1/8th of their interval?  
> 
> This really depends on the timer usage type.
> 
> If you use it merily for TCP timeouts or similar things, then this does not
> matter at all because then the timer is the safety net in case something
> goes wrong. If you lose packets on the transport the larger granularity is
> the least of your worries.
> 
> From earlier discussions with the networking folks these timeouts are the
> reason for this optimization because you avoid the expensive
> dequeue/requeue operation if the successful communication is fast.
> 
> If the timer has a short relative expiry and is used for sending packets or
> similar purposes, then it usually sits in the first bucket and has no
> granularity issues anyway. But from staring at trace data provided by
> google and facebook these timer are not rearmed while pending, they fire,
> do networking stuff and then get rearmed.
> 
> So I rather avoid that kind of misleading comment. The first sentence
> surely can stay.

Right. The question was actually there for you to answer, so thanks. I
can certainly understand the use-case and importance of keeping it light.

> Other than that, nice detective work!

Thanks for taking a look (and thanks everyone who's been testing and
hacking on it). Do you want me to re-send with the changes?

Thanks,
Nick