[PATCH 0/2] use hrtimer in nand_wait

Tue May 22 04:37:47 EDT 2012

> -----Original Message-----
> From: Artem Bityutskiy [mailto:dedekind1 at gmail.com]
> Sent: 22 May 2012 09:23
> To: Johan Gunnarsson
> Cc: linux-mtd at lists.infradead.org; Jesper Nilsson
> Subject: Re: [PATCH 0/2] use hrtimer in nand_wait
> 
> On Mon, 2012-05-21 at 10:42 +0200, Johan Gunnarsson wrote:
> > I've narrowed it down to the nand_wait routine and its dependency on a
> > reliable jiffies counter. Sadly, jiffies is not reliable when handling
> > of timer interrupts is delayed or even completely discarded. If
> > interrupts are disabled for, say, 3 timer periods, jiffies will stop
> > counting during this time and have a very fast increment by 3 when
> > interrupts are later enabled.
> 
> I can follow up to this point.
> 
> > This combined with unfortunate timing can cause the timeout loop to
> > think a 20ms timeout is happening when just <0.1ms has passed in wall
> > clock time.
> 
> What is your HZ? Let's say it is 100, then jiffies increments every 10ms,
> right? How can it increment by 2 after 0.1 ms?

It is supposed to increment by 1 on each clock tick. But if interrupts are disabled for a number of clock ticks, the missed increments are lumped together and applied in a loop once interrupts are enabled again, so that jiffies catches up.

The relevant parts of the kernel where this happens are

kernel/time/tick-common.c:tick_handle_periodic()
kernel/time/tick-common.c:tick_periodic()
kernel/time/clockevents.c:clockevents_program_event()

The timer interrupt handler calls tick_handle_periodic through the clockevents framework. The loop in tick_handle_periodic iterates one extra time for each missed tick. clockevents_program_event detects that the next event to be programmed is already in the past by reading back the current HW timer value through the clocksource framework. The actual jiffies increment happens in do_timer, which is called from tick_periodic.
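
To make the catch-up behaviour concrete, here is a minimal user-space model of it. It is a sketch, not the kernel code: struct tick_model and tick_model_handle are hypothetical names, time is plain microseconds, and the "program the next event" step is reduced to a comparison against the current time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model of a one-shot tick; not kernel code. */
struct tick_model {
    uint64_t jiffies;       /* tick counter */
    uint64_t next_event_us; /* expiry the one-shot timer is programmed for */
    uint64_t period_us;     /* tick period, i.e. 1000000 / HZ */
};

/*
 * Models the tick handler running at time now_us, which may be later
 * than the programmed expiry if interrupts were disabled.
 * Returns the number of jiffies added by this single invocation.
 */
static unsigned tick_model_handle(struct tick_model *t, uint64_t now_us)
{
    unsigned added = 0;
    uint64_t next;

    assert(t->period_us > 0);
    assert(now_us >= t->next_event_us);

    /* Account for the tick that actually fired. */
    t->jiffies++;
    added++;

    /*
     * Try to program the next expiry. While it is not in the future,
     * programming "fails", so one more tick is accounted for and the
     * expiry is pushed forward by another period.
     */
    next = t->next_event_us + t->period_us;
    while (next <= now_us) {
        t->jiffies++;
        added++;
        next += t->period_us;
    }

    t->next_event_us = next;
    return added;
}
```

With period_us = 10000 (HZ = 100) and an expiry programmed for 20000, calling tick_model_handle at now_us = 45000 adds all three pending jiffies in one go, back to back, instead of spreading them over 30 ms.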

Also, please note that we're using a one-shot timer that is reprogrammed for each tick, not a periodic timer.

> >
> > To illustrate the jiffies/interrupt-relationship:
> >
> > Interrupts: |      |      |                    |      |      |      |
> > Jiffies:    |      |      |                    |||    |      |      |
> >
> > This obviously only happens on multi-core CPUs, where the write and
> > interrupts are executed by different cores simultaneously. Switching
> > to hrtimer-based timeout solves this problem for me. I found a second
> > (less serious) issue which is included in the first patch.
> 
> Sorry, I do not understand why this happens only on SMP. Could you
> please explain some more?

The only way this can happen is by executing a nand_write at the same time as the jiffy counter is "catching up" (i.e. running the loop in tick_handle_periodic as described above). On a single-core system I don't think that would be possible, since the catch-up happens in interrupt context and the write path cannot run until the loop has finished.
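
The effect on a timeout check can be sketched as follows. This is an illustration only: the function names are hypothetical, the jiffies comparison merely follows the spirit of the kernel's time_before(), and the clock-based check uses plain microsecond values rather than the hrtimer API that the patches use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is before b", in the spirit of the kernel's time_before(). */
static bool model_time_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/* jiffies-based check: true once the counter has reached the deadline. */
static bool jiffies_timed_out(unsigned long jiffies_now, unsigned long deadline)
{
    return !model_time_before(jiffies_now, deadline);
}

/* Clock-based check: true once timeout_us of real time has elapsed. */
static bool clock_timed_out(uint64_t now_us, uint64_t start_us,
                            uint64_t timeout_us)
{
    assert(now_us >= start_us); /* assumes a monotonic clock */
    return now_us - start_us >= timeout_us;
}
```

Assume HZ = 100, so a 20 ms timeout is 2 jiffies. If the write path samples jiffies = 100 while the other core is in the middle of a 3-tick catch-up, the deadline is 102. Some 50 us later the counter reads 103, so jiffies_timed_out(103, 102) reports a timeout, while clock_timed_out(start + 50, start, 20000) does not.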

> 
> I do not disagree that we should stop using jiffies, if we can, I just
> want to understand what is happening.
> 
> --
> Best Regards,
> Artem Bityutskiy