[PATCH] cfi: Fixup of write errors on XIP

Nicolas Pitre nico at cam.org
Fri Mar 10 11:36:47 EST 2006


I'm back, sorry for the delay.

On Thu, 2 Mar 2006, Alexey Korolev wrote:

> Nicolas
> 
> 
> > Please don't just yet.
> > 
> > > The scenario of the issue is the following:
> > > 1. do_write_buffer
> > > 2. Waiting for write complete in xip_udelay
> > > 3. System Interrupt
> > > 4. Write suspend
> > > 5. Rescheduling
> > > 6. Block erase by another process (this operation typically takes
> > > a rather long time).
> > > 7. Complete, rescheduling
> > > 8. Return to write (the write is not complete due to the suspend).
> > > 9. Check timeout. Time is up.
> > > 10. Error.
> > 
> > This should not happen.  And if it does then the bug is in xip_udelay()
> > and therefore should be fixed there.
> > 
> > The fact is, xip_udelay() should not return until either the flash
> > status is 0x80 (done) or the delay has expired.  The code looks like:
> > 
> This is absolutely correct.
> But the delay may sometimes expire before the chip gets ready, even if the chip
> has not been suspended.

So?

> Buffer programming time for the chip may vary.
> For example, the timeout may expire a couple of usecs before the status gets
> ready (such variations are absolutely OK).

Agreed.

> You go back up to do_write_buffer and get the described scenario if the chip
> has been suspended at the very beginning of the wait in xip_udelay.

I still don't see the problem.

Let's suppose we enter xip_udelay(), and scheduling happens, and another 
or even multiple other threads take their time to erase the flash, say 
for more than 4 seconds.  Currently xip_udelay() remembers how much time 
it had remaining before being suspended and will report that time when 
it is scheduled back.  So xip_udelay() should never return before the 
estimated amount of time required to perform the write has actually been 
spent actively writing.
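
To make the accounting concrete, here is a toy C simulation of a delay that preserves its remaining budget across a suspension. It is only a sketch under stated assumptions: the simulated clock, the fake chip, and every name in it (sim_xip_udelay, sim_reset, chip_ready and the globals) are hypothetical and are not the real xip_udelay() or any MTD API. One loop iteration stands for one microsecond of programming, and a single interrupt suspends the write for a configurable wall-clock period.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simulation state; none of these names exist in the MTD code. */
static unsigned long now_us;         /* simulated wall clock, in usecs */
static unsigned long chip_done_us;   /* programming time the chip has received */
static unsigned long chip_need_us;   /* programming time the chip needs */
static unsigned long irq_at_us;      /* chip_done_us value at which an IRQ arrives */
static unsigned long suspend_len_us; /* wall-clock length of the suspension */
static bool irq_pending;

static void sim_reset(unsigned long need, unsigned long irq_at,
                      unsigned long suspend_len)
{
    now_us = 0;
    chip_done_us = 0;
    chip_need_us = need;
    irq_at_us = irq_at;
    suspend_len_us = suspend_len;
    irq_pending = true;
}

static bool chip_ready(void) { return chip_done_us >= chip_need_us; }

/* Budget-preserving delay: returns true once the chip reports ready, false
 * once `usec` of active programming time has passed without completion. */
static bool sim_xip_udelay(unsigned long usec)
{
    unsigned long start = now_us;

    while (!chip_ready() && now_us - start < usec) {
        if (irq_pending && chip_done_us == irq_at_us) {
            irq_pending = false;
            assert(now_us - start < usec); /* guaranteed by the loop condition */
            usec -= now_us - start;   /* remember the remaining budget */
            now_us += suspend_len_us; /* others run; the write makes no progress */
            start = now_us;           /* restart timing on resume */
        }
        now_us++;                     /* one usec of real programming */
        chip_done_us++;
    }
    return chip_ready();
}
```

With sim_reset(100, 10, 4000000) a four-second suspension lands in the middle of a 100 us budget, yet sim_xip_udelay(100) still returns true only after the chip has received its full 100 us of programming. With sim_reset(103, 10, 4000000) the same call returns false after exactly 100 us of programming, i.e. a few usecs short, which is the benign timeout case.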

Do we agree so far?

Now xip_udelay() may return to do_write_buffer() either because the 
write has completed, or the timeout occurred.  Like you said, the timeout 
is normal since the write might still have a few microseconds to go.  
But only at that point is the timeo variable initialized with
jiffies + (HZ/2).

Further down, though, there is a call to UDELAY() that goes into 
xip_udelay() again, but this time any time spent suspended is not 
accounted for in the timeo variable.
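
The difference between a wall-clock deadline and an active-time budget can be sketched with another toy C model. This is a hypothetical illustration, not the driver code: sim_wait_write and its parameters are invented here, the half-second timeo (jiffies + HZ/2) is modeled as a 500000 us limit, and it assumes that every irq_every_us of programming an interrupt suspends the write for suspend_us of wall-clock time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the write needs need_us of programming; every
 * irq_every_us of programming an interrupt suspends it for suspend_us of
 * wall-clock time. Returns true if the write completes before the limit. */
static bool sim_wait_write(unsigned long need_us, unsigned long limit_us,
                           unsigned long irq_every_us, unsigned long suspend_us,
                           bool charge_suspended_time)
{
    unsigned long wall_us = 0, active_us = 0;

    assert(limit_us > 0);
    while (active_us < need_us) {
        /* A wall-clock deadline (like timeo = jiffies + (HZ/2)) charges
         * suspended time; an active-time budget does not. */
        unsigned long charged = charge_suspended_time ? wall_us : active_us;
        if (charged >= limit_us)
            return false;          /* would be reported as a write error */
        active_us++;               /* one usec of real programming */
        wall_us++;
        if (irq_every_us && active_us % irq_every_us == 0)
            wall_us += suspend_us; /* suspended: time passes, no progress */
    }
    return true;
}
```

Without suspensions both policies succeed. With sim_wait_write(10, 500000, 2, 200000, true) the write needs only 10 us of programming but fails, because three 200 ms suspensions push the wall clock past the half-second limit; the same call with charge_suspended_time set to false succeeds.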

But the thing is, if you look at get_chip(), you'll see that nothing can 
go and erase another flash sector when a write is suspended.  In other 
words, completion of a write operation always has priority over any erase 
attempt.  So the problem you're describing may not be due to any erase 
delay occurring in another thread.
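
The priority rule described above can be summarized as a small decision function. This is a deliberately simplified sketch, not the actual get_chip() logic: the state and mode enums and sim_may_proceed() are hypothetical, the suspend-request path of the real driver is omitted, and allowing writes while an erase is suspended assumes a chip that supports programming during erase suspend.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified chip states and request types. */
enum sim_state { SIM_READY, SIM_WRITING, SIM_WRITE_SUSPENDED,
                 SIM_ERASING, SIM_ERASE_SUSPENDED };
enum sim_mode { SIM_WANT_READ, SIM_WANT_WRITE, SIM_WANT_ERASE };

/* Returns true if a thread requesting `mode` may use the chip now;
 * false means it must sleep until the chip changes state. */
static bool sim_may_proceed(enum sim_state st, enum sim_mode mode)
{
    assert(mode <= SIM_WANT_ERASE);
    switch (st) {
    case SIM_READY:
        return true;
    case SIM_WRITE_SUSPENDED:
        /* A suspended write only yields to reads; an erase must wait
         * until the write has been resumed and completed. */
        return mode == SIM_WANT_READ;
    case SIM_ERASE_SUSPENDED:
        /* Assumes program-during-erase-suspend support: reads and writes
         * may go through, but no other erase may start. */
        return mode != SIM_WANT_ERASE;
    default:
        /* Chip busy writing or erasing: the caller waits here. */
        return false;
    }
}
```

Under this model, an erase request that arrives while a write is suspended is always refused, so an erase in another thread cannot stretch the time a write spends suspended.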

The only possibility I can see is that xip_udelay() is interrupted so 
often that the UDELAY(map, chip, cmd_adr, 1) call never gets a chance to 
make any progress, even in the course of a half second wall clock time.

Do you have a high interrupt rate in your system?

If this is not the case, that means there is a real bug somewhere and I'd 
prefer that the real bug be found and addressed instead of masking it 
out.


Nicolas



