[PATCH] cfi: Fixup of write errors on XIP
Nicolas Pitre
nico at cam.org
Fri Mar 10 11:36:47 EST 2006
I'm back, sorry for the delay.
On Thu, 2 Mar 2006, Alexey, Korolev wrote:
> Nicolas
>
>
> > Please don't just yet.
> >
> > > The scenario of the issue is following:
> > > 1. do_write_buffer
> > > 2. Waiting for write complete in xip_udelay
> > > 3. System Interrupt
> > > 4. Write suspend
> > > 5. Rescheduling
> > > 6. Block erasing by another process. (This operation typically takes
> > > a rather long time.)
> > > 7. Complete, rescheduling
> > > 8. Return to write (write is not complete due to suspend ).
> > > 9. Check timeout. Time is up.
> > > 10. Error.
> >
> > This should not happen. And if it does then the bug is in xip_udelay()
> > and therefore should be fixed there.
> >
> > The fact is, xip_udelay() should not return until either the flash
> > status is 0x80 (done) or the delay expired. The code looks like:
> >
> This is absolutely correct.
> But the delay may sometimes expire before the chip gets ready, even if the
> chip has not been suspended.
So?
> Buffer programming time for the chip may vary.
> For example, the timeout may expire a couple of usecs before the status
> becomes ready (such variations are absolutely OK).
Agreed.
> You go up to do_write_buffer, and get the described scenario if the chip has
> been suspended at the very beginning of the wait in xip_udelay.
I still don't see the problem.
Let's suppose we enter xip_udelay(), and scheduling happens, and another
or even multiple other threads take their time to erase the flash, say
for more than 4 seconds. Currently xip_udelay() remembers how much time
it had remaining before being suspended and will report that time when
it is scheduled back. So xip_udelay() should never return before the
estimated amount of time required to perform the write has actually been
spent actively writing.
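
To make that accounting concrete, here is a minimal model of the idea. It
is only a sketch, not the actual xip_udelay() code: the struct and the
function names (write_wait, wait_tick, wait_expired) are hypothetical, and
the suspended flag is passed in by the caller rather than read from the
chip.

```c
#include <stdbool.h>

/* Hypothetical model, NOT the kernel's xip_udelay(). */
struct write_wait {
    long remaining_us;  /* active write time still owed to the chip */
};

/*
 * Advance the model by elapsed_us microseconds of wall-clock time.
 * The budget only shrinks while the write is actually running;
 * time spent suspended is not charged against it.
 */
static void wait_tick(struct write_wait *w, long elapsed_us, bool suspended)
{
    if (suspended)
        return;
    if (elapsed_us >= w->remaining_us)
        w->remaining_us = 0;
    else
        w->remaining_us -= elapsed_us;
}

/* True once the whole budget has been spent actively writing. */
static bool wait_expired(const struct write_wait *w)
{
    return w->remaining_us == 0;
}
```

With a 100 us budget, 40 us of active writing followed by a 4 second
suspension still leaves 60 us owed in this model; the wait only expires
after another 60 us of active writing.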
Do we agree so far?
Now xip_udelay() may return to do_write_buffer() either because the
write has completed, or because the timeout occurred. As you said, the
timeout is normal since the write might still have a few microseconds to go.
But only at that point is the timeo variable initialized with
jiffies + (HZ/2).
Further down, though, is a call to UDELAY() going again into
xip_udelay() but this time any suspended operation is not accounted for
in the timeo variable.
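
For contrast, a wall-clock deadline of that kind can be modelled as below.
This is a sketch under stated assumptions: MODEL_HZ, ticks_after() and
deadline_passed() are hypothetical names, and the tick rate of 100 is
picked only for illustration. The comparison is written in the same
wrap-safe style as the kernel's time_after().

```c
#include <stdbool.h>

#define MODEL_HZ 100UL  /* assumed tick rate, for illustration only */

/* Wrap-safe "a is later than b" on free-running unsigned tick counters. */
static bool ticks_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/*
 * Wall-clock check: true once `now` is past `deadline`, no matter how
 * much of the interval the write was actually running on the chip.
 */
static bool deadline_passed(unsigned long now, unsigned long deadline)
{
    return ticks_after(now, deadline);
}
```

In this model a deadline of start + MODEL_HZ/2 is reported as passed after
any gap longer than half a second, including a gap during which the write
was suspended the whole time.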
But the thing is, if you look at get_chip(), you'll see that nothing can
go and erase another flash sector when a write is suspended. In other
words, completion of a write operation always has priority over any erase
attempt. So the problem you're describing should not be due to any erase
delay occurring in another thread.
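
The rule can be stated as a tiny predicate. This is only an illustration
of the priority described above, not the real get_chip(): the enum values
and model_may_start_erase() are hypothetical and cover just the three
states needed to make the point.

```c
#include <stdbool.h>

/* Hypothetical chip states for this model only. */
enum model_state {
    MODEL_READY,            /* chip idle, nothing pending */
    MODEL_WRITING,          /* write in progress */
    MODEL_WRITE_SUSPENDED   /* write paused, not yet complete */
};

/*
 * A new erase may start only when no write is pending, so a pending
 * write, running or suspended, always finishes before any erase begins.
 */
static bool model_may_start_erase(enum model_state s)
{
    return s == MODEL_READY;
}
```

Under this rule an erase request arriving while a write is suspended has
to wait, so a long erase cannot eat into the time budget of the pending
write.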
The only possibility I can see is that xip_udelay() is interrupted so
often that the UDELAY(map, chip, cmd_adr, 1) call never gets a chance to
make any progress, even in the course of half a second of wall-clock time.
Do you have a high interrupt rate in your system?
If this is not the case, that means there is a real bug somewhere, and I'd
prefer that the real bug be found and addressed instead of being masked.
Nicolas