Fw: corrupt my NAND flash device

Thomas Gleixner tglx at linutronix.de
Mon Apr 28 18:59:34 EDT 2003


On Monday 28 April 2003 23:14, Charles Manning wrote:
> I have seen some weird stuff before... comments further below:
> > The whole thing just makes me sick.  It's ugly putting in such a hack.
> > One little voice in my head keeps telling me that there's an error in
> > software and I just have to find and fix the bug.  Another little voice
> > in my head keeps telling me that broken hardware is more common than
> > most people want to believe.
>
> Yes, there are/ have been cases where the chips do not latch their commands
> correctly. This can be made worse by marginal chip select timing etc.

That is nothing that should be fixed in generic software drivers. Either the
chips are buggy, or the signal timings are wrong, or both. If we took care
of all broken hardware, the kernel source would explode in size in no time.

> * Reading the status too soon after issuing the command: some parts need a
> brief wait after latching the command before the busy flag is valid.
> Without the wait, the busy state might be misinterpreted. 500ns would be
> ample.

If this is an issue, I'm willing to add this to nand.c in the form of a delay
supplied by the hardware driver, which is 0 by default.
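
A minimal sketch of what such a driver-supplied delay could look like. All
names here (struct nand_hw, cmd_delay_ns, the sim_* helpers) are hypothetical
and are not taken from nand.c. The 500 ns threshold in the simulated part
comes from the figure quoted above; the sim_* functions only exist so the
idea can be tried on a host without hardware.

```c
#include <stdint.h>

/* Hypothetical hooks a board driver would fill in; not the nand.c API. */
struct nand_hw {
    unsigned int cmd_delay_ns;           /* wait after latching a command; 0 = no wait */
    void (*write_cmd)(uint8_t cmd);      /* latch one command byte */
    void (*delay_ns)(unsigned int ns);   /* busy-wait for at least ns nanoseconds */
    int  (*is_busy)(void);               /* sample the busy flag */
};

/* Latch a command, then wait the board-supplied time before sampling busy. */
static int nand_cmd_then_busy(const struct nand_hw *hw, uint8_t cmd)
{
    hw->write_cmd(cmd);
    if (hw->cmd_delay_ns)                /* the default of 0 skips the delay */
        hw->delay_ns(hw->cmd_delay_ns);
    return hw->is_busy();
}

/* Host-side stand-ins (assumptions) so the sketch runs without hardware. */
static unsigned int sim_waited_ns;
static uint8_t sim_last_cmd;

static void sim_write_cmd(uint8_t cmd)
{
    sim_last_cmd = cmd;
    sim_waited_ns = 0;                   /* a new command restarts the clock */
}

static void sim_delay_ns(unsigned int ns)
{
    sim_waited_ns += ns;
}

/* The simulated part shows a valid busy flag only after 500 ns. */
static int sim_is_busy(void)
{
    return sim_waited_ns >= 500;
}
```

With cmd_delay_ns left at 0 the simulated part is sampled too early and looks
idle; a board that needs the wait would set it to 500 and see the busy state.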

> * Ensuring the correct number of address cycles: I have observed cases
> where a chip seems to work when the wrong number of address cycles was
> issued, but gave erratic results.

The address cycles in the generic nand.c command function are correct. I don't
know whether anybody uses a command function supplied by a hardware driver.
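
For anyone who does supply their own command function, the row-address cycle
count can be cross-checked with a little arithmetic. This is a sketch, not
the code in nand.c; it assumes an 8-bit bus where each cycle carries 8 address
bits and that the row address only has to cover the page number. The function
name is hypothetical, and the chip's datasheet remains the authority.

```c
#include <stdint.h>

/*
 * Number of 8-bit row-address cycles needed to address every page.
 * Assumes page_bytes is non-zero and divides chip_bytes.
 */
static unsigned int nand_row_cycles(uint64_t chip_bytes, uint32_t page_bytes)
{
    uint64_t pages = chip_bytes / page_bytes;
    unsigned int bits = 0;

    /* bits = smallest value with 2^bits >= pages */
    while (bits < 63 && (1ULL << bits) < pages)
        bits++;

    /* round up to whole cycles, and always send at least one */
    return bits ? (bits + 7) / 8 : 1;
}
```

With 512-byte pages, a 32 MiB part has 65536 pages (16 bits, two row cycles),
while a 64 MiB part has 131072 pages (17 bits, three row cycles). Column
cycles come on top of that.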

> * Issue a reset command before any read/write/erase command. This is a
> small overhead and ensures that the command register is always in a
> consistent state.

If that helps, I'm willing to add this too, as a conditional that defaults to
zero. I remember a big thread complaining about this overhead before it was
removed. I did this carefully, and there is no "maybe a write is interrupted
by another thread" issue. Only erases can be interrupted, but they are
restarted later. And when an erase is interrupted, the reset command is issued.
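
A sketch of such a conditional reset, under stated assumptions: the struct,
the reset_before_cmd flag and both function names are hypothetical, and the
bus is replaced by a small log of command bytes so the ordering can be
inspected on a host. 0xff is the standard NAND reset command. A real driver
would also have to wait for the chip to become ready after the reset.

```c
#include <stdint.h>

#define NAND_CMD_RESET 0xff   /* standard NAND reset command */

/* Hypothetical stand-in for the chip interface; not the nand.c structures. */
struct nand_bus {
    int reset_before_cmd;     /* 0 (default) = no extra reset */
    uint8_t log[8];           /* command bytes in the order they were latched */
    unsigned int n;           /* number of valid entries in log[] */
};

/* Record a command byte; real hardware would latch it on the bus here. */
static void bus_write_cmd(struct nand_bus *bus, uint8_t cmd)
{
    if (bus->n < sizeof(bus->log))
        bus->log[bus->n++] = cmd;
}

/* Start a read/write/erase, optionally preceded by a reset. */
static void nand_start_op(struct nand_bus *bus, uint8_t cmd)
{
    if (bus->reset_before_cmd)
        bus_write_cmd(bus, NAND_CMD_RESET);
    bus_write_cmd(bus, cmd);
}
```

Boards that work without the reset pay nothing, since the flag defaults to 0.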

Can anybody add a check for whether an erase is interrupted immediately before
the write error occurs? If that's the case, then we have to check the
datasheet of the offending chip and maybe block erase interruption
conditionally, defaulting to not blocking, as it works here and is proven to
work elsewhere.
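
One way such a check could be sketched: remember the last thing that happened
on the chip and, when a write fails, look at whether it was an interrupted
erase. Everything here (the enum, the struct and the function names) is
hypothetical instrumentation, not existing nand.c code. A real driver would
call these from its erase and write paths and print the counters.

```c
/* Events worth remembering; hypothetical names for this sketch only. */
enum nand_event {
    EV_NONE = 0,
    EV_READ,
    EV_WRITE_OK,
    EV_WRITE_FAILED,
    EV_ERASE_DONE,
    EV_ERASE_INTERRUPTED
};

struct nand_trace {
    enum nand_event last;           /* most recent event on this chip */
    unsigned long write_errors;     /* all failed writes */
    unsigned long after_erase_irq;  /* failed writes right after an interrupted erase */
};

/* Call from the read, write-success and erase paths. */
static void nand_trace_note(struct nand_trace *t, enum nand_event ev)
{
    t->last = ev;
}

/* Call when a write fails; returns 1 if an interrupted erase came just before. */
static int nand_trace_write_failed(struct nand_trace *t)
{
    int suspicious = (t->last == EV_ERASE_INTERRUPTED);

    t->write_errors++;
    if (suspicious)
        t->after_erase_irq++;
    t->last = EV_WRITE_FAILED;
    return suspicious;
}
```

If after_erase_irq tracks write_errors closely on the affected board, the
erase interruption is the place to look.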

> Also check the basics like power and signal integrity. Overshooting/ringing
> clocks could very easily be latching spurious data and corrupting the
> commands.

I have seen this on some hardware where address lines were used for CLE and
ALE, which is possible while complying with all timing constraints. But it's
really not easy to guarantee this under all circumstances (interrupts, DMA,
cache refill ...).

> > I haven't been very aggressive about adding the retry code because right
> > now I'm interested in more data points: Am I the only one that sees the
> > problem of a flash chip that occasionally drops commands or are others
> > seeing this same problem?  Is this problem more common but people don't
> > see it because the flash filesystems think that a location is bad and
> > mark it as unusable?
>
> I'd suggest exploring the above first.

I have been running NAND flash with YAFFS and JFFS2 partitions for more than a
year in a mostly permanent copy/remove/move cycle. I had no spurious commands
or anything like that. I never got blocks marked bad randomly. I have
different sized SmartMedia cards from various vendors and production dates in
use, so it is not just luck with one good part.

I know of a bunch of implementations where NAND has proven reliable in
extensive tests.

I'm really _NOT_ willing to buy that adding some obscure retry mechanism
will solve all these problems forever. They may disappear for now and come
back in a different EMC or application environment.

-- 
Thomas
________________________________________________________________________
linutronix - competence in embedded & realtime linux
http://www.linutronix.de
mail: tglx at linutronix.de



