CONFIG_MTD_NAND_VERIFY_WRITE with Software ECC
Ricard Wanderlof
ricard.wanderlof at axis.com
Tue Feb 15 10:01:58 EST 2011
On Tue, 15 Feb 2011, David Peverley wrote:
> That's a good analysis; I've dug around as a result and found an
> interesting tech note from Micron at:
> http://download.micron.com/pdf/technotes/nand/tn2917.pdf
Yes, that's a good (and one of the few) notes on failure mechanisms.
> They classify NAND failures as either Permanent Failures or Temporary
> Failures, with the temporary failures split further into
> "Program Disturb", "Read Disturb", "Over-programming" and "Data Loss"
> ...
> From the description of read disturb, it occurs due to many reads
> (hundreds of thousands or millions) prior to an erase. Currently my
> testing uses nandtestc.c and mtd_stresstest.ko: the former tests
> one cycle before re-programming, and the latter is random but is not
> expected to do more than tens of reads before a re-programme becomes
> statistically likely.
I agree, it doesn't sound like that is your problem.
> Potentially program disturb sounds like it _could_ be the behaviour I
> observe but it's not clear.
It seems to fit in with the description, i.e. you get bits programmed that
were not intended to be programmed. It seems to get worse when not
programming whole pages at once (partial page programming).
> My general take on this is that only the permanent type failures i.e.
> those involving permanently stuck bits, require marking as bad blocks.
> The recovery recommended for the other scenarios is always to erase and
> re-programme. This potentially opens up a whole can of worms... My
> interpretation of this is that if we verify a write and we've had a
> (correctable and non-permanent) single bit error the Right Thing To Do
> would be to erase and re-programme the block, probably with a very small
> retry limit. We could argue that it's the responsibility of the
> file system to do this, but programmatically I think nand_write_page()
> is best placed to do it.
Either the file system or some flash management layer such as UBI should
take care of this, but I know that jffs2 doesn't, at any rate. I'd agree it
makes sense for the lower level to guarantee, to the best of its ability,
that the data written actually does get written to the flash.
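To make the proposed "verify, then erase and re-programme with a very small retry limit" policy concrete, here is a minimal sketch against a simulated in-memory page. It is not the MTD API and not how nand_write_page() is implemented: sim_page, sim_erase(), sim_program(), sim_verify(), write_verify_retry() and the retry limit of 2 are all hypothetical. The model also treats a single page as if it were its own erase unit; on real NAND, erase works on whole blocks, so the other pages in the block would have to be preserved and rewritten as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 16          /* tiny page, for illustration only */
#define MAX_WRITE_RETRIES 2   /* "very small retry limit"; value is an assumption */

/* Hypothetical simulated page. The erased state is all 0xFF. */
struct sim_page {
    uint8_t cells[PAGE_SIZE];
    int disturb_budget;       /* how many upcoming programs suffer a disturb */
};

static void sim_erase(struct sim_page *p)
{
    memset(p->cells, 0xFF, PAGE_SIZE);
}

/* Programming can only clear bits (1 -> 0). While disturb_budget > 0, a
 * program also clears bit 7 of byte 0, mimicking an unintended bit being
 * programmed (the "Program Disturb" symptom). */
static void sim_program(struct sim_page *p, const uint8_t *data)
{
    for (size_t i = 0; i < PAGE_SIZE; i++)
        p->cells[i] &= data[i];
    if (p->disturb_budget > 0) {
        p->disturb_budget--;
        p->cells[0] &= 0x7F;
    }
}

/* Raw read-back compare: true only if the page holds exactly what was written. */
static bool sim_verify(const struct sim_page *p, const uint8_t *data)
{
    return memcmp(p->cells, data, PAGE_SIZE) == 0;
}

/* Erase, program and verify; on a mismatch, erase and re-programme again,
 * up to MAX_WRITE_RETRIES extra attempts. Returns the number of attempts
 * used, or -1 if every attempt failed (a caller might then mark the block
 * bad). */
static int write_verify_retry(struct sim_page *p, const uint8_t *data)
{
    assert(p != NULL && data != NULL);
    for (int attempt = 1; attempt <= 1 + MAX_WRITE_RETRIES; attempt++) {
        sim_erase(p);
        sim_program(p, data);
        if (sim_verify(p, data))
            return attempt;
    }
    return -1;
}
```

A transient disturb on the first program is recovered on the second attempt, while a page that fails every attempt is reported back so that a higher layer can decide whether to retire the block. Keeping the retry limit small bounds the extra erase cycles spent on a block that really is failing.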
I think that the application note in some respects simplifies matters a
bit. If you have a block that is wearing out, due to a large number of
erase/write cycles, it will exhibit several failure modes at an increasing
rate, in particular long-term data loss. At some point one could argue
that the block is essentially worn out, and mark it as bad, even though an
erase/write/verify cycle actually might be successful. I don't think that
is what is happening in your case though.
> Certainly the verify failures we see here with a raw read are
> occasional (and not consistently in the same blocks), and hence not
> indicative of stuck bits; generally, after the block is re-written,
> the read is correct. What do you reckon?
It would seem that if you only have occasional faults in arbitrary blocks,
it isn't a wear problem; if the flash is to blame, I would agree it fits
the 'Program Disturb' description. I must admit I've not come across this
type of error myself, but that could be due to my limited experience, or
because it occurs so infrequently in the types of flash I've been exposed
to that I've never noticed it.
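The distinction drawn above (stuck bits that fail consistently versus occasional failures that move around and disappear after a re-write) can be expressed as a simple diagnostic. The sketch below is hypothetical: classify_failures(), enum fail_kind and PAGE_BYTES are illustrative names, not kernel code. It assumes the same data is written on every erase/program/verify cycle, that the data actually exercises the suspect bits, and that each cycle's raw read-back has been XORed with the written data to give a mismatch mask. A bit wrong in every cycle is flagged as stuck; bits that fail only sometimes are treated as transient. With a single cycle every failure looks permanent, so the result only means something over several cycles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4   /* tiny page, for illustration only */

enum fail_kind { FAIL_NONE, FAIL_TRANSIENT, FAIL_STUCK };

/* mismatch[c][i] holds (raw_read_back ^ written) for byte i in cycle c.
 * Only meaningful when cycles >= 2 and the same data was written each time. */
static enum fail_kind classify_failures(const uint8_t mismatch[][PAGE_BYTES],
                                        size_t cycles)
{
    uint8_t any = 0;      /* bits that were wrong in at least one cycle */
    bool stuck = false;   /* some bit was wrong in every cycle */

    assert(mismatch != NULL || cycles == 0);
    for (size_t i = 0; i < PAGE_BYTES; i++) {
        uint8_t every = 0xFF;             /* bits wrong in all cycles so far */
        for (size_t c = 0; c < cycles; c++) {
            every &= mismatch[c][i];
            any |= mismatch[c][i];
        }
        if (cycles > 0 && every != 0)
            stuck = true;
    }
    if (stuck)
        return FAIL_STUCK;
    return any ? FAIL_TRANSIENT : FAIL_NONE;
}
```

Under these assumptions, the behaviour described in the thread (failures in different places each time, and a correct read after re-writing) would come out as FAIL_TRANSIENT, which points away from a worn-out block and towards disturb, or towards the hardware and bus-timing causes mentioned below.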
I don't know if anyone else here on this list has any experiences to
share. Frankly, if I saw errors of that type I would start looking for
hardware problems, or some sort of hardware or software induced contention
on the flash chip bus. Not that that would necessarily be the right
approach, but I've seen errors of that type occurring as the result of
out-of-spec level shifters between the MCU and flash chip, or incorrectly
set up bus timing towards the flash.
/Ricard
--
Ricard Wolf Wanderlöf ricardw(at)axis.com
Axis Communications AB, Lund, Sweden www.axis.com
Phone +46 46 272 2016 Fax +46 46 13 61 30
More information about the linux-mtd mailing list