CONFIG_MTD_NAND_VERIFY_WRITE with Software ECC
pev at sketchymonkey.com
Tue Feb 15 12:58:17 EST 2011
I did some more Googling and found a couple more interesting articles.
I don't know if you've read them (I assume you probably have!) but I
thought I'd repost them for anyone else interested, as this seems to be a
really fascinating and not often discussed topic:
I also found the following really interesting, especially for its
analysis of lots of devices, with a plot of the number of initial bad
blocks marked, which I'd always wondered about:
> Either the file system, or some flash management layer such as UBI should
> take care of this, but I know that jffs2 doesn't at any rate. I'd agree it
> makes sense for the lower level to, to the best of its capability, guarantee
> that the data written actually does get written on the flash.
Well, we're using YAFFS2 - I haven't looked into how it deals with
these scenarios though... I'll see if I can figure out how it behaves.
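As a rough illustration of the kind of lower-level check being discussed, here is a minimal sketch of a write verify that tolerates bit flips the ECC could still correct, rather than failing on any raw mismatch. This is not the kernel's CONFIG_MTD_NAND_VERIFY_WRITE code; the function names and the max_correctable parameter are hypothetical, and it assumes the read-back buffer is raw (uncorrected) data.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the bits that differ between what was written and what was read back. */
unsigned int count_bitflips(const uint8_t *written, const uint8_t *readback,
                            size_t len)
{
    unsigned int flips = 0;

    for (size_t i = 0; i < len; i++) {
        uint8_t diff = written[i] ^ readback[i];

        /* Clear the lowest set bit until none remain. */
        while (diff) {
            diff &= (uint8_t)(diff - 1);
            flips++;
        }
    }
    return flips;
}

/*
 * Hypothetical policy: accept the write if the number of flipped bits in
 * this ECC chunk is within what the ECC is assumed to correct.
 * Returns 0 if acceptable, -1 otherwise.
 */
int verify_with_tolerance(const uint8_t *written, const uint8_t *readback,
                          size_t len, unsigned int max_correctable)
{
    return count_bitflips(written, readback, len) <= max_correctable ? 0 : -1;
}
```

With a 1-bit-per-256-byte Hamming code (an assumption about the software ECC in use), one would call verify_with_tolerance(chunk, readback, 256, 1) for each 256-byte chunk of the page.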
> I think that the application note in some respects simplifies matters a bit.
> If you have a block that is wearing out, due to a large number of
> erase/write cycles, it will exhibit several failure modes at an increasing
> rate, in particular long-term data loss. At some point one could argue that
> the block is essentially worn out, and mark it as bad, even though an
> erase/write/verify cycle actually might be successful. I don't think that is
> what is happening in your case though.
That all comes down to what one considers a "Bad Block". I'd agree
that repeated failures can show that there might be an issue, but
all the literature I've read today states that only permanent failures
are regarded as showing a bad block, and these are reported via the
flash's status read. In fact I found a Micron Appnote AN1819
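For reference, the status-read check mentioned above can be sketched as follows. The bit positions (bit 0 = program/erase failed, bit 6 = ready) are the ones commonly given in NAND datasheets; the enum and function name are made up for illustration, so check the datasheet for the actual part.

```c
#include <stdint.h>

/* Status register bits as commonly defined in NAND datasheets. */
#define STATUS_FAIL  0x01  /* last program/erase operation failed */
#define STATUS_READY 0x40  /* device is ready, status is valid */

enum op_result { OP_BUSY, OP_OK, OP_FAILED };

/*
 * Hypothetical helper: interpret the byte returned by a status read
 * issued after a program or erase command.
 */
enum op_result decode_status(uint8_t status)
{
    if (!(status & STATUS_READY))
        return OP_BUSY;     /* fail bit is not meaningful yet */

    return (status & STATUS_FAIL) ? OP_FAILED : OP_OK;
}
```

Under the definition in the literature, only an OP_FAILED result here would be grounds for marking the block bad; a read-back mismatch on its own would not be.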
> I don't know if anyone else here on this list has any experiences to share.
> Frankly, if I saw errors of that type I would start looking for hardware
> problems, or some sort of hardware or software induced contention on the
> flash chip bus. Not that that would necessarily be the right approach, but
> I've seen errors of that type occurring as the result of out-of-spec level
> shifters between the MCU and flash chip, or incorrectly set up bus timing
> towards the flash.
We're looking into these possibilities as well. However, as is often
the case, such problems provoke testing of less-used code paths, so
it's quite a good thing to work out the right thing to do in this event
in conjunction with fixing the root cause of the problem...