bbt and bitflip

Brian Norris computersforpeace at gmail.com
Thu Jun 23 12:36:32 EDT 2011


Hello,

On Fri, Apr 22, 2011 at 1:15 AM, Artem Bityutskiy <dedekind1 at gmail.com> wrote:
> On Thu, 2011-04-21 at 19:17 +0200, Matthieu CASTET wrote:
>> Here is a quick and dirty patch to make them more robust.
>> Any comments are welcome.
>
> Would be great if you could test it and submit a nice patch with proper
> commit message and Signed-off-by.

I am interested in Artem's comments on the robustness of flash-based
BBT (here, and more recently on
http://lists.infradead.org/pipermail/linux-mtd/2011-June/036557.html).
I have recently moved to using flash-based BBT (in-band, actually),
and it seems that several NAND drivers use flash-based BBT as well.
Is it really that un-trustworthy?

So I guess I'm looking for areas of improvement. I see a few suggestions here:

"Also the pattern and version in oob isn't protected by ecc. They can
be corrupted."

I noticed this one recently. My hardware ECC actually *can* do ECC for
what little OOB is actually free, but I realized that the nand_base
code doesn't even try to check the ECC stats (in nand_do_read_oob())
whereas some similar code for nand_do_read_ops *does* check the ECC
stats. Is it fair to adapt the code from nand_do_read_ops to
nand_do_read_oob so that both check for ECC errors, just in case the
hardware supports it? Would this cause any API problems, if users
didn't expect OOB reads to return ECC error statuses? For now, this
would only have any effect if a driver replaces the chip->ecc.read_oob
function with something that performs ECC operations independently and
increments the ECC counters.
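A minimal userspace sketch of the check I have in mind for
nand_do_read_oob(): snapshot the ECC counters before the read, then
turn any change into a return code, roughly the way nand_do_read_ops()
finishes a data read (-EBADMSG for uncorrectable errors, -EUCLEAN for
corrected bitflips). struct ecc_stats and ecc_result() are hypothetical
stand-ins; in the kernel the counters live in mtd->ecc_stats.

```c
#include <errno.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* Linux value ("Structure needs cleaning") */
#endif

/* Hypothetical stand-in for the two counters of struct mtd_ecc_stats used here. */
struct ecc_stats {
	unsigned int corrected; /* bitflips fixed by ECC */
	unsigned int failed;    /* uncorrectable ECC errors */
};

/*
 * Hypothetical helper: classify a read by comparing the counters
 * snapshotted before the read with their values after it.
 */
static int ecc_result(const struct ecc_stats *before,
		      const struct ecc_stats *after)
{
	if (after->failed - before->failed)
		return -EBADMSG; /* uncorrectable: data cannot be trusted */
	if (after->corrected - before->corrected)
		return -EUCLEAN; /* data OK, but the block should be scrubbed */
	return 0;                /* clean read */
}
```

A caller would copy the counters, perform the OOB read through
chip->ecc.read_oob, and pass both snapshots in; with the default
read_oob, which does no ECC, the counters never move and the result is
always 0, so existing users would see no change.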

And speaking of BBT in OOB:
Does anyone know why the flash-based ident pattern and version are
"traditionally" stored in OOB? It was quite recently that Sebastian
Andrzej Siewior added the NAND_USE_FLASH_BBT_NO_OOB flag (which is
slated to be renamed/replaced, FYI). It seems like ...NO_OOB is a
generally good (or at least better) idea. Then we wouldn't even have
to worry much about ECC in OOB.
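To illustrate why an in-band header sidesteps the OOB/ECC issue, here
is a sketch that recognizes a BBT page by a pattern stored at the start
of the (ECC-protected) data area. The layout, pattern followed directly
by one version byte, is an assumption for illustration, not necessarily
the exact NAND_USE_FLASH_BBT_NO_OOB format; parse_inband_header() is a
hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical in-band layout (assumption): the page data begins with
 * the ident pattern, immediately followed by a version byte, then the
 * table itself. All of it is covered by the normal page ECC.
 */
static bool parse_inband_header(const uint8_t *page, size_t page_len,
				const uint8_t *pattern, size_t pat_len,
				uint8_t *version)
{
	if (page_len < pat_len + 1)
		return false; /* too short to hold pattern + version byte */
	if (memcmp(page, pattern, pat_len) != 0)
		return false; /* pattern missing: not a BBT page */
	*version = page[pat_len]; /* version byte follows the pattern */
	return true;
}
```

For example, with the "Bbt0" main-table pattern, a page beginning with
'B', 'b', 't', '0', 7 would be accepted with version 7, and a bitflip
in those bytes would be handled by page ECC like any other data.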

"read_bbt which ignore ecc bit flip/error"

If I understand correctly, read_bbt just prints a warning message on
ECC flips/errors and otherwise ignores them? Would Matthieu's "quick
and dirty" patch be an OK start for fixing this? (That is, in the
absence of a valid backup table, an ECC error causes a flash rescan and
an ECC bitflip causes an erase/rewrite "scrub"?)
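The policy in that question could be sketched as a small decision
function. Everything here is hypothetical: the enum and function names
are mine, I am assuming a bitflip triggers a scrub whether or not a
backup exists, and I am treating any error other than -EUCLEAN as
uncorrectable. read_ret follows the MTD convention of 0, -EUCLEAN
(corrected bitflips) or -EBADMSG (uncorrectable).

```c
#include <errno.h>
#include <stdbool.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* Linux value ("Structure needs cleaning") */
#endif

/* Hypothetical actions after reading one copy of the BBT. */
enum bbt_action {
	BBT_USE,           /* clean read: use the table as-is */
	BBT_USE_AND_SCRUB, /* corrected bitflips: use it, then erase/rewrite */
	BBT_USE_BACKUP,    /* uncorrectable, but the mirror table is valid */
	BBT_RESCAN,        /* uncorrectable and no valid backup: rescan flash */
};

static enum bbt_action bbt_read_policy(int read_ret, bool backup_valid)
{
	if (read_ret == 0)
		return BBT_USE;
	if (read_ret == -EUCLEAN)
		return BBT_USE_AND_SCRUB;
	/* -EBADMSG and (by assumption) any other error: table unusable */
	return backup_valid ? BBT_USE_BACKUP : BBT_RESCAN;
}
```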

"The bbt should be protected with CRC and if it gets corrupted we
should re-scan the flash and re-create it."

Wouldn't CRC just be a lesser replacement for proper ECC protection?
Or am I missing something?
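To make the distinction concrete: a CRC stored alongside the table can
only say "intact" or "corrupted"; unlike ECC it cannot locate or repair
the flipped bits, so on a mismatch the only recovery is a good mirror
or the re-scan Artem describes. A plain bitwise CRC-32 (IEEE, reflected
polynomial 0xEDB88320) sketch; crc32_ieee() and bbt_crc_ok() are
hypothetical names, not existing nand_bbt.c functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected form, init/xorout 0xFFFFFFFF). */
static uint32_t crc32_ieee(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			/* shift right; XOR in the polynomial if the low bit was set */
			crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
	}
	return crc ^ 0xFFFFFFFFu;
}

/* Detection only: returns nonzero if the table matches its stored CRC. */
static int bbt_crc_ok(const uint8_t *table, size_t len, uint32_t stored_crc)
{
	return crc32_ieee(table, len) == stored_crc;
}
```

A single flipped bit anywhere in the table is always caught by CRC-32,
but the check gives no hint which bit it was.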

Brian


