NAND BBT corruption on MPC83xx
Matthew L. Creech
mlcreech at gmail.com
Tue Jul 5 15:58:46 EDT 2011
On Fri, Jun 17, 2011 at 5:34 PM, Scott Wood <scottwood at freescale.com> wrote:
>
> It seems that the generic code always passes -1 with PAGEPROG, and only
> provides the actual page address on SEQIN.
>
> I don't think the ECC readback is needed, and the fact that it looks like
> it has always been broken would seem to confirm that. It's broken in
> other ways, too -- it assumes a particular ECC layout. Let's get rid of it.
>
> As for the corruption, could it be degradation from repeated reads of that
> one page?
>
I modified nanddump to do repeated reads and to compare the data from
each later iteration against the data from the first iteration (to
detect bit-flips). I tried 3 different variations:
- one which reads the first page (2k) of the last block
- one which reads the second page (2k) of the last block
- one which reads the entire last block (128k), just for comparison
As I understand it, read-disturb would primarily come into play when
the second page is read, since it's adjacent to the first page (please
correct me if I'm wrong there). Anyway, all 3 of these tests were run
for at least 50 million read cycles, with no bit-flips detected. So
I'm somewhat doubtful that this is the cause of the BBT corruption
I've been seeing.
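
For reference, the comparison described above can be sketched roughly as
follows. This is not the modified nanddump itself, only a minimal
illustration of the idea: the function names (count_bit_flips, read_full,
repeated_read_test) are hypothetical, and it assumes the region can be read
with pread() on an already-open file descriptor. How the device is opened
(and whether ECC correction is applied before the data reaches user space)
is not covered here.

```c
#define _XOPEN_SOURCE 700   /* for pread() */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Count the number of differing bits between two equal-length buffers. */
static size_t count_bit_flips(const uint8_t *ref, const uint8_t *cur, size_t len)
{
    size_t flips = 0;

    for (size_t i = 0; i < len; i++) {
        uint8_t diff = ref[i] ^ cur[i];

        while (diff) {                  /* clear one set bit per pass */
            diff &= (uint8_t)(diff - 1);
            flips++;
        }
    }
    return flips;
}

/* Read exactly len bytes at offset. Returns 0 on success, -1 on error or EOF. */
static int read_full(int fd, uint8_t *buf, size_t len, off_t offset)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = pread(fd, buf + done, len - done, offset + (off_t)done);

        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    return 0;
}

/*
 * Read the same region `iterations` times in total. The first read is the
 * reference; every later read is compared against it.
 * Returns the total number of flipped bits seen, or -1 on error.
 */
static long repeated_read_test(int fd, off_t offset, size_t len,
                               unsigned long iterations)
{
    uint8_t *ref = malloc(len);
    uint8_t *cur = malloc(len);
    long total = -1;

    if (!ref || !cur)
        goto out;
    if (read_full(fd, ref, len, offset) != 0)
        goto out;

    total = 0;
    for (unsigned long i = 1; i < iterations; i++) {
        if (read_full(fd, cur, len, offset) != 0) {
            total = -1;
            break;
        }
        total += (long)count_bit_flips(ref, cur, len);
    }
out:
    free(ref);
    free(cur);
    return total;
}
```

With 2k pages and 128k blocks, the three variations would correspond to
calling repeated_read_test() with (offset of last block, 2048), (offset of
last block + 2048, 2048), and (offset of last block, 131072).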
====
Separately, I set up 2 test devices to run while I was away last week.
One of them contained 2 patches:
- Mike Hench's patch, which eliminates the ECC readback code in fsl_elbc_nand.c
- Adam Thomson's patch, which initializes oob_poi correctly
  (http://lists.infradead.org/pipermail/linux-mtd/2011-June/036427.html)
When I returned, the device with these patches had seen no problems at all
and had no additional bad blocks. The device without these patches had
more than 200 blocks newly marked as bad in the BBT over the course of
10 days. After a reboot, this latter device then failed to boot, as
shown here:
http://mcreech.com/work/bbt-ecc-error4.txt
I'm currently running another test to verify which of the two patches
actually fixed this problem (which might take a few days), but it
seems like removing that block of code in fsl_elbc_nand.c is a good
idea.
--
Matthew L. Creech