[PATCH v1] mtd: gpmi: Bitflip support in erased regions
Huang Shijie
shijie8 at gmail.com
Wed Dec 11 08:24:58 EST 2013
On Mon, Dec 09, 2013 at 08:58:10PM +0100, Elie De Brauwer wrote:
> Fixed cc to linux-mtd, please ignore my previous version.
>
> Hello all,
>
> I bumped into an issue on a custom board with an i.MX28 and a Micron
> MT29F4G08 NAND flash. My system, running a 3.9.0 kernel, failed to boot during
> upgrade testing due to UBI errors related to bitflips in NAND:
>
> [ 3.831323] UBI warning: ubi_io_read: error -74 (ECC error) while reading 16384 bytes from PEB 443:245760, read only 16384 bytes, retry
> [ 3.845026] UBI warning: ubi_io_read: error -74 (ECC error) while reading 16384 bytes from PEB 443:245760, read only 16384 bytes, retry
> [ 3.858710] UBI warning: ubi_io_read: error -74 (ECC error) while reading 16384 bytes from PEB 443:245760, read only 16384 bytes, retry
> [ 3.872408] UBI error: ubi_io_read: error -74 (ECC error) while reading 16384 bytes from PEB 443:245760, read 16384 bytes
> ...
> [ 4.011529] UBIFS error (pid 36): ubifs_recover_leb: corrupt empty space LEB 27:237568, corruption starts at 9815
> [ 4.021897] UBIFS error (pid 36): ubifs_scanned_corruption: corruption at LEB 27:247383
> [ 4.030000] UBIFS error (pid 36): ubifs_scanned_corruption: first 6569 bytes from LEB 27:247383
Thanks a lot for this patch.
I have hit the "corrupt empty space" issue too.
>
> Diving a bit deeper with nanddump:
> root@(none):~# nanddump -a /dev/mtd8 > /dev/null
> ECC failed: 8
> ECC corrected: 0
> Number of bad blocks: 0
> Number of bbt blocks: 0
> Block size 262144, page size 4096, OOB size 224
> Dumping data starting at 0x00000000 and ending at 0x1ea00000...
> ECC: 1 corrected bitflip(s) at offset 0x042c2000
> ECC: 1 uncorrectable bitflip(s) at offset 0x06efe000
> root@(none):~# nanddump -s 116129792 -c --noecc -l 262144 /dev/mtd8
> ...
> 0x06efe6a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 7f |................|
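The -s value passed to nanddump above is the start of the erase block that contains the uncorrectable offset 0x06efe000, and the dumped byte 0x7f differs from 0xff in exactly one bit. A minimal sketch of that arithmetic, using the geometry nanddump printed; the function names are illustrative only and are not taken from mtd-utils or the kernel:

```c
#include <stdint.h>

/* Erase block size in bytes, as reported by nanddump in the log above. */
#define BLOCK_SIZE 262144u

/* Index of the erase block that contains a byte offset. */
static uint32_t block_index(uint32_t offset)
{
	return offset / BLOCK_SIZE;
}

/* Byte offset at which that erase block starts. */
static uint32_t block_start(uint32_t offset)
{
	return block_index(offset) * BLOCK_SIZE;
}

/* Number of zero bits in one byte; a clean erased byte (0xff) gives 0. */
static unsigned int zero_bits(uint8_t byte)
{
	unsigned int n = 0;
	uint8_t inv = (uint8_t)~byte;	/* zero bits become one bits */

	while (inv) {
		n += inv & 1u;
		inv >>= 1;
	}
	return n;
}
```

For offset 0x06efe000 this gives block index 443 and block start 116129792, which lines up with the PEB number in the UBI messages, assuming mtd8 is the partition UBI is attached to.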
>
> Which points to a well-known 'corrupt empty space' issue, which appears
> every now and then:
> - http://permalink.gmane.org/gmane.linux.drivers.mtd/46617
> - http://lists.infradead.org/pipermail/linux-mtd/2012-January/039254.html
>
> Hence I went on a quest to teach my NAND driver, gpmi-nand in this case, how
> to do this. The problem is that although properly written data which gets
> streamed through the BCH block gets 16-bit ECC, an erased block gets
> effectively 0-bit ECC, since erase is a command, not a stream of data travelling
> through the BCH block. The BCH block (see i.MX28 reference manual chapters
> 15 GPMI and 16 BCH) can tell us the following about protected chunks:
> - if they are error free (if ecc data is present)
> - the amount of bitflips they contain (if ecc data is present)
> - if they are fully erased (all 0xFF's)
> - if they are uncorrectable (# bitflips > ecc_strength, or 0xFF with
> bitflips).
> In the current situation as soon as a single bitflip exists in a region
> where the parity information is all 0xFF (looking like it's erased) the
> block is marked as uncorrectable, which is a pity since I can perform this
> kind of ECC by hand.
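The four outcomes listed above can be sketched as a small classifier over a per-chunk status byte. The numeric encoding used here (0x00 clean, 0xff erased, 0xfe uncorrectable, anything else a corrected-bit count) is an assumption and should be checked against the BCH chapter of the reference manual; the enum and function names are hypothetical:

```c
#include <stdint.h>

/* Assumed per-chunk status encoding; verify against the reference manual. */
#define ST_GOOD          0x00u
#define ST_UNCORRECTABLE 0xfeu
#define ST_ERASED        0xffu

enum chunk_state {
	CHUNK_CLEAN,		/* ECC present, no bitflips */
	CHUNK_CORRECTED,	/* ECC present, bitflips were corrected */
	CHUNK_ERASED,		/* data and parity seen as erased */
	CHUNK_UNCORRECTABLE	/* too many bitflips, or erased with bitflips */
};

/* Map one status byte to one of the four cases from the list above. */
static enum chunk_state classify_chunk(uint8_t status, unsigned int ecc_strength)
{
	if (status == ST_GOOD)
		return CHUNK_CLEAN;
	if (status == ST_ERASED)
		return CHUNK_ERASED;
	if (status == ST_UNCORRECTABLE)
		return CHUNK_UNCORRECTABLE;
	/* Any other value is taken as the number of corrected bits. */
	return status <= ecc_strength ? CHUNK_CORRECTED : CHUNK_UNCORRECTABLE;
}
```

With the default threshold of 0, an erased chunk with a single flipped bit lands in the uncorrectable case, which is the behaviour described above.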
>
> Quoting the datasheet:
> "As the BCH decoder reads the data and parity blocks, it records a special condition, i.e.,
> that all of the bits of a payload data block or metadata block are one, including any associated
> parity bytes. The all-ones case for both parity and data indicates an erased block in the
> NAND device."
>
> Fortunately we can more or less tune this parameter by using the
> ERASE_THRESHOLD in HW_BCH_MODE register:
> "This value indicates the maximum number of zero bits on a flash page for
> it to be considered erased. For SLC NAND devices, this value should be
I hit the "corrupt empty space" issue with a Toshiba SLC NAND.
The spec tells us the threshold should be 0 for SLC NAND.
I will double-check it tomorrow.
> programmed to 0 (meaning that the entire page should consist of bytes of
> 0xFF. For MLC NAND devices, bit errors may occur on reads (even on blank
> pages), so this threshold can be used to tune the erased page checking
> algorithm."
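As a rough illustration of programming the threshold described in the quote, the sketch below updates the threshold field in a copy of the mode register value. The field position (low 8 bits) and the helper name are assumptions made for this example, not taken from the driver; a real driver would read and write the register through its own accessors:

```c
#include <stdint.h>

/* Assumed: ERASE_THRESHOLD occupies bits 7:0 of HW_BCH_MODE. */
#define ERASE_THRESHOLD_MASK 0x000000ffu

/*
 * Return mode_reg with the threshold field replaced by 'threshold'.
 * Values that do not fit in the field are clamped to its maximum.
 */
static uint32_t set_erase_threshold(uint32_t mode_reg, unsigned int threshold)
{
	if (threshold > ERASE_THRESHOLD_MASK)
		threshold = ERASE_THRESHOLD_MASK;
	return (mode_reg & ~ERASE_THRESHOLD_MASK) | threshold;
}
```

Passing 0 reproduces the SLC recommendation from the quote; passing a small value such as 2, or the ECC strength, gives the more tolerant settings discussed in this thread.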
>
> So as my solution I'm setting this erase threshold to the ecc_strength
> derived from the geometry, meaning that I will tolerate the same number of
> bitflips the BCH block would consider correctable.
> The side effect is that whenever I'm reading a page (gpmi_ecc_read_page())
> which the BCH block marked as "erased" I need to take a software approach.
> The software approach is inspired by what is currently
> done in the omap2 driver (but not free from discussion). At that point I
> know that the page can contain up to ecc_strength bitflips, so I need to
The ecc_strength can be as high as 40 sometimes.
I really do not know what the proper value for ERASE_THRESHOLD is.
Maybe setting ERASE_THRESHOLD to 2 is OK?
I think ecc_strength is a little too large.
> count and correct them if necessary. This obviously gives a slight overhead
> when compared to a normal read of erased pages but is more polite towards
> upper layers.
> On the other hand, the alternative is for the upper layers to show some
> intelligence when it comes to reading erased pages, which doesn't make much
> sense either.
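A minimal sketch of the count-and-correct step described above, written as plain C rather than driver code. The function name, the return convention, and the choice to leave the buffer untouched when the limit is exceeded are assumptions for illustration; max_bitflips stands for whatever threshold is finally chosen (ecc_strength in the patch, or a smaller value such as 2):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Treat 'buf' as a page the hardware reported as erased.
 * Counts zero bits; if there are at most max_bitflips of them, the
 * buffer is reset to all 0xff and the number of fixed bits is returned.
 * Returns -1 (buffer untouched) if the page has too many zero bits to
 * be considered erased.
 */
static int fix_erased_page(uint8_t *buf, size_t len, unsigned int max_bitflips)
{
	unsigned int flips = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		uint8_t inv = (uint8_t)~buf[i];	/* zero bits become one bits */

		while (inv) {
			inv &= (uint8_t)(inv - 1);	/* clear lowest set bit */
			if (++flips > max_bitflips)
				return -1;
		}
	}

	if (flips)
		memset(buf, 0xff, len);
	return (int)flips;
}
```

A caller would add a non-negative return value to its corrected-bitflip statistics and treat -1 as a genuine ECC failure. The early exit keeps the extra cost small for pages that are clearly not erased.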
>
> I considered alternatives based upon the 'let it fail as it does now, and
> try to intelligently figure out whether or not it's an erased page'
> approach, possibly using an additional byte in the metadata or something based
> on fuzzy rules, but this is actually the solution which ended up giving
> most certainty.
>
> I have tested this on a 3.9/i.MX28 and after applying this patch my board
> went from a stubbornly-whining-about-corrupt-empty-space to happily
> mounting the partition and even the trace of my stuck bit disappeared:
>
> root@(none):~# nanddump -a /dev/mtd8 > /dev/null
> ECC failed: 0
> ECC corrected: 1
> Number of bad blocks: 0
> Number of bbt blocks: 0
> Block size 262144, page size 4096, OOB size 224
> Dumping data starting at 0x00000000 and ending at 0x1ea00000...
> ECC: 1 corrected bitflip(s) at offset 0x042c2000
>
>
> I have also seen Pekon is eagerly trying to get the code removed from omap2,
> (e.g. http://lists.infradead.org/pipermail/linux-mtd/2013-July/047548.html )
> but even though his set of patches is currently in its 4th version I
> haven't seen any proper solution to handling bitflips in erased pages
> without iterating through them.
>
I will read it.
Please give us more time to look into this issue.
I will discuss it with our IC guy.
thanks
Huang Shijie