[PATCH 4/5] mtd: nand: add support for Micron on-die ECC

Wed Mar 22 07:39:59 PDT 2017

Hi Boris,

>Hi Bean,
>
>On Wed, 22 Mar 2017 13:20:04 +0000
>"Bean Huo (beanhuo)" <beanhuo at micron.com> wrote:
>
>> >+micron_nand_read_page_on_die_ecc(struct mtd_info *mtd, struct nand_chip *chip,
>> >+                                 uint8_t *buf, int oob_required,
>> >+                                 int page) {
>> >+             int status;
>> >+             int max_bitflips = 0;
>> >+
>> >+             micron_nand_on_die_ecc_setup(chip, true);
>> >+
>> >+             chip->cmdfunc(mtd, NAND_CMD_READ0, 0x00, page);
>> >+             chip->cmdfunc(mtd, NAND_CMD_STATUS, -1, -1);
>> >+             status = chip->read_byte(mtd);
>> >+             if (status & NAND_STATUS_FAIL)
>> >+                           mtd->ecc_stats.failed++;
>> >+             /*
>> >+             * The internal ECC doesn't tell us the number of bitflips
>> >+             * that have been corrected, but tells us if it recommends to
>> >+             * rewrite the block. If it's the case, then we pretend we had
>> >+             * a number of bitflips equal to the ECC strength, which will
>> >+             * hint the NAND core to rewrite the block.
>> >+             */
>> >+             else if (status & NAND_STATUS_WRITE_RECOMMENDED)
>> >+                           max_bitflips = chip->ecc.strength;
>> >+
>> >+             chip->cmdfunc(mtd, NAND_CMD_READ0, -1, -1);
>> >+
>> >+             nand_read_page_raw(mtd, chip, buf, oob_required, page);
>> >+
>> >+             micron_nand_on_die_ecc_setup(chip, false);
>> >+
>> >+             return max_bitflips;
>> >+}
>>
>>
>> Hi,
>> Let me give you some information; hopefully you can make some modifications
>> based on the code above.
>>
>> I noticed that these patches are based on the MT29F1G08ABADAWP SLC NAND, which
>> is our 60s 34nm SLC NAND.
>> So far, we have 2 series of SLC NAND with on-die ECC implementations:
>> 1. M79A for all 25nm (70 series) SLC NAND with on-die ECC (M78A, M79A,
>>    and future design M70A)
>> 2. M60A for all 34nm (60 series) SLC NAND with on-die ECC
>
>Do you have an easy way to differentiate those 2 generations of chip, or should
>we base our detection on the model name provided in the ONFI parameter page?
>
Of course, you can use the model name, but I think we would then have to keep a big table listing every NAND part.
That doesn't look nice, and the table would keep changing.

The better solution is:

For Micron SLC NAND with on-die ECC (please note this applies only to "SLC NAND with on-die ECC"),
you can always differentiate these two generations by ONFI parameter page byte 112, "Number of bits
ECC correctability": if its value is 4, it is a 60s part; if it is 8, it is a 70s part. This is a permanent
method for both 60s and 70s "SLC NAND with on-die ECC".
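
As a rough illustration of that check, here is a minimal standalone sketch in plain C.
It is not kernel code: the enum, the function name and the way the parameter page is
passed in (as a raw byte buffer) are all hypothetical. The only facts taken from the
description above are the byte offset (112) and the two values (4 and 8).

```c
#include <stddef.h>
#include <stdint.h>

/* ONFI parameter page byte 112: "Number of bits ECC correctability". */
#define ONFI_ECC_BITS_OFFSET 112

/* Hypothetical names, used only for this sketch. */
enum micron_on_die_ecc_gen {
    MICRON_ON_DIE_ECC_UNKNOWN = 0,
    MICRON_ON_DIE_ECC_60S,  /* 34nm, 4-bit on-die ECC */
    MICRON_ON_DIE_ECC_70S   /* 25nm, 8-bit on-die ECC */
};

/*
 * Classify a Micron "SLC NAND with on-die ECC" part from a raw copy of its
 * ONFI parameter page. Only meaningful for parts already known to have
 * on-die ECC; any other value of byte 112 is reported as unknown.
 */
enum micron_on_die_ecc_gen
micron_on_die_ecc_gen_from_onfi(const uint8_t *param_page, size_t len)
{
    if (!param_page || len <= ONFI_ECC_BITS_OFFSET)
        return MICRON_ON_DIE_ECC_UNKNOWN;

    switch (param_page[ONFI_ECC_BITS_OFFSET]) {
    case 4:
        return MICRON_ON_DIE_ECC_60S;
    case 8:
        return MICRON_ON_DIE_ECC_70S;
    default:
        return MICRON_ON_DIE_ECC_UNKNOWN;
    }
}
```

In the driver itself the same value is presumably already available from the parsed
ONFI parameters, so no raw buffer access would be needed there.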

>>
>> NAND_STATUS_FAIL:
>> For both series of SLC NAND with on-die ECC, SR bit 0
>> (NAND_STATUS_FAIL) indicates an uncorrectable read failure: the data is lost
>> and no recovery is possible unless we have additional software protection.
>> The block is not necessarily bad, but the data is lost.
>>
>> NAND_STATUS_WRITE_RECOMMENDED:
>>
>> NAND_STATUS_WRITE_RECOMMENDED only works on 60s NAND, which has 4-bit ECC:
>> the status register only indicates whether there are 0 or 1-4 correctable
>> error bits. We don't want to trigger a refresh if only 1 or 2 bits fail;
>> the criterion for a refresh is 3 or 4 bitflips. But unfortunately we can't get
>> the failed bit count through the read status register.
>> SW workaround proposal:
>> 1. If SR bit 3 is set to 1, it means 1~4 bitflips, and they are correctable.
>> 2. Read out the page with ECC ON
>> 3. Read out the page with ECC OFF
>> 4. Compare the data
>> 5. Count the number of bitflips for the sectors (there are 4 ECC sectors)
>> 6. If there are 3 or more failed bits, trigger a refresh.
>> I know this is not a good solution, but if we trigger a refresh whenever
>> NAND_STATUS_WRITE_RECOMMENDED is set, this will definitely increase the
>> NAND PE cycle count.
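
Steps 4 to 6 of that workaround could look roughly like the sketch below. It is
standalone C, not driver code: the function names and the threshold macro are
hypothetical, the two buffers are assumed to hold the main data area of the same page
read once with on-die ECC enabled and once with it disabled, and the page is assumed to
split evenly into the ECC sectors. Spare-area bytes are ignored for simplicity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Refresh once any single ECC sector shows this many bitflips (step 6). */
#define REFRESH_THRESHOLD 3

/* Count the bits that differ between two buffers of equal length. */
unsigned int count_bitflips(const uint8_t *corrected, const uint8_t *raw,
                            size_t len)
{
    unsigned int flips = 0;

    for (size_t i = 0; i < len; i++) {
        uint8_t diff = corrected[i] ^ raw[i];

        /* Clear the lowest set bit until none remain. */
        while (diff) {
            diff = (uint8_t)(diff & (diff - 1));
            flips++;
        }
    }
    return flips;
}

/* Return the worst per-sector bitflip count across the page. */
unsigned int max_sector_bitflips(const uint8_t *corrected, const uint8_t *raw,
                                 size_t page_len, size_t nsectors)
{
    unsigned int max = 0;
    size_t sector_len;

    if (nsectors == 0)
        return 0;
    sector_len = page_len / nsectors;

    for (size_t s = 0; s < nsectors; s++) {
        unsigned int flips = count_bitflips(corrected + s * sector_len,
                                            raw + s * sector_len,
                                            sector_len);
        if (flips > max)
            max = flips;
    }
    return max;
}

/* Decide whether the page should be rewritten. */
bool page_needs_refresh(const uint8_t *corrected, const uint8_t *raw,
                        size_t page_len, size_t nsectors)
{
    return max_sector_bitflips(corrected, raw, page_len, nsectors)
           >= REFRESH_THRESHOLD;
}
```

For a 2048-byte page with 4 ECC sectors this compares four 512-byte chunks. The
per-sector maximum is used rather than the page total because the ECC strength applies
to each sector separately.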
>
>We discussed that with Thomas when developing the solution. I suggested to first
>go for a simple solution even if it implies unneeded PE cycles when bitflips are
>detected, but maybe I was wrong. In any case, it shouldn't be too hard to do what
>you suggest.
>

OK, but I recommend making the 70s the first choice for this solution, since
it doesn't need to read the page twice to determine its bitflip count.

>>
>> The 70s has 8-bit on-die ECC, and its status register can report 7-8 bitflips
>> (refresh recommended), 4-6 bitflips, or 1-3 bitflips.
>> So we can trigger a refresh according to its bitflip status.
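
A possible shape for the 70s decoding is sketched below. Please treat the bit values as
an assumption: this thread does not spell out the encoding, so the masks (status bits 4,
3 and 0) and all names here are placeholders to be checked against the datasheet. The
sketch reports the upper bound of each range, so that 7-8 bitflips reaches the ECC
strength of 8 and hints the core to rewrite the block.

```c
#include <stdint.h>

/* ASSUMED encoding of the ECC result in the status register; verify it. */
#define SR_ECC_MASK 0x19u  /* bits 4, 3 and 0 */
#define SR_ECC_NONE 0x00u  /* no bitflips */
#define SR_ECC_1_3  0x10u  /* 1-3 bitflips corrected */
#define SR_ECC_4_6  0x08u  /* 4-6 bitflips corrected */
#define SR_ECC_7_8  0x18u  /* 7-8 bitflips corrected, refresh recommended */

/*
 * Translate a 70s status byte into a worst-case bitflip count.
 * Returns -1 for an uncorrectable read (bit 0 set under this assumed layout).
 */
int micron_70s_status_to_bitflips(uint8_t status)
{
    switch (status & SR_ECC_MASK) {
    case SR_ECC_NONE:
        return 0;
    case SR_ECC_1_3:
        return 3;
    case SR_ECC_4_6:
        return 6;
    case SR_ECC_7_8:
        return 8;
    default:
        return -1;
    }
}
```

The caller would bump mtd->ecc_stats.failed on -1 and otherwise return the count as
max_bitflips, the same way the quoted patch handles the 60s case.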
>
>That's good news!
>
>Thanks for your feedback.
>
>Boris

//Beanhuo