[PATCH v4 5/5] mtd: nand: Improve bitflip detection for on-die ECC scheme.
Gupta, Pekon
pekon at ti.com
Mon Mar 31 23:29:52 PDT 2014
Hi David,
>From: David Mosberger
>
>When NAND_STATUS_REWRITE is set, there is no direct indication as to
>how many bits have were flipped. This patch improves the existing
>code by manually counting the number of bitflips, rather than just
>assuming the worst-case of mtd->bitflip_threshold. This avoids
>unnecessary rewrites, e.g., due to a single stuck bit.
>
>Signed-off-by: David Mosberger <davidm at egauge.net>
>---
> drivers/mtd/nand/nand_base.c | 90 ++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 86 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
>index 834352a..80cfaa8 100644
>--- a/drivers/mtd/nand/nand_base.c
>+++ b/drivers/mtd/nand/nand_base.c
>@@ -1272,6 +1272,84 @@ static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip,
> return max_bitflips;
> }
>
>+static int
>+set_on_die_ecc(struct mtd_info *mtd, struct nand_chip *chip, int on)
>+{
>+ u8 data[ONFI_SUBFEATURE_PARAM_LEN] = { 0, };
>+
>+ if (chip->ecc.mode != NAND_ECC_HW_ON_DIE)
>+ return 0;
>+
>+ if (on)
>+ data[0] = ONFI_FEATURE_ARRAY_OP_MODE_ENABLE_ON_DIE_ECC;
>+
>+ return chip->onfi_set_features(mtd, chip,
>+ ONFI_FEATURE_ADDR_ARRAY_OP_MODE, data);
>+}
>+
>+/*
>+ * Return the number of bits that differ between buffers SRC1 and
>+ * SRC2, both of which are LEN bytes long.
>+ *
>+ * This code could be optimized for, but it only gets called on pages
>+ * with bitflips and compared to the cost of migrating an eraseblock,
>+ * the execution time here is trivial...
>+ */
>+static int
>+bitdiff(const void *s1, const void *s2, size_t len)
>+{
>+ const uint8_t *src1 = s1, *src2 = s2;
>+ int count = 0, i;
>+
>+ for (i = 0; i < len; ++i)
>+ count += hweight8(*src1++ ^ *src2++);
>+ return count;
hweight8() might not be good option, how about using hweight32 ?
>+}
>+
>+static int
>+check_for_bitflips(struct mtd_info *mtd, struct nand_chip *chip, int page)
>+{
>+ int flips = 0, max_bitflips = 0, i, j, read_size;
>+ uint8_t *chkbuf, *rawbuf, *chkoob, *rawoob;
>+ uint32_t *eccpos;
>+
>+ chkbuf = chip->buffers->chkbuf;
>+ rawbuf = chip->buffers->rawbuf;
>+ read_size = mtd->writesize + mtd->oobsize;
>+
>+ /* Read entire page w/OOB area with on-die ECC on: */
>+ chip->read_buf(mtd, chkbuf, read_size);
>+
>+ /* Re-read page with on-die ECC off: */
>+ set_on_die_ecc(mtd, chip, 0);
>+ chip->cmdfunc(mtd, NAND_CMD_READ0, 0x00, page);
>+ chip->read_buf(mtd, rawbuf, read_size);
>+ set_on_die_ecc(mtd, chip, 1);
>+
>+ chkoob = chkbuf + mtd->writesize;
>+ rawoob = rawbuf + mtd->writesize;
>+ eccpos = chip->ecc.layout->eccpos;
>+ for (i = 0; i < chip->ecc.steps; ++i) {
>+ /* Count bit flips in the actual data area: */
>+ flips = bitdiff(chkbuf, rawbuf, chip->ecc.size);
Do you really need this as separate function call ?
You could make it inline, like you did below of OOB.
Also, you may stop checking bitflips, once count > chip->ecc.strength.
Some of the discussion on different patch may be helpful here, please see
Brian's comments on [1].
>+ /* Count bit flips in the ECC bytes: */
>+ for (j = 0; j < chip->ecc.bytes; ++j) {
Again, why checking only for ecc.bytes ? should have been.
>+ flips += hweight8(chkoob[*eccpos] ^ rawoob[*eccpos]);
>+ ++eccpos;
>+ }
>+ if (flips > 0)
>+ mtd->ecc_stats.corrected += flips;
>+ max_bitflips = max_t(int, max_bitflips, flips);
>+ chkbuf += chip->ecc.size;
>+ rawbuf += chip->ecc.size;
>+ }
>+
>+ /* Re-issue the READ command for the actual data read that follows. */
>+ chip->cmdfunc(mtd, NAND_CMD_READ0, 0x00, page);
>+
If you refer my previous feedbacks, you could avoided this extra
"chip->read_buf(mtd, chkbuf, read_size);" by calling it in nand_page_read_on_die();
>+ return max_bitflips;
>+}
>+
> static int check_read_status_on_die(struct mtd_info *mtd,
> struct nand_chip *chip, int page)
> {
>@@ -1290,11 +1368,15 @@ static int check_read_status_on_die(struct mtd_info *mtd,
> mtd->ecc_stats.failed++;
> } else if (status & NAND_STATUS_REWRITE) {
> /*
>- * Simple but suboptimal: any page with a single stuck
>- * bit will be unusable since it'll be rewritten on
>- * each read...
>+ * The Micron chips turn on the REWRITE status bit for
>+ * ANY bit flips. Some pages have stuck bits, so we
>+ * don't want to migrate a block just because of
>+ * single bit errors because otherwise, that block
>+ * would effectively become unusable. So, work out in
>+ * software what the max number of flipped bits is for
>+ * all subpages in a page:
> */
>- max_bitflips = mtd->bitflip_threshold;
>+ max_bitflips = check_for_bitflips(mtd, chip, page);
We usually don't delete the changes, introduced in previous patches
of same series. So, take one approach and then logically divide the patches.
> }
> return max_bitflips;
> }
>--
>1.7.9.5
>
Again, please review all the comments given by Gerhard and myself
on previous patches. Don't hurry in sending newer revisions, as it
would just increase your iterations.
[1] http://lists.infradead.org/pipermail/linux-mtd/2014-March/052625.html
with regards, pekon
More information about the linux-mtd
mailing list