[PATCH v6 4/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix erased-page detection for BCHx_HW ECC schemes

Fri Jan 3 21:48:16 EST 2014

chip->ecc.correct() is used for detecting and correcting bit-flips during read
operations. In the OMAP NAND driver, different ecc-schemes use different callbacks:
 - omap_correct_data()		for HAM1_HW ecc-schemes (Untouched)
 - nand_bch_correct_data()	for BCHx_HW_DETECTION_SW ecc-schemes (Untouched)
 - omap_elm_correct_data()	for BCHx_HW ecc-schemes (updated)

This patch solves the following problems in ECC correction for BCHx_HW ecc-schemes.
Problem: The current implementation depends on a specific byte-position (reserved
         as 0x00) in the ecc-layout to differentiate between programmed-pages v/s
         erased-pages.
      1) Not all ecc-scheme layouts have such a reserved-byte marker to
         differentiate between erased-page v/s programmed-page, so this is a
         layout-specific solution.
      2) The reserved byte can itself be subject to bit-flips, causing an
         erased-page to be misinterpreted as a programmed-page.
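To make problem 2) concrete, the removed reserved-byte check can be modelled in userspace C. This is a sketch, not driver code: old_threshold() and old_is_programmed() are hypothetical names, __builtin_popcount() (GCC/Clang) stands in for the kernel's hweight8(), and ecc.strength is assumed to be 4 for BCH4 and 8 for BCH8.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model of the heuristic this patch removes (hypothetical names).
 * __builtin_popcount() stands in for the kernel's hweight8().
 */
static unsigned int old_threshold(unsigned int ecc_strength)
{
	/* min_t(unsigned int, 4, ecc.strength / 2) in the removed code */
	unsigned int half = ecc_strength / 2;

	return half < 4 ? half : 4;
}

/* Returns true if the page is treated as programmed, false if erased. */
static bool old_is_programmed(uint8_t reserved_byte, unsigned int ecc_strength)
{
	/* Count the 0-bits in the reserved marker byte. */
	unsigned int zeros =
		(unsigned int)__builtin_popcount((uint8_t)~reserved_byte);

	return zeros >= old_threshold(ecc_strength);
}
```

With BCH4 (threshold 2), two 1 -> 0 flips in an erased page's marker byte (0xff -> 0xfc) produce two 0-bits, which is enough for the old check to treat the erased page as programmed: old_is_programmed(0xfc, 4) returns true.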

Solution: This patch removes the dependency on a single byte-position in the
         ecc-layout to differentiate between erased-page v/s programmed-page.
         This patch 'assumes' a page to be 'erased':
		(a) if        all(read_ecc)  == 0xff
		(b) else if   all(read_data) == 0xff
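Rules (a) and (b) can be sketched for a single ECC step as below. all_ff() and step_is_erased() are hypothetical helpers written for illustration; they check every byte of the buffers they are given, and which OOB bytes the driver actually treats as read_ecc depends on its ecc-layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if every byte in buf[0..len) is 0xff. */
static bool all_ff(const uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (buf[i] != 0xff)
			return false;
	}
	return true;
}

/*
 * Sketch of the erased-page rule for one ECC step:
 *   (a) erased if all(read_ecc) == 0xff
 *   (b) else erased if all(read_data) == 0xff
 * Anything else would be handed to the ELM for correction.
 */
static bool step_is_erased(const uint8_t *read_data, size_t data_len,
			   const uint8_t *read_ecc, size_t ecc_len)
{
	if (all_ff(read_ecc, ecc_len))
		return true;			/* rule (a) */
	return all_ff(read_data, data_len);	/* rule (b) */
}
```

A step whose read_ecc is all 0xff is erased by (a) even if its data contains 0x00 bytes; a step with all-0xff data is erased by (b) whatever the OOB holds; any other step goes to the ELM.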

Reasons for (a)
      -  An abrupt termination of page programming (like a power failure)
         may result in a partial write, leaving the page in a corrupted state
         with unstable bits. As the OOB region is programmed after the
         data-region, if read_ecc[] == 0xff, then the page should be treated
         as erased.

      -  Also, as no ECC is present, bit-flips in the page cannot be detected.

Reasons for (b)
      - Due to the architecture of a NAND cell, bit-flips cannot change a
        programmed value from '0' -> '1'. So if all(read_data) == 0xff, then it
        is confirmed that there are 'no bit-flips in the data-region'. Hence,
        read_data[] == 0xff can be safely returned.

      - If the page was programmed with all-0xff data and 'calc_ecc[] != 0x00',
        then the page has bit-flips in its OOB-region.

      - If the page was erased and 'read_ecc[] != ecc_of_all_0xff', then
        there are bit-flips in the OOB-region of the page.
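The argument behind (b) can be checked mechanically. Under the assumption stated above (a bit-flip can only turn a '1' into a '0'), a byte read back can only have 1-bits where the written byte had 1-bits. The hypothetical helpers below model that assumption; they check its logical consequence, not whether real cells behave that way.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption modelled here: a bit-flip can only turn a 1 into a 0.
 * So 'read' is possible only if it has no 1-bit where 'written' had a 0.
 */
static bool read_is_possible(uint8_t written, uint8_t read)
{
	/* Any 1 in 'read' that was 0 in 'written' would need a 0 -> 1 flip. */
	return (uint8_t)(read & (uint8_t)~written) == 0;
}

/* Count how many written byte values could have produced 'read'. */
static unsigned int possible_sources(uint8_t read)
{
	unsigned int n = 0;

	for (unsigned int w = 0; w <= 0xff; w++) {
		if (read_is_possible((uint8_t)w, read))
			n++;
	}
	return n;
}
```

possible_sources(0xff) is 1, so under this assumption an all-0xff read can only come from all-0xff data; by contrast possible_sources(0xfe) is 2 and possible_sources(0x00) is 256.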

Signed-off-by: Pekon Gupta <pekon at ti.com>
---
 drivers/mtd/nand/omap2.c | 69 ++++++++++++++++++++++++------------------------
 1 file changed, 34 insertions(+), 35 deletions(-)

diff --git a/drivers/mtd/nand/omap2.c b/drivers/mtd/nand/omap2.c
index 5a6ee6b..589db4c 100644
--- a/drivers/mtd/nand/omap2.c
+++ b/drivers/mtd/nand/omap2.c
@@ -1296,24 +1296,10 @@ static int omap3_calculate_ecc_bch(struct mtd_info *mtd, const u_char *dat,
  * @mtd:	MTD device structure
  * @data:	page data
  * @read_ecc:	ecc read from nand flash
- * @calc_ecc:	ecc read from HW ECC registers
- *
- * Calculated ecc vector reported as zero in case of non-error pages.
- * In case of error/erased pages non-zero error vector is reported.
- * In case of non-zero ecc vector, check read_ecc at fixed offset
- * (x = 13/7 in case of BCH8/4 == 0) to find page programmed or not.
- * To handle bit flips in this data, count the number of 0's in
- * read_ecc[x] and check if it greater than 4. If it is less, it is
- * programmed page, else erased page.
- *
- * 1. If page is erased, check with standard ecc vector (ecc vector
- * for erased page to find any bit flip). If check fails, bit flip
- * is present in erased page. Count the bit flips in erased page and
- * if it falls under correctable level, report page with 0xFF and
- * update the correctable bit information.
- * 2. If error is reported on programmed page, update elm error
- * vector and correct the page with ELM error correction routine.
- *
+ * @calc_ecc:	ecc calculated after reading Data and OOB regions from flash
+ *		calc_ecc would be non-zero only in the following cases:
+ *		- bit-flips in data or oob region
+ *		- erased page, where no ECC is written in OOB area
  */
 static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 				u_char *read_ecc, u_char *calc_ecc)
@@ -1325,6 +1311,8 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 	int eccsize	= info->nand.ecc.size;
 	int eccstrength	= info->nand.ecc.strength;
 	int eccsteps	= info->nand.ecc.steps;
+	bool page_is_erased;
+	u8 *buf;
 	int i , j, stat = 0;
 	int eccflag, actual_eccbytes;
 	struct elm_errorvec err_vec[ERROR_VECTOR_MAX];
@@ -1371,24 +1359,35 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 		}
 
 		if (eccflag == 1) {
-			/*
-			 * Set threshold to minimum of 4, half of ecc.strength/2
-			 * to allow max bit flip in byte to 4
-			 */
-			unsigned int threshold = min_t(unsigned int, 4,
-					info->nand.ecc.strength / 2);
+			/* (a) page can be 'assumed' erased if
+			 * all(read_ecc) == 0xff */
+			page_is_erased = true;
+			for (j = 0; j < (eccbytes - 1); j++) {
+				if (read_ecc[j] != 0xff) {
+					page_is_erased = false;
+					break;
+				}
+			}
 
-			/*
-			 * Check data area is programmed by counting
-			 * number of 0's at fixed offset in spare area.
-			 * Checking count of 0's against threshold.
-			 * In case programmed page expects at least threshold
-			 * zeros in byte.
-			 * If zeros are less than threshold for programmed page/
-			 * zeros are more than threshold erased page, either
-			 * case page reported as uncorrectable.
-			 */
-			if (hweight8(~read_ecc[actual_eccbytes]) >= threshold) {
+			/* (b) Due to architecture of NAND cell, bit-flip cannot
+			 * change cell-value from '0' -> '1'. So if page has
+			 * all(read_data) == 0xff, then it is confirmed that
+			 * there are no bit-flips in its data-region. Hence,
+			 * read_data == 0xff can be safely returned. */
+			if (!page_is_erased) {
+				page_is_erased = true;
+				buf = &data[eccsize * i];
+				for (j = 0; j < eccsize; j++) {
+					if (buf[j] != 0xff) {
+						page_is_erased = false;
+						break;
+					}
+				}
+			}
+
+			/* erased-page needs to be handled separately, as ELM
+			 * engine cannot parse pages with all(ECC) == 0xff */
+			if (!page_is_erased) {
 				/*
 				 * Update elm error vector as
 				 * data area is programmed
-- 
1.8.1