[PATCH v7 3/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix erased-page detection for BCHx_HW ECC schemes

Brian Norris computersforpeace at gmail.com
Tue Feb 11 16:34:57 EST 2014


Hi Pekon,

On Fri, Jan 17, 2014 at 12:43:14PM +0530, Pekon Gupta wrote:
> As erased-pages do not have ECC stored in their OOB area, so they need to be
> seperated out from programmed-pages, before doing BCH ECC correction.
> 
> In current implementation of omap_elm_correct_data() which does ECC correction
> for BCHx ECC schemes, this erased-pages are detected based on specific marker
> byte (reserved as 0x00) in ecc-layout.
> However, this approach has some limitation like;
>  1) All ecc-scheme layouts do not have such Reserved byte marker to
>     differentiate between erased-page v/s programmed-page. Thus this is a
>     customized solution.
>  2) Reserved marker byte can itself be subjected to bit-flips causing
>     erased-page to be misunderstood as programmed-page.
> 
> This patch removes dependency on any marker byte in ecc-layout, to differentiate
> between erased-page v/s programeed-page. Instead a page is considered erased if
>  (a) all(OOB)  == 0xff, .i.e., number of zero-bits in read_ecc[] == 0
>  (b) number of zero-bits in (Data + OOB) is less than ecc-strength
> 
> This patch also adds a generic function count_zero_bits(), to find number of
> bits which are '0' in given buffer. This function is optimized for comparing
> read-data with 0xff.
> 
> Signed-off-by: Pekon Gupta <pekon at ti.com>
> ---
>  drivers/mtd/nand/omap2.c | 88 ++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 71 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/mtd/nand/omap2.c b/drivers/mtd/nand/omap2.c
> index 2c73389..f833fbb 100644
> --- a/drivers/mtd/nand/omap2.c
> +++ b/drivers/mtd/nand/omap2.c
> @@ -1331,6 +1331,30 @@ static int erased_sector_bitflips(u_char *data, u_char *oob,
>  }
>  
>  /**
> + * count_zero_bits - count number of bit-flips in given buffer
> + * @buf:	pointer to buffer
> + * @length:	buffer length
> + * @max_bitflips: maximum number of correctable bit-flips (ecc.strength)
> + * $bitflip_count: pointer to store bit-flip count
> + *
> + * Counts number of bit-flips in given buffer, returns with error
> + * as soon as count exceeds max_bitflips limit.
> + */
> +static int count_zero_bits(u_char *buf, int length,
> +					 int max_bitflips, int *bitflip_count)
> +{	int i;
> +
> +	for (i = 0; i < length; i++) {
> +		if (unlikely(buf[i] != 0xff)) {
> +			*bitflip_count += hweight8(~buf[i]);
> +			if (unlikely(*bitflip_count > max_bitflips))
> +				return -EBADMSG;
> +		}
> +	}
> +	return 0;
> +}
> +
> +/**
>   * omap_elm_correct_data - corrects page data area in case error reported
>   * @mtd:	MTD device structure
>   * @data:	page data
> @@ -1359,14 +1383,21 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
>  {
>  	struct omap_nand_info *info = container_of(mtd, struct omap_nand_info,
>  			mtd);
> -	int eccbytes	= info->nand.ecc.bytes;
> -	int eccsteps = info->nand.ecc.steps;
> +	struct nand_ecc_ctrl *ecc = &info->nand.ecc;
> +	int eccstrength		= ecc->strength;
> +	int eccsize		= ecc->size;
> +	int eccbytes		= ecc->bytes;
> +	int eccsteps		= ecc->steps;

When I recommended adding the 'ecc' variable (as you did), I was
suggesting you drop the next 4 variables. You don't need them when you
can (with just as few characters) refer to ecc->field directly and
easily. Stick with one or ther other -- the 4 local variables or the 1
ecc pointer -- but you don't need both.

>  	int i , j, stat = 0;
>  	int eccflag, actual_eccbytes;
>  	struct elm_errorvec err_vec[ERROR_VECTOR_MAX];
>  	u_char *ecc_vec = calc_ecc;
>  	u_char *spare_ecc = read_ecc;
>  	u_char *erased_ecc_vec;
> +	u_char *buf;
> +	int bitflip_count;
> +	int err;
> +	bool page_is_erased;
>  	enum bch_ecc type;
>  	bool is_error_reported = false;
>  
> @@ -1410,24 +1441,47 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
>  		}
>  
>  		if (eccflag == 1) {
> -			/*
> -			 * Set threshold to minimum of 4, half of ecc.strength/2
> -			 * to allow max bit flip in byte to 4
> -			 */
> -			unsigned int threshold = min_t(unsigned int, 4,
> -					info->nand.ecc.strength / 2);
> +			bitflip_count = 0;
> +			/* count zero-bits in OOB region */
> +			err = count_zero_bits(read_ecc, eccbytes,
> +						 eccstrength, &bitflip_count);
> +			if (err) {
> +				/*
> +				 * number of zero-bits in OOB > ecc-strength
> +				 * either un-correctable or programmed-page
> +				 */
> +				page_is_erased = false;
> +			} else if (bitflip_count == 0) {
> +				/* OOB == all(0xff) */
> +				page_is_erased = true;

This else-if is still incorrect. You cannot assume that just because the
OOB is all 0xff that the page is erased.

> +			} else {
> +				/*
> +				 * OOB has some bits as '0' but
> +				 * number of zero-bits in OOB < ecc-strength.
> +				 * It may be erased-page with bit-flips so,
> +				 * further checks are needed for confirmation
> +				 */
> +				/* count zero-bits in DATA region */
> +				buf = &data[eccsize * i];
> +				err = count_zero_bits(buf, eccsize,
> +						 eccstrength, &bitflip_count);
> +				if (err) {
> +					/*
> +					 * total number of zero-bits in OOB
> +					 * and DATA exceeds ecc-strength
> +					 */
> +					page_is_erased = false;
> +				} else {
> +					/* number of zero-bits < ecc-strength */
> +					page_is_erased = true;
> +				}
> +			}

This whole block (where you call count_zero_bits() twice) is a
convoluted and buggy way of just doing the following, AFAIU:

	page_is_erased = !count_zero_bits(read_ecc, ecc->bytes,
				ecc->strength, &bitflip_count) &&
			 !count_zero_bits(&data[ecc->size * i], ecc->size,
			 	ecc->strength, &bitflip_count);

You could modify that to your liking and add a comment, but it's much
more concise than your version, and from what I can tell, it's actually
correct.

>  
>  			/*
> -			 * Check data area is programmed by counting
> -			 * number of 0's at fixed offset in spare area.
> -			 * Checking count of 0's against threshold.
> -			 * In case programmed page expects at least threshold
> -			 * zeros in byte.
> -			 * If zeros are less than threshold for programmed page/
> -			 * zeros are more than threshold erased page, either
> -			 * case page reported as uncorrectable.
> +			 * erased-pages do not have ECC stored in OOB area,
> +			 * so they need to be handled separately.
>  			 */
> -			if (hweight8(~read_ecc[actual_eccbytes]) >= threshold) {
> +			if (!page_is_erased) {
>  				/*
>  				 * Update elm error vector as
>  				 * data area is programmed

Brian



More information about the linux-mtd mailing list