[PATCH] nand gpmi fix erased block bitflip counting

Cappelle Wouter W.Cappelle at TELEVIC.com
Tue Nov 15 23:33:02 PST 2016


On 15-11-16 21:54, Marek Vasut wrote:
> On 11/09/2016 01:35 PM, w.cappelle at televic.com wrote:
>> From: Wouter Cappelle <w.cappelle at televic.com>
> Please add commit message explaining the purpose of the patch.
> CCing some more interested people.
Sorry, first patch, and don't know what went wrong or how to fix.

There should have been some introduction being added to the commit:

Some time ago, a patch was added to detect bitflips in erased pages 
(http://lists.infradead.org/pipermail/linux-mtd/2014-January/051467.html).
After running some test on my board (i.MX6UL), I detected some unexpected 
behavior with it, especially with the counting of the # of bitflips in the
erased chunks. I have the impressions that with some pattern, the gpmi block
did try to correct the data on an empty page. Therefore the gpmi block changed
the data leading to introducing extra bitflips and failing the criteria to 
decide if the (sub)page is erased.

I'm using BCH8 on a 2k nand page and created a testpage with 6 bitflips at following locations:
0x02D:FB
0x057:FE
0xA5:FB
0x16A:FB
0x18A:DF
0x4EE:FE

When reading the page through the driver, the page is uncorrectable (as 
expected), then it will verify if the page is erased (gpmi_erased_check).
There i can see that the first count of the first subpage, is returning me
it detected 7 bitflips (should be 5 in that subpage). The second count of 
bitflips on the full raw page returns me the correct amount of bitflips 
(being 6 for the complete page).

I Don't really see the need of the first subpage check, except of speed 
improvement. But as it is failing due to the gpmi block trying to repair the 
page and alternating the wrong bits, I would propose to either increase the
threshold of the first check with the max number of repairable bitflips the 
gpmi block is set to, or just skip the first check since on empty pages it will
however not make a difference in speed. For real uncorrectable pages, this will
not have a huge speed penalty due to the unlikely event that this will happen.

I propose following patch to be be applied to detect the correct number of 
bitflips based on the raw nand read data.

>
>> ---
>>  drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 14 +++++++-------
>>  1 file changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
>> index 8339d4f..6ae118c 100644
>> --- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
>> +++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
>> @@ -1217,6 +1217,7 @@ static bool gpmi_erased_check(struct gpmi_nand_data *this,
>>  	int base = geo->ecc_chunkn_size * chunk;
>>  	unsigned int flip_bits = 0, flip_bits_noecc = 0;
>>  	uint64_t *buf = (uint64_t *)this->data_buffer_dma;
>> +	unsigned char *chunkbuf =(unsigned char*) this->data_buffer_dma;
>>  	unsigned int threshold;
>>  	int i;
>>  
>> @@ -1224,13 +1225,6 @@ static bool gpmi_erased_check(struct gpmi_nand_data *this,
>>  	if (threshold > geo->ecc_strength)
>>  		threshold = geo->ecc_strength;
>>  
>> -	/* Count bitflips */
>> -	for (i = 0; i < geo->ecc_chunkn_size; i++) {
>> -		flip_bits += hweight8(~data[base + i]);
>> -		if (flip_bits > threshold)
>> -			return false;
>> -	}
>> -
>>  	/*
>>  	 * Read out the whole page with ECC disabled, and check it again,
>>  	 * This is more strict then just read out a chunk, and it makes
>> @@ -1246,6 +1240,12 @@ static bool gpmi_erased_check(struct gpmi_nand_data *this,
>>  			return false;
>>  	}
>>  
>> +	/* Count bitflips in the current chunk for correct stats reporting */
>> +	for (i = 0; i < geo->ecc_chunkn_size; i++) {
>> +		flip_bits += hweight8(~chunkbuf[base + i]);
>> +	}
>> +
>> +
>>  	/* Tell the upper layer the bitflips we corrected. */
>>  	mtd->ecc_stats.corrected += flip_bits;
>>  	*max_bitflips = max_t(unsigned int, *max_bitflips, flip_bits);
>>
>




More information about the linux-mtd mailing list