[PATCH 12/13] mtd/docg3: add ECC correction code

Sat Oct 29 12:37:35 EDT 2011

Ivan Djelic <ivan.djelic at parrot.com> writes:

> On Fri, Oct 28, 2011 at 06:51:31PM +0100, Robert Jarzmik wrote:
>>  
>> +/**
>> + * struct docg3_bch - BCH engine
>> + */
>> +static struct bch_control *docg3_bch;
>
> Why not putting this into your struct docg3, instead of adding a global var ?
Because I have multiple floors (ie. 4 floors for example), which are split into
4 different devices. If I put that in docg3 structures (ie. the 4 allocated
structures, each for one floor), I'd either have to :
 - allocate 4 different bch "engines"
 - or count docg3 releases and release the bch at the last kfree(docg3), which
 makes me have another global variable.

>
>> +static int doc_ecc_bch_fix_data(struct docg3 *docg3, void *buf, u8 *calc_ecc)
>> +{
>> +	u8 ecc[DOC_ECC_BCH_T + 1];
>
> Should be 'u8 ecc[DOC_ECC_BCH_SIZE];'
OK, I'll amend it.

>
>> +	int errorpos[DOC_ECC_BCH_T + 1], i, numerrs;
>
> Using 'errorpos[DOC_ECC_BCH_T]' is fine, no need for '+ 1'.
OK, got it.

>> +
>> +	for (i = 0; i < DOC_ECC_BCH_SIZE; i++)
>> +		ecc[i] = bitrev8(calc_ecc[i]);
>> +	numerrs = decode_bch(docg3_bch, NULL, DOC_ECC_BCH_COVERED_BYTES,
>> +			     NULL, ecc, NULL, errorpos);
>> +	if (numerrs < 0)
>> +		return numerrs;
>
> Maybe do something like 'printk(KERN_ERR "ecc unrecoverable error\n");' when
> numerrs < 0 ?
That will be too verbose. As there are special partitions where the ECC is not
used that way (ie. SAFTL partitions I don't understand well yet), the ECC is
always wrong there.
Moreover, the error is reported as EBADMSG later (in doc_read), making the
information available to userland.

>
> (...)
>
>> +	for (i = 0; i < numerrs; i++)
>> +		change_bit(errorpos[i], buf);
>
> 'buf' holds the buffer of read data (512 + 7 + 1 bytes); hence you should
> change the above code into something like:
>
> 	for (i = 0; i < numerrs; i++)
> 		if (errorpos[i] < DOC_ECC_BCH_COVERED_BYTES*8)
> 			/* error is located in data, correct it */
> 			change_bit(errorpos[i], buf);
> 	        /* else error in ecc bytes, no data change needed */
>
> otherwise 'change_bit' will be out of bounds when a bitflip occurs in ecc
> bytes. Note that we still need to report bitflips in that case, to let upper
> layers scrub them.
Ah OK, I wasn't aware of that. I'll amend the code, thanks.

>
> (...)
>> +	docg3_bch = init_bch(DOC_ECC_BCH_M, DOC_ECC_BCH_T,
>> +			     DOC_ECC_BCH_PRIMPOLY);
>> +	if (!docg3_bch)
>> +		goto nomem2;
>
> Just a side note: if you need to get maximum performance from the BCH library,
> you can set fixed values for M and T in your Kconfig, with something like:
>
>  config MTD_DOCG3
>         tristate "M-Systems Disk-On-Chip G3"
> 	select BCH
>         ---help---
>           This provides an MTD device driver for the M-Systems DiskOnChip
>           G3 devices.
>
> if MTD_DOCG3
> config BCH_CONST_M
> 	default 14
> config BCH_CONST_T
> 	default 4
> endif
>
>
> The only drawback is that it won't work if you select your DOCG3 driver and, at
> the same time, other MTD drivers that also use fixed, but different
> parameters.
Oh, that seems nice. As I'm working with a smartphone, there is only one mtd
inside, so the speed-up will be nice.

>
> (...)
>> @@ -1660,6 +1717,7 @@ static int __exit docg3_release(struct platform_device *pdev)
>>  			doc_release_device(docg3_floors[floor]);
>>  
>>  	kfree(docg3_floors);
>> +	kfree(docg3_bch);
>
> This should be 'free_bch(docg3_bch)'.
Right.

>
> Otherwise it looks OK to me; did you test BCH correction by simulating
> corruptions (of 1-4 bits, and also 5 bits to trigger failures) in nand data ?
I did test the algorithm, yes.
To be more precise, I tested your last BCH encoding algorithm (emulate_docg4_hw)
which gives exactly the same ECC.

I then flipped 2 bits (buf[0] ^= 0x01 and buf[1] ^= 0x02), and got correct
errorpos (ie. 0 and 10 IIRC).

The thing I didn't check is the decode_bch() call with the hardware calculated
ECC. I tried on my PC:
  decode_bch(bch, data, 520, ecc, NULL, NULL, errorpos);
  => this *does* work
while in the kernel I did:
  decode_bch(docg3_bch, NULL, DOC_ECC_BCH_COVERED_BYTES,
			     NULL, ecc, NULL, errorpos);
  => this might work...

What I'm a bit afraid of is my poor understanding of the hardware ECC engine. I
know that the write part is correct (ie. ECC calculation), but I'm a bit
confused by the read part.

What wories me is that the hardware ECC got back while reading (ie. what I
called calc_ecc) is always 00:00:00:00:00:00:00 when I read data (because I
don't have bitflips on my flash). This looks to me more a "syndrom" than a
"calc_ecc".

To be sure, I could write a page of 512 bytes + 16 bytes, where the BCH would be
forced (and incorrect), to check what the hardware generator gives me back. I'd
like you to help me, ie:
 - tell me what to write to the first 512 bytes (only 0, all 0 but one byte to
 1, other ...)
 - I think I'll write 8 bytes to 0x01 for the first 8 OOB bytes (Hamming false
 but I won't care)
 - tell me what to write to the 7 BCH ECC

That way, I could provide you the result and you could tell me if
doc_ecc_bch_fix_data() is correct or not. Would you agree to spend some time on
that ?

Cheers.

--
Robert