[PATCH v2] mtd: nand: Add driver for M-sys / Sandisk diskonchip G4
Mike Dunn
mikedunn at newsguy.com
Thu Nov 3 12:54:56 EDT 2011
On 11/03/2011 01:58 AM, Ivan Djelic wrote:
> Hi Mike,
> A few comments below:
>
>> +
>> +/* value generated by the HW ecc generator upon reading blank page */
>> +static uint8_t blank_read_hwecc[7] = {
>> + 0xcf, 0x72, 0xfc, 0x1b, 0xa9, 0xc7, 0xb9 };
> Using this blank ecc value to detect blank pages after reading them is not that
> reliable, because if a bitflip appears in an erased page, the HW ecc generator
> will generate a completely different sequence of bytes. Your driver will think
> the page is not blank, it will try to correct errors and fail. And UBIFS will
> not appreciate that.
>
> I can see two cleaner alternatives to solve this issue:
>
> 1. When you program a page, before writing hwecc to oob, adjust it like this:
>
> hwecc[i] ^= blank_read_hwecc[i]^0xff;
>
> The effect of this is that you now get 0xffs as ecc for blank pages, and bitflip
> correction on erased pages for free. This is transparent to your controller,
> because this "adjustment" cancels itself upon reading when calc_ecc^recv_ecc is
> computed.
This is neat! Will try this today.
> 2. Use unprotected spare oob byte 15 as a programmming marker: remove it from
> the oob_free list, and force it to 0 when you program a page. Now, you can
> easily detect if a page is blank or has been programmed by checking this byte.
> You can for instance count the number of bits set to 1 in the byte, and decide
> it is blank if that number is greater than 4; this ensures you are robust to
> bitflips in the marker byte itself.
>
> My preference would go to option 2 in your case.
Hmm, another good idea. I haven't given much thought to robustness until now.
Heck, I never expected to get this driver working right at all. Once again, I'm
in your debt, Ivan.
>> +
>> + /* fix the errors */
>> + fixederrs = 0;
>> + for (i = 0; i < numerrs; i++) {
>> + if (errpos[i] > DOCG4_USERDATA_LEN * 8)
>> + continue; /* error in ecc oob bytes; ignore */
>> + change_bit(errpos[i], (unsigned long *)buf);
>> + fixederrs++;
>> + }
> Here, your check on errorpos[i] to skip ecc oob locations is correct; but a
> bitflip (in ecc bytes or elsewhere) is an indication that the nand page needs
> scrubbing (you don't want bitflips to accumulate), so you should increase
> fixederrs in that case also.
Thanks. Will make this change.
> If you see a _lot_ of scrubbing operations from UBI, you may also consider
> concealing bitflip corrections; i.e. report them only if the number of corrected
> bits is above a certain threshold (e.g. >= 3 in your case).
Something to keep in mind when I start stress testing with ubifs. I have
noticed some scrubbing in the testing so far. At one point, it marked a block
as bad, so this may be an issue. The error rate seems erratic. Sometimes there
are many, and even the occasional uncorrectable error. Lately, there have just
a few one-bit errors when I run nandtest over the whole device. I noticed this
inconsistency while observing operation under the PalmOS code as well, so it's
not a problem in my driver. Maybe due to solar activity :-)
Thanks again,
Mike
More information about the linux-mtd
mailing list