[PATCH v6 4/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix erased-page detection for BCHx_HW ECC schemes
Gupta, Pekon
pekon at ti.com
Wed Jan 15 18:17:13 EST 2014
HI Brian,
>From: Brian Norris [mailto:computersforpeace at gmail.com]
[...]
>>
>> This patch solves following problems in ECC correction for BCHx_HW ecc-schemes.
>> Problem: Current implementation depends on a specific byte-position (reserved
>> as 0x00) in ecc-layout to differentiate between programmed-pages v/s
>> erased-pages.
>> 1) All ecc-scheme layouts do not have such Reserved byte marker to
>> differentiate between erased-page v/s programmed-page. Thus this is a
>> customized solution.
>> 2) Reserved byte can itself be subjected to bit-flips causing erased-page
>> to be misunderstood as programmed-page.
>>
>> Solution: This patch removes dependency on single byte-position ini ecc-layout
>> to differentiating between erased-page v/s programeed-page.
>> This patch 'assumes' any page to be 'erased':
>> (a) if all(read_ecc) == 0xff
>> (b) else if all(read_data) == 0xff
>
>Your assumptions break down if somebody intentionally programmed a page
>with 0xff data. Then the ECC region will not be 0xff, but the data area
>may be all 0xff. Isn't this a major problem? (You can try testing this
>with nandwrite/nanddump using a page of 0xff.)
>
I thought of this much, but I found that UBI doesn't really differentiate between
(a) page intentionally programmed with 0xff data.
(b) page left blank.
This is why we have 'white-space-fixup' feature in UBIFS, which re-erased the
page which it thinks are blank. For UBI, any blocks which has 'vid-header'
is assumed to contain some user-data.
Example: When UBI image is flashed using 'nand write' command from u-boot.
u-boot writes 0xff as data along with its OOB into empty pages.
But when UBIFS scans all the blocks it considers the blocks without 'vid-header'
as erased, in spite the fact that the pages is programmed with 0xff data.
(However, Artem can confirm this more)..
>>
>> Reasons for (a)
>> - An abrupt termination of page programming (like power failure)
>> may result in partial write, leaving page in corrupted state with
>> un-stable bits. As OOB region is programmed after the data-region,
>> so if read_ecc[] == 0xff, then a page should treadted as erased.
>>
>> - Also, as ECC is not present, any bitflips in page cannot be detected.
>>
>> Reasons for (b)
>> - Due to architecture of NAND cell, bit-flips cannot change programmed
>> value from '0' -> '1'. So if all(read_data) == 0xff then its confirmed
>> that there is 'no bit-flips in data-region'. Hence, read_data[] == 0xff
>> can be safely returned.
>>
I think this is a valid assumption, because converting all the programmed bits
back to '1' (s) requires erase operation, which internally is done at slightly
higher voltage (as per my understanding of flash cell & floating-gate architecture).
>> - if page was programmed-page with 0xff and 'calc_ecc[] != 0x00' then
>> it means that page has bit-flips in OOB-region.
>>
Sorry, yes may be I wrote this incorrectly .. (please ignore this)
>> - if page was erased-page and 'read_ecc[] != ecc_of_all_0xff' then
>> it mean that there are bit-flips in OOB-region of page.
Sorry there is a typo here.. it should be calc_ecc[] != ecc_of_all_0xff.
If (OOB == all(0xff) /* page is erased */
If (calc_ecc[] != ecc_of_all_0xff) /* erased-page has bit-flips */
So, I think above assumption is also correct ??
(And this has been already followed in existing OMAP driver, refer bch8_vector)
>
>I think several assumptions and statements you are making are flawed,
>and (unless I'm wrong) you probably need to rethink the approach in this
>patch and the previous one.
>
>Are you mainly trying to (1) improve performance and (2) remove reliance
>on a single byte marker? I think the first point may be misguided and
>not important in this case, but the second looks like you can solve it
>well. What do you think of trying to tackle (2) without (1)? You would
>be keeping much of the existing hweight()-related code for determining
>bitflips across the whole data+OOB, but you could still have a smaller
>ECC-only check to perform a faster first pass.
>
Yes, I want to solve (2) "remove reliance on a single byte marker".
That's on priority.
But recently I have come across some MLC and new technology NAND
devices, which have very high bit-flip rates.
(a) Almost every page has bit-flips. And some pages have stuck bits too.
But all this is within correctable range so performance will certainly hit.
(b) Even the erased-pages are having lot of bit-flips in OOB area, so
it's difficult to differentiate between erased-page v/s programmed-page.
(c) And due to the bit-flips in erased-pages the 'faster' ECC-only check will
mostly fail, and so we fall back to comparing each byte with 0xFF. Thus
performance falls further.
I'm unable to find a proper solution for avoiding (2) and still able to meet (b).
So, in this patch I just did two checks
- compare all OOB with 0xff to see if the page is programmed.
- compare all data with 0xff to see if data is present.
If either of the two returns as true, then I consider the page as erased.
because UBIFS does not distinguish between erased-page and
programmed-page having all(0xff) data.
[...]
>> + /* (a) page can be 'assumed' erased if
>> + * all(read_ecc) == 0xff */
>
>Can you fix your comment style please? Note the multiline style used by
>the comments you are deleting.
>
Ok.. But these are wasting 2 lines :-)
with regards, pekon
More information about the linux-mtd
mailing list