[PATCH] Newly erased page read workaround

Fri Apr 1 05:06:51 EDT 2011

On 4/1/2011 2:09 PM, Artem Bityutskiy wrote:
> On Fri, 2011-04-01 at 14:03 +0530, Vipin Kumar wrote:
>>>>> Also, Ivan pointed you to the right thing - you might have bit-flips
>>>>> on an erased eraseblock. If not on a freshly erased one, then on an
>>>>> eraseblock which was erased and then not used for a long time. If this
>>>>> is not of your concern,
>>>>
>>>> In that case an ecc error would be reported since the ecc won't match
>>>> the stored ecc, i.e. FFFF, and the driver would mark it as a normal
>>>> corrupted page
>>>
>>> I'm confused. So you erased eraseblock A. Everything there contains
>>> 0xFFs, including the OOB area.
>>>
>>> Now you have one of the modern flashes. You get bit-flips in the page.
>>> Say, a couple of bits flip. You read this page. You compare the contents
>>> of the page with 0xFF and find out that the contents are not all 0xFFs.
>>> What do you do then?
>>>
>>
>> Then, the normal driver takes over and it reports an error because the
>> number of errors in the page is beyond 8 bits (the maximum the FSMC ecc
>> logic can correct).
> 
> Why 8? It may be just 1 single bit-flip. Just one bit becomes 0 instead
> of 1.
> 

It is a maximum of 8. So, the logic can correct any number of bitflips from
1 to 8 in each 512 bytes of data.

>>  Effectively speaking, the read page returns an error 
>> indicating that the page could not be read properly
> 
> But why? It can be read properly. If this is just 1 wrong bit, you
> should be able to correct it. And as Ivan indicated, modern flashes are
> so crappy that 1 bit-flip on erased eraseblock is just normal there.
> 

That's the problem. Ideally the ecc should have been programmed in OOB and then 
the driver would be able to correct the flipped bits. The problem happens only 
if we try to read the erased pages.

>> Ideally, any filesystem would mark it as a bad block 
> 
> That's the point - no. This is normal on modern flashes.
> 
> I think one solution could be that you make your check more
> sophisticated. You check for 0xFFs; if this is not true, you see whether
> this is "almost all 0xFFs" and count the number of non-0xFF bits. If the
> count is, say, 2, you assume this page contains all 0xFFs plus 2
> bit-flips. But I'm not sure it would work.
> 
> Anyway, If you do not care about such bit-flips for your SoC - fine. I
> just wanted you to understand and accept the issue and write about it in
> the comment. And I also wanted you to _not_ do expensive 0xFF comparison
> every time - but it seems you accepted this :-)
> 

Yes, I had to accept this :-)
The flip side is that the hardware itself should not report errors when it
reads all-FF data and all-FF ecc. It should treat it as an erased page and
not report any errors.

Yes, I understand the issue and will write more details about it in the
patch comment. Moreover, the comparison is only expensive when the page
contains all FFs. Most of the time the comparison would fail within the
first few bytes.
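
For illustration only, here is a minimal sketch in C of the two checks
discussed in this thread: the plain all-0xFF comparison that stops at the
first mismatch, and the "almost all 0xFFs" variant that tolerates a few
flipped bits. The function names and the max_bitflips parameter are
hypothetical; they are not taken from the patch or from the FSMC driver.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if every byte of buf is 0xFF.
 * It returns at the first non-0xFF byte, so a programmed page is
 * usually rejected after a few bytes; only a truly erased page
 * costs a full pass over the buffer.
 */
int page_is_all_ff(const uint8_t *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (buf[i] != 0xFF)
			return 0;
	return 1;
}

/*
 * Hypothetical helper: returns 1 if buf is all 0xFF except for at
 * most max_bitflips bits that read as 0. Each 0 bit in an erased
 * region is counted as one bit-flip.
 */
int page_is_erased_with_bitflips(const uint8_t *buf, size_t len,
				 unsigned int max_bitflips)
{
	unsigned int zeros = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		uint8_t inv = (uint8_t)~buf[i];	/* set bits mark flips */

		while (inv) {
			inv &= (uint8_t)(inv - 1);	/* clear lowest set bit */
			zeros++;
		}
		if (zeros > max_bitflips)
			return 0;	/* too many flips, stop early */
	}
	return 1;
}
```

If the FSMC logic corrects up to 8 bits per 512 bytes, one possible use is
to call the second helper once per 512-byte ECC step with a small
threshold. Whether such a threshold is safe for a given flash part is an
open question here, as noted above.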

Regards
Vipin