[RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling

Fri Oct 31 08:39:57 PDT 2014

On 31.10.2014 at 16:34, Richard Weinberger wrote:
> Hi Tanya,
> 
> On 31.10.2014 at 14:12, Tanya Brokhman wrote:
>> Hi Richard
>>
>> On 10/29/2014 2:00 PM, Richard Weinberger wrote:
>>> Tanya,
>>>
>>> On 29.10.2014 at 12:03, Tanya Brokhman wrote:
>>>> I'll try to address all your comments in one place.
>>>> You're right that the read counters don't have to be exact, but they do have to reflect the real state.
>>>
>>> It doesn't really matter if the counters are way too high or too low, does it?
>>> It does also not matter if a re-read of adjacent PEBs is issued too often.
>>> It won't hurt.
>>>
>>>> Regarding your idea of saving them to a file, or otherwise involving userspace: this is doable, but such a solution will depend on the userspace implementation:
>>>> - one needs to update the kernel with the correct read counters (saved somewhere in userspace)
>>>> - it is required on every boot.
>>>> - saving the counters back to userspace should be periodically triggered as well.
>>>> So the minimal workflow for each boot life cycle will be:
>>>> - on boot: update the kernel with the correct values from userspace
>>>
>>> Correct.
>>>
>>>> - kernel updates the counters on each read operation
>>>
>>> Yeah, that's a plain, simple in-kernel counter.
>>>
>>>> - on powerdown: save the updated kernel counters back to userspace
>>>
>>> Correct. The counters can also be saved once a day by cron.
>>> If one or two save operations are missed it won't hurt either.
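The boot/run/save workflow above could look roughly like the following userspace-side sketch. It assumes a hypothetical file format (a raw array of one uint32_t read counter per PEB) and hypothetical helper names; none of these functions are part of UBI, and the in-kernel counting step is only modeled by count_read().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Boot: restore the counters saved by userspace. On a missing or short
 * file, fall back to all-zero counters and report failure. */
static int load_read_counters(FILE *f, uint32_t *cnt, size_t npebs)
{
	if (f && fread(cnt, sizeof(*cnt), npebs, f) == npebs)
		return 0;
	memset(cnt, 0, npebs * sizeof(*cnt));
	return -1;
}

/* Runtime: models the (hypothetical) in-kernel hook that counts every
 * read of PEB pnum. Saturates instead of wrapping around. */
static void count_read(uint32_t *cnt, size_t npebs, size_t pnum)
{
	assert(pnum < npebs);
	if (cnt[pnum] != UINT32_MAX)
		cnt[pnum]++;
}

/* Periodically (e.g. from cron) and at power-down: write the counters back. */
static int save_read_counters(FILE *f, const uint32_t *cnt, size_t npebs)
{
	if (fwrite(cnt, sizeof(*cnt), npebs, f) != npebs)
		return -1;
	return fflush(f) == 0 ? 0 : -1;
}
```

A missing or truncated file simply yields all-zero counters, which matches the position that losing the counters is tolerable; a design that requires accurate counters would instead have to treat that case as an error.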
>>>
>>>> The read-disturb handling is based on the kernel updating and monitoring read counters. Moving this out of kernel space will result in an incomplete and very fragile solution for
>>>> the read-disturb problem, since the dependency on userspace is just too big.
>>>
>>> Why?
>>> We both agree on the fact that the counters don't have to be exact.
>>> Maybe I'm wrong, but to my understanding they are just a rough indicator that at some point UBI has to check for bit-rot/flips.
>>
>> The idea is to prevent data loss and read errors, because we might hit errors we can't fix. So although the read_disturb_threshold is a rough estimate based on
>> statistics, we can't ignore it and need to stay close to the calculated statistics.
>>
>> It's really the same as wear-leveling: each PEB can be erased only a limited number of times. This erase limit is also an estimate based on statistics
>> collected by the card vendor. But you do want to know the exact erase count to prevent erasing the block excessively.
> 
> So you have to update the EC-Header every time we read a PEB...?
> 
>>
>>>
>>>> Another issue to consider is that each SW upgrade will result in losing the counters saved in userspace and resetting them all. Otherwise, the system upgrade process will also have to be
>>>> updated.
>>>
>>> Does it hurt if these counters are lost upon an upgrade?
>>> Why do we need them forever?
>>> If they start again from 0 after an upgrade, heavily read PEBs will quickly gain a high counter and will be checked.
>>
>> Yes, we do need ACCURATE counters and can't lose them. For example: we have a heavily read block. It has been read 100 times and the read threshold is 101, meaning the 101st
>> read will most probably fail.
> 
> Are you trying to tell me that the NAND is so crappy that it will die after 100 reads? I really hope this was just a bad example.
> You *will* lose counters unless you update the EC-Header upon every read, which is also not sane at all.
> 
>> You do a SW upgrade, set the read counter for this block to 0 and don't scrub it. The next time you read from it (since it's a heavily read block), you'll get errors. If
>> you're lucky, ECC will fix them for you, but it's not guaranteed.
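The 100/101 example implies a check along the following lines. This is a hypothetical sketch: needs_scrub() and reads_before_scrub() are illustrative names, and the threshold semantics (scrub before the read that would reach the threshold) are an assumption read off the example, not taken from the patch set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: scrub before the read that would reach the threshold,
 * i.e. once read_cnt >= threshold - 1 (100 reads with a threshold of 101). */
static bool needs_scrub(uint32_t read_cnt, uint32_t threshold)
{
	assert(threshold > 0);
	return read_cnt >= threshold - 1;
}

/* How many more reads are served before needs_scrub() fires. */
static uint32_t reads_before_scrub(uint32_t read_cnt, uint32_t threshold)
{
	return needs_scrub(read_cnt, threshold) ? 0 : threshold - 1 - read_cnt;
}
```

With an accurate counter of 100 the block is due for scrubbing immediately; after a reset to 0 it can be read 100 more times before the check fires, so its real read count at scrub time is roughly twice the threshold. How much that matters depends on the safety margin built into the threshold, which the thread does not quantify.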
>>
>>>
>>> And of course these counters can be preserved. One can also place them into a UBI static volume.
>>> Or use a sane upgrade process...
>>
>> "Sane upgrade" means that in order to support read disturb we twist the user's arm into implementing non-trivial logic in userspace.
>>
>>>
>>> As I wrote in my last mail we could also create a new internal UBI volume to store these counters.
>>> Then you can have the logic in kernel but don't have to change the UBI on-disk layout.
>>>
>>>> The read counters are very much like the EC counters used for wear-leveling: one is updated on each erase, the other on each read; one is used to handle issues caused by frequent
>>>> writes (erase operations), the other handles issues caused by frequent reads.
>>>> So how are the two different? Why isn't wear-leveling (and erase counters) handled by userspace? My guess is that the decision to encapsulate wear-leveling in the kernel was due
>>>> to the reasons mentioned above.
>>>
>>> The erase counters are crucial for UBI to operate. Even while booting up the kernel and mounting UBIFS, the EC counters have to be available
>>> because UBI may need to move LEBs around or has to find free PEBs which are not worn out. If UBI makes a bad decision here, things will break.
>>
>> Same with read counters and last_erase_timestamps. If EC counters are lost, we might end up with bad blocks (since they are worn out) and data loss.
>> If we ignore read disturb and don't scrub heavily read blocks, we will have data loss as well.
>> The only difference between the two scenarios is how long it takes before it happens. Read disturb wasn't an issue since the average lifespan of a NAND device was ~5 years. Read disturb shows up
>> over a longer lifespan. That's why handling it is required now: there is a need for "long-life NAND".
> 
> Okay, read-disturb will only happen if you read blocks *very* often. Do you have numbers, datasheets, etc...?
> 
> Let's recap.
> 
> We need to address two issues:
> a) If a PEB is read very often, we need to scrub it.
> b) PEBs which are not read for a very long time need to be re-read/scrubbed to detect bit-rot.
> 
> Solving b) is easy: just re-read every PEB from time to time. No persistent data at all is needed.
> To solve a) you suggest adding the read-counter to the UBI on-disk layout like the erase-counter values.
> I don't think that this is a good solution.
> We can perfectly well save the read-counters from time to time and upon detach, either to a file on UBIFS
> or into a new internal volume. As read-disturb will only happen after a long time, and hence at very high read-counters,
> it does not matter if we lose some values upon a power cut, i.e. a counter ends up at 50000 instead of 50500.
> Btw: We also have to be very careful that reading data will not wear out the flash.
> 
> So, we need logic within UBI which counts every read access and persists this data in some way.
> As suggested in an earlier mail, this can also be done purely in userspace.
> It can also be done within the UBI kernel module, i.e. by storing the counters in an internal volume.
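The power-cut argument can be made concrete with a simple checkpoint model. Assumption: the live counter is written out every save_interval reads (a cron-style, time-based save would bound the loss by the number of reads per period instead); struct rd_counter and its helpers are hypothetical. With save_interval = 1000 and 50500 reads, a power cut rolls the counter back to 50000, and the loss is always below save_interval.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a read counter that is persisted every
 * save_interval reads (to a file on UBIFS or an internal volume). */
struct rd_counter {
	uint64_t live;          /* in-RAM value, lost on power cut */
	uint64_t persisted;     /* last value written out */
	uint64_t save_interval; /* reads between two write-outs */
};

static void rd_counter_read(struct rd_counter *c)
{
	assert(c->save_interval > 0);
	c->live++;
	if (c->live - c->persisted >= c->save_interval)
		c->persisted = c->live; /* stands in for the actual write-out */
}

/* After a power cut only the persisted value survives; returns the
 * number of reads that were lost (always < save_interval). */
static uint64_t rd_counter_power_cut(struct rd_counter *c)
{
	uint64_t lost = c->live - c->persisted;
	c->live = c->persisted;
	return lost;
}
```

Under this model a lost update only delays the scrub of a hot PEB by fewer than save_interval reads, which is the trade-off being argued for against updating the EC header on every read.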

Another point:
What if we scrub every PEB once a week?
Why would that not work?
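A weekly scrub of every PEB needs no persistent state if the pass is paced round-robin with the cursor kept only in RAM. The sketch below assumes a fixed PEB count and a timer that calls scrub_next() every scrub_period_seconds(); both helpers and the pacing scheme are hypothetical, not existing UBI code.

```c
#include <assert.h>
#include <stdint.h>

#define WEEK_SECONDS (7u * 24u * 60u * 60u)

/* Hypothetical pacing for "scrub every PEB once a week": visit one PEB
 * every WEEK_SECONDS / npebs seconds, round-robin. The cursor lives in
 * RAM only; after a reboot the pass can simply restart at PEB 0. */
struct scrub_pacer {
	uint32_t npebs;
	uint32_t next_pnum;
};

static uint32_t scrub_period_seconds(const struct scrub_pacer *s)
{
	assert(s->npebs > 0);
	uint32_t period = WEEK_SECONDS / s->npebs;
	return period ? period : 1;
}

/* Returns the PEB to re-read (and scrub if it shows bit-flips) on this tick. */
static uint32_t scrub_next(struct scrub_pacer *s)
{
	uint32_t pnum = s->next_pnum;
	s->next_pnum = (pnum + 1) % s->npebs;
	return pnum;
}
```

For 4096 PEBs this works out to one PEB every 147 seconds. Each pass adds one read per PEB per week, which ties in with the concern about reads wearing out the flash. It covers b) directly; whether it also covers a) depends on how many reads a hot PEB can take within a week relative to its read-disturb limit.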

Thanks,
//richard