[RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling

Tanya Brokhman tlinder at codeaurora.org
Mon Oct 27 01:41:15 PDT 2014


On 10/26/2014 10:39 PM, Richard Weinberger wrote:
> Am 26.10.2014 um 14:49 schrieb Tanya Brokhman:
>> One of the limitations of NAND devices is that the method used to read
>> NAND flash memory may cause bit-flips on the surrounding cells and result
>> in uncorrectable ECC errors. This is known as read disturb. A related
>> limitation is data retention: stored charge leaks over time, which also
>> causes bit-flips.
>>
>> The current Linux NAND driver implementation doesn’t address the read
>> disturb and data retention limitations of NAND devices. To date these
>> issues could be overlooked since the probability of their occurrence in
>> today’s NAND devices is very low.
>>
>> With the evolution of NAND devices and the requirement for a “long life”
>> NAND flash, read disturb and data retention can no longer be ignored;
>> otherwise there will be data loss over time.
>>
>> The following patch set implements handling of Read-disturb and Data
>> retention by the UBI layer.
>
> So, your patch addresses the following issue:
> We need to re-read a PEB after a specific time (to detect bit rot) or after N reads (to detect read disturb issues).
> Is this correct?

Not exactly... We need to scrub a PEB that is being read from frequently,
in order to prevent bit-flip errors that might occur due to read-disturb.
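
Roughly, the policy looks like the sketch below. This is only an
illustration, not the code from the patches: struct peb_stats,
RD_THRESHOLD, peb_note_read() and peb_note_erase() are made-up names, and
the threshold value is an assumption that would depend on the NAND part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-PEB bookkeeping; not the layout used by the patch set. */
struct peb_stats {
	uint32_t read_count;	/* reads since the last erase */
	bool scrub_pending;	/* already queued for scrubbing */
};

/* Assumed threshold; a real value depends on the NAND device. */
#define RD_THRESHOLD 100000u

/*
 * Account for one read of the PEB. Returns true exactly once, when the
 * counter reaches the threshold and the PEB should be queued for scrubbing.
 */
static bool peb_note_read(struct peb_stats *peb)
{
	assert(peb != NULL);

	if (peb->read_count < UINT32_MAX)	/* saturate instead of wrapping */
		peb->read_count++;

	if (!peb->scrub_pending && peb->read_count >= RD_THRESHOLD) {
		peb->scrub_pending = true;
		return true;
	}
	return false;
}

/* Scrubbing moves the data and erases the PEB, so the counter starts over. */
static void peb_note_erase(struct peb_stats *peb)
{
	assert(peb != NULL);

	peb->read_count = 0;
	peb->scrub_pending = false;
}
```

The point is that the trigger is the number of reads since the last
erase, and that only an erase resets it.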

>
> Currently users of UBI do this by having cron jobs which read the complete UBI volume
> and then cause scrub work.
> The drawback of this is that only UBI payload will be read and not all data like EC and VID headers.
> I understand that you want to fix this issue.

I'm not sure I completely understand what these cron jobs do, but the
last patch in the series does something similar.

>
> In my opinion it is not a good idea to store read counters and timestamps into the UBI/Fastmap on-disk layout.
> Both the read counters and timestamps don't have to be exact values.

Why not? Storing last_erase_timestamp doesn't increase the space used on
NAND, since I used reserved bytes in the ec_header. I agree that RAM
usage increases, but I couldn't find any other way to have these
statistics saved.
Unfortunately, read_counters can be saved ONLY as part of fastmap because
of the erase-before-write limitation (a counter kept in the PEB itself
could not be updated in place without erasing the PEB).
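
For the data retention side, the check based on the erase timestamp can
be sketched as follows. Again this is only an illustration under stated
assumptions: struct peb_info, RETENTION_LIMIT_SEC and
peb_retention_expired() are hypothetical names, the one-year window is an
arbitrary example, and treating a zero timestamp as "unknown" is my
assumption here, not something the patches define.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed retention window in seconds (about one year); illustrative only. */
#define RETENTION_LIMIT_SEC (365ull * 24 * 60 * 60)

/* Hypothetical in-RAM view of the timestamp kept with the erase counter. */
struct peb_info {
	uint64_t last_erase_sec;	/* 0 means "unknown" in this sketch */
};

/*
 * Returns true if the data in the PEB has been sitting long enough that
 * the PEB should be scrubbed. Unknown timestamps and a clock that went
 * backwards are treated as "not expired" to avoid scrubbing everything.
 */
static bool peb_retention_expired(const struct peb_info *peb, uint64_t now_sec)
{
	assert(peb != NULL);

	if (peb->last_erase_sec == 0)
		return false;
	if (now_sec < peb->last_erase_sec)
		return false;

	return now_sec - peb->last_erase_sec >= RETENTION_LIMIT_SEC;
}
```

Note that the timestamp only changes on erase, which is why it can live
next to the erase counter, while the read counter changes on every read.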

>
> What about this idea?
> Add a userspace interface which allows UBI to expose read counters and last access timestamps.

Where will you save those?

> A userspace daemon (let's name it ubihealthd) then can decide whether it is time to trigger a re-read of a PEB.

Not a re-read - a scrub. Read-disturb is fixed by erasing the PEB.

> This daemon can also store and load the timestamp values and counters from and to UBI. If it sometimes misses this metadata due to a
> power cut it won't hurt.

Not sure I follow. How is this better than doing it from the kernel?
You do have to store the timestamps and the read_counters somewhere, and
they are both updated in the UBI layer. I must be missing something
here. Could you please elaborate on your idea?

> We could also add another internal UBI volume which can carry these data.

I'm afraid I have to disagree with this idea. First of all, having a
dedicated volume for this data is overkill. It's not enough data to
justify reserving a volume. And what about the PEBs that belong to this
volume? Taking this feature out of the UBI layer is just complicated,
feels wrong from a design perspective, and I don't see the benefit of
it. Basically, it's very similar to wear-leveling, but for "reads"
instead of "writes".

>
> All in all, I like the idea but changing/extending the on-disk layout is overkill IMHO.

Why? Without addressing these issues we can't have devices with a life
span of more than ~5 years (and we need to). And this is very similar to
wear-leveling and erase counters. So why are read_counters and
erase_timestamp overkill?
I'm working on your idea of changing the fastmap layout to save all the
read disturb data at the end of it, rather than integrating it into the
existing fastmap data structures (as is done in this version of the
code). But as I see it, fastmap has to be updated as well.

>
> Thanks,
> //richard
>


Thanks,
Tanya Brokhman
-- 
Qualcomm Israel, on behalf of Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum, a Linux Foundation Collaborative Project