[RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling

Fri Nov 7 01:21:34 PST 2014

On Sun, 2014-11-02 at 15:30 +0200, Tanya Brokhman wrote:
> > If NAND why not use ECC to monitor for disturb?
> 
> We don't want just to monitor, we want to prevent cases where ecc cant 
> be fixed. You said it yourself later on "BCH ECC will tell you if bits 
> have changed and will correct up to 5". The goal is to prevent more then 
> 5 errors that can't be fixed.
> 
> NAND is a great storage unit, but you have to follow the rules.  Please 
> refer to Micron datasheet MT29F2G08ABAEAH4 page 100.  NAND is made up of 
> blocks(2048 in this case), each block has a number of pages.  The block 
> is the smallest erasable unit and the only way to change 0's to 1's. 
> Pages are the smallest programmable unit and can only change 1's to 0's. 
>   P/E cycling (100,000 in this case) wears out the block.  We provide 
> 64bytes of spare area for BCH ECC and NAND management.  BCH ECC will 
> tell you if bits have changed and will correct up to 5.
> >
> > Read disturb is a recoverable failure.  It doesn't affect the cells in the page you are reading it affects the cells on either side of the page you are reading.  P/E cycling for this device is 100,000.  You can program once and read many many times.
> >
> > Data retention is the loss of charge on the cells.  Technically you can only change a 0 to 1 by erasing the whole block.  However, data retention is the loss of charge in a cell over time. In this case data retention is 10 years.
> > Data retention gets worse as temperature goes up.
> 
> Exactly! We're aware of all you described above. This is exactly why we 
> need to handle both read disturb and data retention.

Hi Tanya,

just a friendly notice: did you notice that you drop all the CCs in the
reply? Even the person you replied to was not in "To". I guess it is
worth checking your e-mail client's settings.

Jeff, my main concern about the patches is whether they really address
NAND problems, and whether the complexity they introduce are worth it.
The counter-approach is to just read the entire flash periodically, and
just scrub the PEBs (physical eraseblocks) which have have enough
bit-flips (more than a configured threshold per ECC unit, say 1 or 2).

I tried to explain my concerns in here:
http://lists.infradead.org/pipermail/linux-mtd/2014-November/056385.html
http://lists.infradead.org/pipermail/linux-mtd/2014-November/056386.html

Thanks!