RFC: detect and manage power cut on MLC NAND

Sat Mar 14 03:03:07 PDT 2015

Hi Jeff,

Am 12.03.2015 um 23:57 schrieb Jeff Lauruhn (jlauruhn):
> 
> 
> Jeff Lauruhn
> NAND Application Engineer
> Embedded Business Unit
> Micron Technology, Inc
> 
> 
> -----Original Message-----
> From: Richard Weinberger [mailto:richard at nod.at] 
> Sent: Thursday, March 12, 2015 3:28 AM
> To: Jeff Lauruhn (jlauruhn)
> Cc: Andrea Scian; dedekind1 at gmail.com; Boris Brezillon; mtd_mailinglist
> Subject: Re: RFC: detect and manage power cut on MLC NAND
> 
> Am 11.03.2015 um 22:16 schrieb Jeff Lauruhn (jlauruhn):
>> Glad to help out.  I train FAEs and customers on many aspects of NAND including MLC.
> 
> UBI (and UBIFS) was designed with SLC NAND in mind. So far we know that we have to address the following constraints when we want UBI on MLC NAND:
> 
> 
> 1. Avoid repeating bit patterns. This can be solved by scrambling. Boris did some great work in this area.
> 2. Paired pages. We'll have to choose the pages we write to very carefully so as not to lose already written data in case of a power cut.
> 	For MLC we store 4 bits in the same cell has 
> 3. Read disturb. Happens also on SLC but not that early. I'm working on this.
> 		
> 4. Data retention, i.e., blocks that have not been erased for a long time have to be re-erased. I'm working on this too.
> 	
> 5. Unstable bits (not MLC specific).
> 	Two types.  Data retention and Disturbs (read and program).   Data retention (charge loss) tends to shift left,  
> 6. What did I miss?
> 
> Jeff, what do you think?

Can you please comment on the "TODO" list? Did we miss something?
Do you have some kind of design document?

> Can you point us to some hard facts? I'm especially interested in numbers on read disturb and data retention.
> I wish there were numbers, it would make my job a lot easier, but NAND doesn't work that way.  Data retention is dependent on process node (35nm, 25nm, 20nm, 15nm for example), P/E cycles and temperature. We generally specify our NAND using JEDEC standards, x number of years at 55C with 10% cycling.   If you apply our recommended ECC, then you will be able to store data and recover it after x number of years.  But temperature, P/E, process size and usage have major effects on data retention so we recommend actively managing your NAND.  This is what you do and what I find so interesting about your group.

But I'm sure there are some rough numbers. Do we have to expect read disturb after, say, 100 reads?
https://www.micron.com/~/media/Documents/Products/Presentation/flash_mem_summit_jcooke_inconvenient_truths_nand.pdf
Are the numbers on page 20 still valid?

> I can speak in generalities for now, and when I get more specifics I can predict and recommend solutions.
> 
> Data Retention is characterized by a loss of charge.  We program a bit from 1 to 0 (just the opposite of what people think).  Over time the charge will leave the gate, this is normal NAND behavior.  I say that the distribution of charged cells shifts left toward uncharged. Why is SLC better than MLC?  First, SLC came first and used older, larger lithographies.  You could store 10's of thousands of electrons on a 40nm gate and you only had two states, L0 erased (0 volts) or L1 programmed (1.5 volts for example). If you lost a few, it didn't make much difference and there was a lot of room between 0 and 1.5 volts.  Newer processes are 20nm and MLC.  With a smaller gate there are just a few hundred electrons and they need to be distributed across one of 4 levels: L0 (0 volts), L1 (0.5 volts), L2 (1 volt) or L3 (1.5 volts).  Now adding or losing a few electrons can have a larger effect.  We determine a programmed cell by measuring between the L0 and L1.  If the levels have shifted, they can shift to the point where you can't reliably tell the difference between L0 and L1.  When there are more levels and they are closer together it makes it just that much more challenging.
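
To make the margin argument above concrete, here is a toy model using the
example voltages from the paragraph (0 and 1.5 volts for SLC; 0, 0.5, 1 and
1.5 volts for MLC). Placing the read references at the midpoints between
adjacent levels is my assumption for illustration only; real devices do not
necessarily work this way, and all function names are hypothetical.

```python
from bisect import bisect_right

# Example level voltages taken from the paragraph above.
SLC_LEVELS = [0.0, 1.5]
MLC_LEVELS = [0.0, 0.5, 1.0, 1.5]


def read_refs(levels):
    """Read reference voltages, assumed to sit midway between levels."""
    return [(a + b) / 2 for a, b in zip(levels, levels[1:])]


def read_level(voltage, levels):
    """Index of the level a cell at `voltage` would be read as."""
    return bisect_right(read_refs(levels), voltage)


def survives_shift(programmed, shift, levels):
    """True if a cell programmed to level `programmed` still reads back
    correctly after its voltage moved by `shift` volts (negative for
    charge loss, positive for disturb)."""
    return read_level(levels[programmed] + shift, levels) == programmed
```

With these assumptions, a 0.3 volt charge loss leaves an SLC cell at 1.5 volts
readable, while an MLC cell at 0.5 volts drops below the 0.25 volt reference
and is read as L0.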
> 
> 
> Program/Read Disturb are characterized by charge gain or a shift right.  The problem is the same.
> 
> These effects are not instantaneous, they happen over long periods of time.  Instead of trying to predict every case, we recommend actively managing NAND, and that's what your team does.  Read the data and use ECC to see how much the data has changed.  Create a threshold and when the data hits the threshold move the data to a fresh block.
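
The threshold approach described above could be sketched roughly like this.
It is only an illustration, not how UBI does it: the function names, the
dictionary of per-block bitflip counts and the default of half the ECC
strength are all assumptions of mine.

```python
def scrub_threshold(ecc_strength, fraction=0.5):
    """Number of corrected bitflips at which a block should be moved.

    `ecc_strength` is the number of bitflips the ECC can correct per
    ECC step; `fraction` is a hypothetical policy knob.
    """
    return max(1, int(ecc_strength * fraction))


def blocks_to_scrub(max_bitflips_by_block, ecc_strength, fraction=0.5):
    """Return the block numbers whose worst observed bitflip count has
    reached the threshold, i.e. whose data should go to a fresh block.

    `max_bitflips_by_block` maps block number -> highest number of
    bitflips the ECC corrected in any read of that block.
    """
    limit = scrub_threshold(ecc_strength, fraction)
    return sorted(
        block
        for block, flips in max_bitflips_by_block.items()
        if flips >= limit
    )
```

For example, with an 8-bit ECC and the default fraction the threshold is 4, so
blocks_to_scrub({0: 1, 1: 4, 2: 7}, ecc_strength=8) selects blocks 1 and 2
while block 0 stays where it is.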

Thanks,
//richard