RFC: detect and manage power cut on MLC NAND

Jeff Lauruhn (jlauruhn) jlauruhn at micron.com
Mon Mar 23 14:15:25 PDT 2015


This is a very simplified description; in practice it works more like this:

First pass: program the lower page.  If the lower page bit is 1, do nothing.  If it is 0, subtract 0.7 V, taking the level from 1.0 to 0.3.  The lower page is SLC-like: two distributions spread apart by 0.7 V.

Lvl    LH
===========
1.0 => 1u
0.3 => 0u

Now program the upper page.  First, read the lower page.  If the lower page is 1 and the upper page is 1, do nothing (11).  If the lower page is 1 and the upper page is 0, subtract 0.3 V and call that 01.  Next, if the lower page is 0 and the upper page is 1, do nothing (10); if the lower page is 0 and the upper page is 0, subtract 0.3 V and call it 00.  Notice that the state of the lower page is the right-hand bit in 11, 01, 10, 00.

Lvl    HL
===========
1.0 => 11
0.7 => 01
0.3 => 10
0.0 => 00
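
The two passes above can be sketched as a toy model. This is only an illustration under stated assumptions: the levels are the normalised values used in this thread (stored as integer tenths), and the function names are made up for the example, not taken from any driver or datasheet.

```python
# Toy model of the two-pass programming described above. Levels are the
# normalised values from this thread, stored as integer tenths (10 = 1.0)
# to avoid floating-point noise; they are not real cell voltages.
ERASED = 10
LOWER_STEP = 7   # first pass: 1.0 -> 0.3 when the lower bit is 0
UPPER_STEP = 3   # second pass: a further 0.3 when the upper bit is 0


def program_lower(lower_bit):
    """First pass: leave the erased cell alone for a 1, subtract 0.7 for a 0."""
    return ERASED - (LOWER_STEP if lower_bit == 0 else 0)


def program_upper(level, upper_bit):
    """Second pass: leave the level alone for a 1, subtract 0.3 for a 0."""
    return level - (UPPER_STEP if upper_bit == 0 else 0)


def final_level(lower_bit, upper_bit):
    """Level after both passes have completed."""
    return program_upper(program_lower(lower_bit), upper_bit)


def level_table():
    """Return (level, label) rows, highest level first.

    The label puts the upper bit on the left and the lower bit on the right.
    """
    rows = [(final_level(lower, upper), f"{upper}{lower}")
            for lower in (1, 0) for upper in (1, 0)]
    return sorted(rows, reverse=True)


if __name__ == "__main__":
    for level, label in level_table():
        print(f"{level / 10:.1f} => {label}")
```

Run as a script, it prints the same four rows as the table above.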

Now what happens if there's a power loss during the programming of the upper page?  The upper page data will most likely be lost, and the lower page may be changed, but there's a good chance of recovery, because the cell levels will still fall within the SLC-like ranges set by the first pass.  It is highly recommended to read and refresh data after a power loss.
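
As a rough illustration of why the lower page tends to survive, here is an idealised sketch. It assumes an interrupted upper-page program leaves the cell somewhere between its start and target level, and that the lower page is read with a single threshold at 0.5; both are assumptions made for the example. Real cells have distribution width and disturb effects, which is why the lower page may still change and a refresh is recommended.

```python
# Idealised model, assuming an interrupted upper-page program leaves the
# cell anywhere between its start and target level. Levels are integer
# tenths of the normalised values in this thread (10 = 1.0).
SLC_THRESHOLD = 5  # assumed read threshold between the 0.7 and 0.3 levels


def read_lower_slc(level):
    """Read the lower page with a single SLC-style threshold."""
    return 1 if level > SLC_THRESHOLD else 0


def levels_during_upper_program(lower_bit, upper_bit):
    """All levels the cell may be left at if the upper-page program stops early."""
    start = 10 if lower_bit == 1 else 3             # level after the lower pass
    target = start - (3 if upper_bit == 0 else 0)   # level if the upper pass completes
    return range(target, start + 1)


def lower_page_survives(lower_bit, upper_bit):
    """True if every possible stopping level still reads back the lower bit."""
    return all(read_lower_slc(level) == lower_bit
               for level in levels_during_upper_program(lower_bit, upper_bit))


if __name__ == "__main__":
    for lower in (1, 0):
        for upper in (1, 0):
            print(lower, upper, lower_page_survives(lower, upper))
```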
 

Jeff Lauruhn
NAND Application Engineer
Embedded Business Unit

-----Original Message-----
From: Iwo Mergler [mailto:Iwo.Mergler at netcommwireless.com] 
Sent: Sunday, March 22, 2015 9:09 PM
To: Richard Weinberger; dedekind1 at gmail.com
Cc: Andrea Scian; mtd_mailinglist; Jeff Lauruhn (jlauruhn); Qi Wang 王起 (qiwang)
Subject: RE: RFC: detect and manage power cut on MLC NAND


Hi all,


I probably don't know enough about the silicon implementation of MLC paired pages, but my feeling is that there should be a way to recover one of the pages if the paired write fails, at least in some cases.

Something along the lines of using both bits to map to a single good one.

2-bit MLC stores 4 levels - 1.0, 0.7, 0.3, 0.0. Obviously, the actual voltage levels will be somewhat different, so take this as electrons on the floating gate: 1.0 = minimum, 0.0 = maximum.

I imagine that there are two ways to achieve that - small step for low page and large step for high page, or the other way 'round.

Assuming the first, the low page write would subtract 0.3 from the erased (1.0) cell if the bit is 0. That leaves the cell at either ~1.0 (1) or 0.7 (0).

Lvl    LH
===========
1.0 => 1u
0.7 => 0u

Then, the high page write would subtract either nothing (1) or 0.7 (0):

Lvl    LH
===========
1.0 => 11
0.7 => 01
0.3 => 10
0.0 => 00

So the MLC decoder logic gets 3 priority encoded bits from the sense amplifiers: 111, 011, 001, 000. The decoder turns this into 11, 01, 10, 00.
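
A minimal sketch of that decoder, assuming three comparators with thresholds placed between the four nominal levels (0.85, 0.50 and 0.15 are made-up values for illustration, not datasheet figures):

```python
# Assumed sense thresholds, placed between the four nominal levels
# (1.0, 0.7, 0.3, 0.0) and expressed in hundredths: 0.85, 0.50, 0.15.
THRESHOLDS = (85, 50, 15)

# Sense-amplifier code -> "LH" bit pair, as listed in the text above.
DECODE = {"111": "11", "011": "01", "001": "10", "000": "00"}


def sense(level):
    """Return the three comparator outputs for a level given in hundredths."""
    return "".join("1" if level > t else "0" for t in THRESHOLDS)


def decode(level):
    """Map a cell level to its 'LH' pair via the sense-amplifier code."""
    return DECODE[sense(level)]


if __name__ == "__main__":
    for level in (100, 70, 30, 0):
        print(f"{level / 100:.1f} -> {sense(level)} -> {decode(level)}")
```

Because the thresholds are ordered, `sense()` can only ever produce one of the four codes in `DECODE`, so intermediate levels also decode to a valid pair.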

The process of writing a 0 to the high page transitions low page 0-bits through 1 and back to 0, as the level moves down.

Low page 1 bits transition from 1 through 0 and back to 1.

So a half-completed high page 0-write can flip a low page bit both ways.

We can detect an incorrect 0-1 transition in the low page, because it's marked by a 0 bit in the high page.

We can't detect an incorrect 1-0 transition in the low page.

So assuming a failed high page write, this is what we get:

LH

11 = nothing happens, reads back as 11
     Correct level for both.

01 = Level stays at 0.7, reads back as 01,
     Correct level for low page.

10 = Level between 1.0 and 0.3, reads back as 11, 01 or 10.
     01 is wrong for low page, but can't be distinguished from 10.

00 = Level between 0.7 and 0.0, reads back as 01, 10, or 00.
     10 is wrong for low page, but can be distinguished from 01.
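
The four cases above can be enumerated with a small sweep. This is a sketch under the same simplifications: an interrupted write may stop at any level between start and target, and the read thresholds (0.85 / 0.50 / 0.15) are assumed values.

```python
# Idealised sweep of an interrupted high-page write for the
# small-step-first scheme. Levels are in hundredths (100 = 1.0); the read
# thresholds 0.85 / 0.50 / 0.15 are assumptions, not datasheet values.
def read_lh(level):
    """Decode a level into its 'LH' pair (1.0=>11, 0.7=>01, 0.3=>10, 0.0=>00)."""
    if level > 85:
        return "11"
    if level > 50:
        return "01"
    if level > 15:
        return "10"
    return "00"


def possible_readbacks(low_bit, high_bit):
    """Pairs that can be read back if the high-page write stops at any point."""
    start = 100 if low_bit == 1 else 70             # level after the low-page write
    target = start - (70 if high_bit == 0 else 0)   # level if the high write completes
    seen = []
    for level in range(start, target - 1, -1):
        pair = read_lh(level)
        if pair not in seen:
            seen.append(pair)
    return seen


def wrong_low_readbacks(low_bit, high_bit):
    """Read-back pairs whose low-page bit differs from the bit that was written."""
    return [p for p in possible_readbacks(low_bit, high_bit)
            if int(p[0]) != low_bit]


if __name__ == "__main__":
    for low, high in ((1, 1), (0, 1), (1, 0), (0, 0)):
        print(f"{low}{high}: reads {possible_readbacks(low, high)}, "
              f"wrong low bit in {wrong_low_readbacks(low, high)}")
```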

So, there are two bit combinations (50%) that have an undetectable failure, and this failure will happen about half the time, for a total of 25% unfixable failure rate.

Not acceptable in the general case, but might be good enough for things like UBI EC & VID headers, if we ensure that the high page contains 1s at the offsets at which the low page stores the header.


Now, on the other hand, if the low page write uses the larger step, there shouldn't be any paired page problem at all, since the high page write wouldn't cross the low page thresholds on the way:

Lvl    LH
===========
1.0 => 1u
0.3 => 0u

Lvl    LH
===========
1.0 => 11
0.7 => 10
0.3 => 01
0.0 => 00
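
To compare the two orderings side by side, here is a hedged sketch that sweeps an interrupted high-page 0-write for each scheme and checks whether the low-page bit can ever read back wrong. The scheme names, dictionary layout and thresholds are assumptions for the example only.

```python
# Each scheme lists its step sizes (hundredths of the normalised level) and
# the "LH" pair read in each band, from highest level to lowest.
SCHEMES = {
    "small step first": {"low_step": 30, "high_step": 70,
                         "pairs": ("11", "01", "10", "00")},
    "large step first": {"low_step": 70, "high_step": 30,
                         "pairs": ("11", "10", "01", "00")},
}


def read_low(level, pairs):
    """Low-page bit read at a level, using assumed thresholds 0.85/0.50/0.15."""
    band = 0 if level > 85 else 1 if level > 50 else 2 if level > 15 else 3
    return int(pairs[band][0])


def low_bit_can_flip(scheme):
    """True if an interrupted high-page 0-write can misread some low-page bit."""
    for low_bit in (1, 0):
        start = 100 - (scheme["low_step"] if low_bit == 0 else 0)
        target = start - scheme["high_step"]
        if any(read_low(level, scheme["pairs"]) != low_bit
               for level in range(target, start + 1)):
            return True
    return False


if __name__ == "__main__":
    for name, scheme in SCHEMES.items():
        print(f"{name}: low bit can flip = {low_bit_can_flip(scheme)}")
```

In this idealised model only the small-step-first ordering crosses a low-page threshold during the high-page write.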


Which makes me think I'm misunderstanding something. If not, why isn't this scheme used in the first place?

What would happen if we reverse the paired page writing order?
[Jeff Lauruhn] Not recommended: we want pages programmed in sequence to mitigate disturbs and obtain the highest reliability.


Jeff, Qi, is the mechanism I described here anywhere near reality?
[Jeff Lauruhn] It's a simplified view, but fairly accurate.


Best regards,

Iwo


