RFC: detect and manage power cut on MLC NAND

Iwo Mergler Iwo.Mergler at netcommwireless.com
Sun Mar 22 21:08:58 PDT 2015


Hi all,


I probably don't know enough about the silicon implementation
of MLC paired pages, but my feeling is that there should be a
way to recover one of the pages if the paired write fails, at least
in some cases.

Something along the lines of using both bits to map to a single
good one.

2 bit MLC stores 4 levels - 1.0, 0.7, 0.3, 0.0. Obviously, the actual
voltage levels will be somewhat different, so take this as a measure of
electrons on the floating gate: 1.0 = minimum, 0.0 = maximum.

I imagine that there are two ways to achieve that - small step
for low page and large step for high page, or the other way 'round.

Assuming the first, the low page write would subtract 0.3 from
the erased (1.0) cell if the bit is 0. That leaves the cell at
either ~1.0 (1) or 0.7 (0), with the high page bit still unwritten (u).

Lvl    LH
===========
1.0 => 1u
0.7 => 0u

Then, the high page write would subtract either nothing (1) or
0.7 (0):

Lvl    LH
===========
1.0 => 11
0.7 => 01
0.3 => 10
0.0 => 00

So the MLC decoder logic gets 3 priority encoded bits from the
sense amplifiers: 111, 011, 001, 000. The decoder turns this
into 11, 01, 10, 00.
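
As a rough sketch of that decoder (Python; the thresholds are assumed to
sit midway between the nominal levels, and the names sense() and decode()
are made up for illustration, not taken from any datasheet):

```python
# Assumed sense thresholds, midway between the nominal levels
# 1.0, 0.7, 0.3 and 0.0 used in the tables above (illustrative only).
THRESHOLDS = (0.85, 0.5, 0.15)

# Sense amplifier pattern -> "LH" (low page bit, high page bit),
# following the second table above.
DECODE = {"111": "11", "011": "01", "001": "10", "000": "00"}

def sense(level):
    """Return the three comparator outputs, highest threshold first."""
    return "".join("1" if level > t else "0" for t in THRESHOLDS)

def decode(level):
    """Map a cell level to the two-character string 'LH'."""
    return DECODE[sense(level)]

for lvl in (1.0, 0.7, 0.3, 0.0):
    print(lvl, sense(lvl), decode(lvl))
```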

Writing a 0 to the high page moves low page 0-bits through 1 and
back to 0 as the level moves down.

Low page 1 bits transition from 1 through 0 and back to 1.

So a half-completed high page 0-write can flip a low page bit
both ways.
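
To illustrate the two transitions, here is a small Python sketch that
sweeps the level down by 0.7 and records what the low page bit reads as
along the way. Levels are in hundredths to avoid rounding trouble, and
the thresholds (85, 50, 15) are an assumption, not real device values:

```python
THRESHOLDS = (85, 50, 15)  # assumed, in hundredths of the nominal scale
DECODE = {"111": "11", "011": "01", "001": "10", "000": "00"}

def decode(level):
    """Map a cell level (0..100) to the string 'LH'."""
    pattern = "".join("1" if level > t else "0" for t in THRESHOLDS)
    return DECODE[pattern]

def low_bit_trace(start):
    """Low page bit as read while the level drops by 70 from `start`."""
    trace = []
    for level in range(start, start - 71, -1):
        low = decode(level)[0]
        if not trace or trace[-1] != low:
            trace.append(low)
    return "".join(trace)

print(low_bit_trace(100))  # low page bit 1: prints 101
print(low_bit_trace(70))   # low page bit 0: prints 010
```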

We can detect an incorrect 0-1 transition in the low page,
because it's marked by a 0 bit in the high page.

We can't detect an incorrect 1-0 transition in the low page.

So assuming a failed high page write, this is what we get:

LH

11 = nothing happens, reads back as 11
     Correct level for both.

01 = Level stays at 0.7, reads back as 01,
     Correct level for low page.

10 = Level between 1.0 and 0.3, reads back as 11, 01 or 10.
     01 is wrong for low page, but can't be distinguished from 10.

00 = Level between 0.7 and 0.0, reads back as 01, 10, or 00.
     10 is wrong for low page, but can be distinguished from 01.

So, there are two bit combinations (50%) that have an
undetectable failure, and this failure will happen about half
the time, for a total of 25% unfixable failure rate.
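
The four cases above can be enumerated with a short Python sketch. It
assumes the power cut can leave the cell at any level between the start
and the target of the high page write, and it uses made-up thresholds
(85, 50, 15 in hundredths); readbacks() is a hypothetical helper:

```python
THRESHOLDS = (85, 50, 15)  # assumed, in hundredths of the nominal scale
DECODE = {"111": "11", "011": "01", "001": "10", "000": "00"}

def decode(level):
    """Map a cell level (0..100) to the string 'LH'."""
    pattern = "".join("1" if level > t else "0" for t in THRESHOLDS)
    return DECODE[pattern]

def readbacks(low, high):
    """All 'LH' values an interrupted high page write could leave behind."""
    start = 100 if low == "1" else 70           # level after low page write
    end = start - (70 if high == "0" else 0)    # level after high page write
    seen = []
    for level in range(start, end - 1, -1):
        lh = decode(level)
        if lh not in seen:
            seen.append(lh)
    return seen

for low, high in (("1", "1"), ("0", "1"), ("1", "0"), ("0", "0")):
    seen = readbacks(low, high)
    # Readbacks whose low page bit differs from what was written.
    wrong = [lh for lh in seen if lh[0] != low]
    print(low + high, "reads back as", seen, "wrong low bit in", wrong)
```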

Not acceptable in the general case, but might be good enough
for things like UBI EC & VID headers, if we ensure that the
high page contains 1s at the offsets at which the low page
stores the header.


Now, on the other hand, if the low page write uses the larger
step, there shouldn't be any paired page problem at all, since
the high page write wouldn't cross the low page thresholds
on the way:

Lvl    LH
===========
1.0 => 1u
0.3 => 0u

Lvl    LH
===========
1.0 => 11
0.7 => 10
0.3 => 01
0.0 => 00
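
The same kind of sweep for this second mapping (Python sketch, same
assumed thresholds of 85, 50, 15 in hundredths, hypothetical names)
shows the low page bit staying put while the high page write moves the
level down by 0.3:

```python
THRESHOLDS = (85, 50, 15)  # assumed, in hundredths of the nominal scale
# Second mapping: low page uses the large step, high page the small one.
DECODE_ALT = {"111": "11", "011": "10", "001": "01", "000": "00"}

def decode_alt(level):
    """Map a cell level (0..100) to 'LH' under the second mapping."""
    pattern = "".join("1" if level > t else "0" for t in THRESHOLDS)
    return DECODE_ALT[pattern]

def low_bit_trace_alt(start):
    """Low page bit as read while the level drops by 30 from `start`."""
    trace = []
    for level in range(start, start - 31, -1):
        low = decode_alt(level)[0]
        if not trace or trace[-1] != low:
            trace.append(low)
    return "".join(trace)

print(low_bit_trace_alt(100))  # low page bit 1: prints 1
print(low_bit_trace_alt(30))   # low page bit 0: prints 0
```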


Which makes me think I'm misunderstanding something. If not,
why isn't this scheme used in the first place?

What would happen if we reverse the paired page writing order?


Jeff, Qi, is the mechanism I described here anywhere near reality?


Best regards,

Iwo


