RFC: detect and manage power cut on MLC NAND

Wed Mar 25 18:57:25 PDT 2015

Jeff Lauruhn
NAND Application Engineer
Embedded Business Unit
Micron Technology, Inc

-----Original Message-----
From: Ricard Wanderlof [mailto:ricard.wanderlof at axis.com] 
Sent: Wednesday, March 25, 2015 1:33 AM
To: Iwo Mergler
Cc: Jeff Lauruhn (jlauruhn); Richard Weinberger; dedekind1 at gmail.com; Andrea Scian; Qi Wang 王起 (qiwang); mtd_mailinglist
Subject: RE: RFC: detect and manage power cut on MLC NAND

On Wed, 25 Mar 2015, Iwo Mergler wrote:

> > From a simplified point of view you're right.  In reality the 
> > program/erase recipes are actually quite advanced in order to get 
> > very tight distributions on a full page.  The lower/upper page 
> > sequence is designed to provide the most reliable results and 
> > optimally we would like the lower and upper page programmed 100% of
> > the time.   There's been a lot of work done over the years to improve
> > power loss and it's much better than in the past, but it's still 
> > something to be avoided on NAND.  It's always best to check the 
> > integrity of the page after a power loss event.
> 
> Is there any way to check the page integrity beyond ECC?
>
> I'm concerned that the power loss could yield an OK looking page, but 
> with not so tight charge distribution.
> 
> Maybe the hardware that can achieve tight distributions during 
> programming, can be accessed to measure distribution of a programmed 
> page?

What would be interesting from a software perspective would be if, in some special mode, one could read the memory cells and get an analog value with several bits of resolution, allowing the software to assess how "good" the bits are. This would be in contrast to the normal, high-speed read mode. But perhaps matters are not that simple: either there is no such value to be had (although, as I understand it, in certain MLC flashes it is possible to shift the read thresholds, so one could accomplish this by successive approximation; that means one could do it entirely in software using existing devices, but it is a rather cumbersome process), or there are other factors that govern the read thresholds which are not known outside the chip (or rather, outside the manufacturer's lab!).

No special analog modes on production devices, but we are moving in the direction of giving more control to the end user.  As lithography goes down and bits per cell go up, we are trying to come up with manageable ways to recover data.  There's no analog readout on the roadmap, but new features are: read retry, which generally assumes charge loss and allows the end user to try different read reference voltages, and other read offset features.

Let me explain the read process a bit.  When we program (and erase, too), we set a target value, L0 and L1 in the case of SLC, and we get a distribution around those values.  When we read, we apply a voltage to the gate of the cell we intend to read that lies between L0 and L1; call it Vread.  If the cell is erased, Vread is greater than the cell's threshold voltage Vt (the L0 level), so the cell conducts, we sense current flow between the drain and source, and the sense circuit registers a 1 for that cell.  If the cell is programmed, Vread is not high enough to overcome the Vt of the cell, we sense no current flow between the drain and source, and the sense circuit registers a 0 for that cell.  The sense circuit is very simple because there need to be 2K of them so we can sense the whole page simultaneously.  The type of circuitry required to measure an analog value would make the die huge.  If it were possible, it would already be designed in.
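The read decision described above can be sketched as a toy model.  This is only an illustration: the voltage values and the names sense_cell and sense_page are assumptions made for the example, not figures or interfaces from any device.

```python
# Toy model of SLC page sensing. All voltages are illustrative assumptions,
# not values from any datasheet.
L0_TARGET = -2.0   # assumed centre of the erased (L0) Vt distribution, volts
L1_TARGET = 2.0    # assumed centre of the programmed (L1) Vt distribution, volts
V_READ = 0.0       # read reference placed between L0 and L1


def sense_cell(cell_vt, v_read=V_READ):
    """Return the bit the sense circuit registers for one cell.

    The cell conducts when the gate voltage exceeds its threshold Vt.
    A conducting cell is read as 1 (erased), a non-conducting one as 0.
    """
    conducts = v_read > cell_vt
    return 1 if conducts else 0


def sense_page(cell_vts, v_read=V_READ):
    """Sense every cell of a page against the same read reference."""
    return [sense_cell(vt, v_read) for vt in cell_vts]
```

Note that each cell yields only a single comparison result against Vread; the model, like the real sense circuit, never reports how far Vt is from the reference.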

Instead of measuring the cell voltage, it was easier to allow the end user to move Vread to try to compensate for the shift in distribution.
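As a rough sketch of how moving Vread can recover a page after charge loss, the loop below retries the read at a list of offsets until the error count is within what ECC could correct.  Everything here is simulated and hypothetical: the offsets, the voltages, and the use of a known expected pattern as a stand-in for the ECC check are assumptions for illustration, and real devices expose read retry through vendor-specific commands that are not shown.

```python
def read_page(cell_vts, v_read):
    """A cell reads as 1 when Vread exceeds its Vt (it conducts), else 0."""
    return [1 if v_read > vt else 0 for vt in cell_vts]


def bit_errors(read_bits, expected_bits):
    """Stand-in for ECC: count mismatches against the known-good data."""
    return sum(r != e for r, e in zip(read_bits, expected_bits))


def read_with_retry(cell_vts, expected_bits, offsets, correctable_bits,
                    v_read_default=0.0):
    """Try each Vread offset in order; return the first one that decodes.

    Returns (offset, bits) on success, or (None, None) if no offset brings
    the error count within the assumed ECC correction capability.
    """
    for offset in offsets:
        bits = read_page(cell_vts, v_read_default + offset)
        if bit_errors(bits, expected_bits) <= correctable_bits:
            return offset, bits
    return None, None


# Hypothetical page: the second cell was programmed but has lost charge,
# so its Vt has drifted below the default read reference of 0.0 V.
vts = [-2.0, -0.3, -1.9, 0.8]
expected = [1, 0, 1, 0]
offset, bits = read_with_retry(vts, expected, [0.0, -0.5, -1.0], 0)
```

In this example the default read misreads the drifted cell, and the first negative offset recovers the data, which matches the charge-loss assumption behind read retry.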

> > I have to be careful here because it's very dependent on the design 
> > and I really need to know the specifics to make a definitive 
> > statement, but a few ms should be enough time to protect the NAND.
> > WP# is your friend here.
> 
> The design is somewhat hypothetical - let's assume that we can 
> guarantee the NAND supply for 10ms after system reset asserts.
> 
> At reset time, the NAND controller will abort any command sequence in 
> progress, so the final "program page" command will be sent either 
> before the reset, or not at all. The command byte may be cut short on the bus.

It would seem to me that the only thing really needed to guarantee that writes (or erase operations) are not cut short by power loss is, as Iwo says, a system design in which, when power loss occurs, there is enough power to maintain valid supply voltage levels so that the NAND can complete its operations in the worst case after system reset is asserted.
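Such a design rule boils down to a simple timing budget, sketched below. The 10 ms hold-up figure is the hypothetical one from Iwo's example; the operation times are placeholders only and would have to come from the worst-case program and erase figures in the data sheet of the actual part.

```python
def completes_within_holdup(holdup_ms, worst_case_op_ms, margin_ms=0.0):
    """True if the longest NAND operation, plus margin, fits in the hold-up time."""
    return worst_case_op_ms + margin_ms <= holdup_ms


# Hold-up time after reset asserts, taken from the hypothetical design above.
HOLDUP_MS = 10.0

# Placeholder worst-case times in ms; NOT from any datasheet.
op_times_ms = {"page program": 2.5, "block erase": 7.0}

# The budget must cover whichever operation can take longest.
worst_case_ms = max(op_times_ms.values())
design_ok = completes_within_holdup(HOLDUP_MS, worst_case_ms, margin_ms=1.0)
```

With these placeholder numbers the rule is met, but a part whose worst-case erase exceeded the hold-up time would fail the same check.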

Admittedly we don't always have the luxury of well-designed hardware, but having clear design rules for the hardware guys would go a long way in future designs.

> I'm very happy to talk to someone at the coal face of modern NAND 
> manufacturing. :-)

Agreed, I think a great many of us appreciate Jeff's contributions on the list, me included. NAND data sheets are often not so forthcoming, and there ends up being a lot of speculation about how things actually work, so it's really nice to have someone with real knowledge to discuss this with.

/Ricard
-- 
Ricard Wolf Wanderlöf                           ricardw(at)axis.com
Axis Communications AB, Lund, Sweden            www.axis.com
Phone +46 46 272 2016                           Fax +46 46 13 61 30