RFC: detect and manage power cut on MLC NAND

Mon Mar 16 09:02:27 PDT 2015

"What if it is set low while NAND is ready - that is, it can accept new commands - but an erase operation is in progress?"  Again, the likelihood is low.  In this case a block that was intended to be erased may not be fully erased; it can be erased again when power returns.  No data will be lost because the block was intended to be erased anyway.

"Does the erase operation complete anyway?"  There has been a lot of work done to mitigate power loss on NAND, but I haven't ever seen a design from any NAND vendor that is 100% immune to it.  Surprise power loss should be avoided on NAND; where it can't be, consider power detection and elegant shutdown circuitry.

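As a rough illustration of the shutdown ordering discussed in this thread (stop, wait until the NAND reports ready, only then drive WP low), here is a minimal sketch.  It runs against a toy simulated chip; SimulatedNand, shutdown_on_power_fail and the poll budget (standing in for the hold-up time after power loss is detected) are all hypothetical names, not a real driver or vendor API.

```python
class SimulatedNand:
    """Toy stand-in for a NAND chip; hypothetical, not a real driver API."""

    def __init__(self, busy_polls):
        self.busy_polls = busy_polls  # polls remaining until the chip reports ready
        self.wp_asserted = False      # True once WP has been driven low

    def is_ready(self):
        # Report busy for the first `busy_polls` calls, ready afterwards.
        if self.busy_polls > 0:
            self.busy_polls -= 1
            return False
        return True

    def assert_wp(self):
        self.wp_asserted = True


def shutdown_on_power_fail(chip, max_polls):
    """Wait for the chip to go ready, then assert WP.

    Returns True if WP was asserted, False if the chip was still busy
    when the poll budget (assumed hold-up time) ran out.
    """
    for _ in range(max_polls):
        if chip.is_ready():
            chip.assert_wp()  # WP is only toggled while the chip is ready
            return True
    return False
```

If the budget runs out, the sketch leaves WP untouched rather than toggling it while the chip is busy; what a real design should do at that point depends on the device datasheet.
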
Jeff Lauruhn
NAND Application Engineer
Embedded Business Unit

-----Original Message-----
From: Andrea Marson - DAVE Embedded Systems [mailto:andrea.marson at dave.eu] 
Sent: Saturday, March 14, 2015 2:46 AM
To: Jeff Lauruhn (jlauruhn)
Cc: Boris Brezillon; Andrea Scian; Richard Weinberger; mtd_mailinglist; dedekind1 at gmail.com
Subject: Re: RFC: detect and manage power cut on MLC NAND

Hi Jeff,

thank you for your availability.
I would like to go back to your statement about Write Protect pin:

> Power loss is actually very complex.  The Write Protect (WP) pin was
> added to NAND to help lock the NAND when a power loss event is
> detected.  I have extensive information on NAND and would be happy to
> discuss.

IIUC WP must respect several constraints. For example, it must not be transitioned while the NAND is busy. What if it is set low while the NAND is ready - that is, it can accept new commands - but an erase operation is in progress? Does the erase operation complete anyway?

Best regards,
Andrea MARSON
DAVE Embedded Systems
www.dave.eu

>
>
>
> Jeff Lauruhn
>
> -----Original Message-----
> From: linux-mtd [mailto:linux-mtd-bounces at lists.infradead.org] On 
> Behalf Of Boris Brezillon
> Sent: Friday, March 13, 2015 1:32 PM
> To: Jeff Lauruhn (jlauruhn)
> Cc: Richard Weinberger; dedekind1 at gmail.com; mtd_mailinglist; Andrea 
> Scian
> Subject: Re: RFC: detect and manage power cut on MLC NAND
>
> Hello Jeff,
>
> I'm joining the discussion to ask more questions about MLC NANDs ;-).
>
> Could you tell us more about how block wear impacts the voltage levels stored in NAND cells?
>
> 1/ Are all pages in a block impacted the same way ?
> 	Yes, because of block erase, P/E cycles affect all the pages in a block.
> 2/ Is wear more likely to induce voltage increase, voltage decrease
>     or is it unpredictable ?
> 	Wear is a very well known NAND characteristic.  During P/E cycling there is a potential for electrons to get permanently trapped in the oxide.  The more P/E cycles, the more electrons get trapped.  Over many P/E cycles, cells will get to a point where they look permanently programmed and can't be erased or programmed.  As cells begin to fail, ECC can be used to recover the data.  If too many bits fail in a page, the device will respond with a FAIL status after a program or erase operation.
> 	
> 3/ Is it possible to have more than one working voltage threshold
>     (read-retry mode): I did some testing on my Hynix chip (I know you
>     work for Micron but that's the only MLC chip I have :-)), and I
>     managed to get less bitflips by trying another read-retry mode even
>     if the previous one was allowing me to successfully fix existing
>     bitflips.
> Read Retry (RR) is available on some newer products.  RR was introduced to help maintain and improve data retention and P/E cycles as geometries shrink and bits per cell increase.  If the device supports RR, we have predefined RR options, chosen for the most likely chance of success.  Start with option 1 and step through the options until you get a successful read.  The datasheet (DS) usually has pretty good information.
>
> 4/ Do you have any numbers/statistics that could
>     help us choose the more appropriate read-retry mode according to the
>     number of P/E cycles ?
> I don't have numbers or statistics, but I can tell you that the RR steps are generally defined based on known NAND behavior.  Go to the Micron website and search for part number (PN) MT29F128G08CBCCB and you will find good information on RR.
>
> 5/ Any other things you'd like to share regarding read-retry ?
> RR isn't available on all devices.  From your perspective, I would give users the option to use RR if it's available.
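
The stepping procedure from the answer to question 3 (start with option 1 and step through the options until a read succeeds) could be sketched roughly as follows.  This is only an illustration: read_page is a hypothetical callback that selects an RR option, reads the page and returns the data plus the number of bitflips ECC saw, and max_correctable is an assumed ECC strength; neither is a real MTD or vendor interface.

```python
def read_with_retry(read_page, num_options, max_correctable):
    """Step through RR options 1..num_options in order.

    `read_page(option)` is a hypothetical callback returning
    (data, bitflips) for a read done with that RR option.
    Returns (option, data) for the first read ECC can correct,
    or None if every option fails.
    """
    for option in range(1, num_options + 1):
        data, bitflips = read_page(option)
        if bitflips <= max_correctable:
            return option, data
    return None
```

Given the observation in question 3 that another mode can yield fewer bitflips even when an earlier one already succeeds, a variant could scan all options and keep the one with the lowest bitflip count, at the cost of extra reads.
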
>
> Apart from that, we're currently trying to find the most appropriate way to deal with paired pages, and this sounds rather complicated.
> The current idea is to expose paired pages information up to the UBIFS layer, and let UBIFS decide when it should stop writing on pages paired with already written pages.
> Moreover, we have a few pages we need to protect (UBI metadata: EC and VID headers) in order to keep UBI/UBIFS consistent.
> Do you have anything to share on this topic (ideas, solutions we 
> should consider, constraints we're not aware of, ...)
>
> This is one of the reasons I came to this site.  I have a great deal of device knowledge and I need to know more about how end users use the device.
>
> Most designs today employ power loss detection and an elegant shutdown of the NAND.  In addition, we provide Write Protect, which provides an extra layer of protection against power loss.  There is still a chance that if the power event happens during a program to a page, the previously programmed shared page can also be corrupted.  It's not clear to me how to keep track of shared pages for every device out there.  It's not like a parameter page that you can read.  It's an interesting problem.
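
To make the paired-page concern concrete, here is a small sketch of the check an upper layer might run before programming a page: which already-written pages share cells with it?  Since the pairing cannot be read from the device, the sketch assumes a per-device table supplied from outside.  EXAMPLE_PAIRING is made up for illustration and does not describe any real part, and written_pages_at_risk is a hypothetical helper, not a UBI/UBIFS function.

```python
# Made-up pairing table: each page maps to the group of pages sharing its cells.
# A real table would have to come from the datasheet of the specific device.
EXAMPLE_PAIRING = {
    0: (0, 4), 4: (0, 4),
    1: (1, 5), 5: (1, 5),
}


def written_pages_at_risk(pairing, written, target):
    """List already-written pages that share cells with `target`.

    `pairing` maps a page number to its group of paired pages,
    `written` is the set of pages already programmed in the block.
    Pages missing from the table are treated as unpaired.
    """
    group = pairing.get(target, ())
    return sorted(p for p in group if p != target and p in written)
```

A non-empty result means a power cut while programming the target page could also corrupt the listed pages, so the caller might skip the page or make sure the listed data is recoverable first.
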
>
> Thanks for your valuable information.
>
> Best Regards,
>
> Boris
>
> --
> Boris Brezillon, Free Electrons
> Embedded Linux and Kernel engineering
> http://free-electrons.com
>
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/
>