RFC: detect and manage power cut on MLC NAND

Tue Mar 17 01:00:39 PDT 2015

Dear Jeff,

On 16/03/2015 17:02, Jeff Lauruhn (jlauruhn) wrote:
> "What if it is set low while NAND is ready - that is it can accept new commands - but an erase operation is in progress?".  Again, the likelihood is low; in this case a block that was intended to be erased may not be fully erased, and it can be fully erased when power returns.  No data will be lost because the block was intended to be erased anyway.

AFAIK the real problem here is how to detect that an in-progress erase
did not complete.
The block may appear erased (it reads back as all 0xFF), yet it can
cause failures on later writes or reads.
(I have some indirect experience with this from power-cut tests on
MLC, but I cannot say for sure that the failures were caused by an
incomplete erase operation. See also this reference:
http://www.linux-mtd.infradead.org/doc/ubifs.html#L_unstable_bits)

> "Does the erase operation complete anyway?"  There has been a lot of work done to mitigate power loss on NAND, but I have never seen a design from any NAND vendor that is 100% reliable.  Surprise power loss should be avoided on NAND; otherwise, consider power detection and graceful shutdown circuitry.

IIUC we have two workarounds here:
- power detection and clean shutdown. This implies, of course, some
hardware support and a piece of software able to intercept the
hardware event and block all NAND operations except the one that is
currently running. This is the topic of my initial RFC. I have a first
implementation but, IMHO, it is too tightly coupled to the NAND controller
(I placed the event handler inside the controller driver and simply lock
NAND access by hacking chip->select_chip()).
I think that power-fail detection could be useful in other contexts
inside the Linux kernel too, so it may deserve a more general
implementation. However (and unfortunately) I'm not very involved in
mainline kernel development, so I don't know whether this topic has
already been discussed.

- detection of interrupted operations: keep some kind of journal that
records the last running operation and repair it on the next reboot.
AFAIK this is what commercial FTLs and flash memory controllers do,
using a lot of patented software.
I'm not very familiar with UBI/MTD internals, but I think this is hard
to implement in a general way. It should be easier to implement such
logic with some additional hardware support (the first thing that
comes to mind is battery-backed SRAM, e.g. the kind found inside RTC
devices).
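The first workaround (on a power-fail event, refuse new NAND operations but let the running one finish) could be modelled roughly as below. This is a user-space sketch, not kernel code: struct nand_gate and the nand_gate_* functions are hypothetical names, and C11 atomics stand in for whatever locking a real driver would use instead of hacking chip->select_chip().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical gate placed in front of every NAND operation. */
struct nand_gate {
    atomic_bool power_failing; /* set once by the power-fail event handler */
    atomic_int  in_flight;     /* NAND operations currently running */
};

static void nand_gate_init(struct nand_gate *g)
{
    atomic_init(&g->power_failing, false);
    atomic_init(&g->in_flight, 0);
}

/* Called before issuing a command to the chip.
 * Returns false if the operation must not be started. */
static bool nand_gate_enter(struct nand_gate *g)
{
    /* Announce the operation first, then test the flag. With the default
     * sequentially consistent atomics, either this caller sees the flag or
     * the power-fail handler sees the increment. */
    atomic_fetch_add(&g->in_flight, 1);
    if (atomic_load(&g->power_failing)) {
        atomic_fetch_sub(&g->in_flight, 1);
        return false;
    }
    return true;
}

/* Called once the chip reports the operation complete. */
static void nand_gate_exit(struct nand_gate *g)
{
    int prev = atomic_fetch_sub(&g->in_flight, 1);
    assert(prev > 0); /* exit without a matching enter */
    (void)prev;
}

/* Called from the power-fail event: blocks new operations and returns how
 * many are still running, which the caller lets finish before power drops. */
static int nand_gate_power_fail(struct nand_gate *g)
{
    atomic_store(&g->power_failing, true);
    return atomic_load(&g->in_flight);
}
```

In this model every read, program and erase calls nand_gate_enter() before issuing its command and nand_gate_exit() when the chip is ready again; the power-fail path calls nand_gate_power_fail() and waits for the count to reach zero. Nothing here depends on a particular NAND controller, which is the kind of decoupling the RFC asks for.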
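For the second workaround, here is a sketch of a one-record intent journal, assuming a small battery-backed SRAM that can be accessed as ordinary memory. The record layout, the checksum (a stand-in for a proper CRC) and the journal_* helpers are hypothetical and only illustrate the idea: write the intent before issuing the command, clear it after completion, and inspect it on the next boot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define JOURNAL_MAGIC 0x4E414E44u /* arbitrary marker ("NAND" in ASCII) */

enum nand_op { NAND_OP_NONE = 0, NAND_OP_ERASE = 1, NAND_OP_PROGRAM = 2 };

/* Hypothetical single-record layout stored in battery-backed SRAM. */
struct nand_journal {
    uint32_t magic;  /* JOURNAL_MAGIC only when the record is complete */
    uint32_t op;     /* enum nand_op */
    uint32_t block;  /* eraseblock concerned */
    uint32_t page;   /* page within the block (program only) */
    uint32_t check;  /* simple checksum, stand-in for a real CRC */
};

static uint32_t journal_sum(uint32_t op, uint32_t block, uint32_t page)
{
    return ~(JOURNAL_MAGIC + op + block + page);
}

static void journal_write(struct nand_journal *j, uint32_t op,
                          uint32_t block, uint32_t page)
{
    /* Invalidate first so a record torn by power loss is never trusted.
     * Real SRAM would need a write barrier between these stores. */
    j->magic = 0;
    j->op = op;
    j->block = block;
    j->page = page;
    j->check = journal_sum(op, block, page);
    j->magic = JOURNAL_MAGIC;
}

/* Record the intent BEFORE issuing the command to the chip. */
static void journal_begin(struct nand_journal *j, enum nand_op op,
                          uint32_t block, uint32_t page)
{
    assert(op != NAND_OP_NONE);
    journal_write(j, op, block, page);
}

/* Clear the intent AFTER the chip reports the operation complete. */
static void journal_end(struct nand_journal *j)
{
    journal_write(j, NAND_OP_NONE, 0, 0);
}

/* On boot: returns true and fills *pending if an operation was interrupted. */
static bool journal_recover(const struct nand_journal *j,
                            struct nand_journal *pending)
{
    if (j->magic != JOURNAL_MAGIC)
        return false; /* torn or never written: nothing was in progress */
    if (j->check != journal_sum(j->op, j->block, j->page))
        return false; /* corrupted record: treated as torn (policy choice) */
    if (j->op == NAND_OP_NONE)
        return false; /* last operation completed cleanly */
    *pending = *j;
    return true;
}
```

The record is invalidated before it is rewritten, and the command is only issued once the record is complete. A torn record therefore always means that no operation was pending: either the command had not been issued yet, or it had already finished. On boot, a valid record with op == NAND_OP_ERASE would tell the recovery code which block to erase again.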

Best Regards,

-- 

Andrea SCIAN

DAVE Embedded Systems