[RFC] UBIFS recovery

Ricard Wanderlof ricard.wanderlof at axis.com
Mon Feb 9 04:12:27 PST 2015


On Mon, 9 Feb 2015, hujianyang wrote:

> Hi Artem and Richard,
> 
> On 2015/2/9 15:57, Richard Weinberger wrote:
> > Am 09.02.2015 um 08:51 schrieb Artem Bityutskiy:
> >> On Mon, 2015-02-09 at 10:34 +0800, hujianyang wrote:
> >>> Good suggestions. I will try to implement periodic commit first. But I
> >>> don't know if this feature is really needed. Switch to R/O and revert to
> >>> last committed state? But we only considered the log before, never thought
> >>> about the index.
> >>
> >> I think the right way to approach this problem is to come up with a high
> >> level summary of the problems we are trying to solve, and the solutions,
> >> along with some analysis of the solutions. This does not have to be very
> >> detailed, but it should put everyone involved on the same page.
> > 
> > Agreed. I fear we're talking about different things. :)
> > 
> 
> I'm afraid I didn't explain the use case of the corruption recovery feature.
> UBIFS is used mostly in embedded environments. After products have been sold,
> they are hard to debug. So the production team may consider any failure that
> could happen and put the recovery method into their operation scripts/utilities.
> 
> Flash corruption is a problem they need to care about. Using high quality
> cells is not enough, ECC errors cannot be avoided. So a recovery method which
> is provided by the filesystem itself is required.

Isn't this a bit backward? Given a certain acceptable failure rate for a 
product, select an appropriate flash chip in combination with a reasonable 
amount of ECC to get a medium that has a low enough error rate so that 
higher levels do not need to concern themselves. If a high level of 
reliability is needed, then some other form of nonvolatile storage should 
be selected.

The only high level function should be some sort of periodic scrubbing of 
NAND flash blocks to ensure the error rate does not rise too fast 
unnoticed.

Having UBIFS manage random corruptions would seem hopeful at best; if some 
critical file is corrupted then the system can't start anyway.

In any system all components have a failure rate, so it's a question of 
getting the failure rate of the NAND subsystem on par with the failure 
rate of other components. Just because there is a theoretical possibility 
of fixing a UBIFS problem does not really make the system more reliable 
per se. What if you get a fault in a RAM chip? The CPU? The PSU? In all 
those cases the product will be simply "broken", and we can handle 
defective flash the same way. A transistor in the PSU blew or the NAND 
flash happened to be the one-in-a-million part that keeps losing 
bits. Same result, product dead, repair or replace it.

/Ricard
-- 
Ricard Wolf Wanderlöf                           ricardw(at)axis.com
Axis Communications AB, Lund, Sweden            www.axis.com
Phone +46 46 272 2016                           Fax +46 46 13 61 30



More information about the linux-mtd mailing list