UBIFS Corrupt during power failure

Fri Apr 17 04:56:33 EDT 2009

Jamie, thanks for the feedback!

On Thu, 2009-04-16 at 22:34 +0100, Jamie Lokier wrote:
> > 1. eraseblock
> > 2. Min. I/O unit size, which is mtd->writesize in MTD, and
> > ubi->min_io_size in UBI. This corresponds to NAND page, and 1 byte in
> > NOR.
> 
> I guess 1 byte in NOR because you can overwrite a word to set the other byte?
> Logically min_io_size should be 1 bit :-)
> 
> > 3. There are also sub-pages in case of NAND, but I consider them as a
> > kind of hack. UBI does not expose information about them, and UBIFS does
> > not use them.
> 
> UBI FAQ (http://www.linux-mtd.infradead.org/faq/ubi.html#L_find_min_io_size)
> 
>     UBI: physical eraseblock size:   131072 bytes (128 KiB)
>     UBI: logical eraseblock size:    129024 bytes
>     UBI: smallest flash I/O unit:    2048
>     UBI: sub-page size:              512
> 
>     Note, if the sub-page size was not printed, the flash is either not
>     NAND flash or is NAND flash which does not have sub-pages.
> 
> UBI does not expose information about sub-pages?

It prints information about them, just for info. But the UBI "front-end"
API does not contain any sub-page info.

> Googling for "NAND sub-page" didn't help explain them much.  Can you
> recommend a URL, just so I can understand NAND sub-pages?

There is info at the MTD web site. But for your convenience, I've
also added this:

http://www.linux-mtd.infradead.org/faq/ubi.html#L_subpage
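
To make the numbers in the quoted log concrete, here is a minimal sketch of
where the 129024-byte logical eraseblock size comes from. It is not UBI code:
leb_size(), round_up() and hdr_unit are made-up names, and it assumes a NAND
layout where the EC header sits in the first header unit, the VID header in
the second, and data starts at the next min. I/O unit boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of a (a must be non-zero). */
static uint32_t round_up(uint32_t x, uint32_t a)
{
	return (x + a - 1) / a * a;
}

/*
 * Hypothetical helper, not a UBI function.
 *
 * peb_size:    physical eraseblock size in bytes
 * min_io_size: min. I/O unit (NAND page) size in bytes
 * hdr_unit:    unit in which the two headers are written: the sub-page
 *              size if sub-pages are used, otherwise min_io_size
 */
static uint32_t leb_size(uint32_t peb_size, uint32_t min_io_size,
			 uint32_t hdr_unit)
{
	uint32_t data_offset;

	assert(min_io_size > 0 && hdr_unit > 0 && hdr_unit <= min_io_size);

	/* EC header in unit 0, VID header in unit 1, data after them. */
	data_offset = round_up(2 * hdr_unit, min_io_size);
	assert(data_offset < peb_size);

	return peb_size - data_offset;
}
```

With the values from the log, leb_size(131072, 2048, 512) gives 129024. With
no sub-pages (hdr_unit equal to 2048) both headers take a full page each and
the same PEB yields 126976 bytes.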

> > Now obviously, we need to extend this model. I would suggest to
> > introduce a notion of "max. I/O size". It would be:
> > 
> > 1. 64-bytes in case of Eric's NOR. This would be taken from CFI info.
> > 2. If we ever have a striping layer, which can interleave between 2 or
> >    more chips, then max. I/O size will be N * ubi->min_io_size.
> > 
> > Thoughts?
> 
> 0. It's more accurate to call it "max parallel write size".
>    That NOR chip has a read page size too, which is different :-)
> 
> 1. Alignment, or can we assume alignment is the same as its size?

Yes, I think we can assume the alignment is the same as the size.

> 2. If striping uses larger stripes (the same way as RAID uses 1MB
>    stripes instead of 1 sector stripes), then this value needs to be
>    max(N * strip_size, N * ubi->min_io_size), because the chip block
>    writes done in parallel are not contiguous in the combined MTD.

OK.
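
For reference, the formula quoted just above can be written down as a small
sketch. This is only an illustration of the strict striping case, where all
N chips are started together and waited for together; the function and
parameter names are hypothetical and nothing like this exists in MTD or UBI
today.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: upper bound on the bytes that may be in flight at
 * once when a striping layer writes to nchips chips in lock-step.
 * Implements max(N * stripe_size, N * min_io_size) from the discussion.
 */
static uint64_t max_parallel_write_size(uint32_t nchips,
					uint32_t stripe_size,
					uint32_t min_io_size)
{
	uint64_t by_stripe, by_io;

	assert(nchips > 0 && min_io_size > 0);

	/* Widen before multiplying so large stripes cannot overflow. */
	by_stripe = (uint64_t)nchips * stripe_size;
	by_io = (uint64_t)nchips * min_io_size;

	return by_stripe > by_io ? by_stripe : by_io;
}
```

For example, 4 chips with 1 MiB stripes and 2048-byte pages give 4 MiB, while
2 chips with 512-byte stripes and 2048-byte pages give 4096 bytes, because
the min. I/O unit term dominates.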

> 
> 3. 2 assumes that striping works like this:
> 
>        Start write at offset P to chip 0, chip 1, chip 2, chip 3.
>        Wait for _all_ chips to finish.
>        Start write at offset P+block_size to chip 0, chip 1, chip 2, chip 3.
>        Wait for _all_ chips to finish.
>        etc.

Right.

>    But if striping is implemented in a more relaxed way to get higher
>    performance, it will do this:
> 
>        Start write at offset P to chip 0, chip 1, chip 2, chip 3.
>        Wait for any chip to finish.
>        Start write at offset P+block_size on the chip which finished.
>        Wait for any chip to finish.
>        Start write at next block on the chip which finished.
>        Wait for any chip to finish.
>        etc.

Yeah...

>    That makes the range of parallel writes, and so
>    corruption-on-power-loss, even larger than max(N * strip_size, N *
>    block_size).  The range is as large as the whole write, if one chip
>    is writing much faster than the others, so it cannot be represented
>    by a small number.

Then I guess we should just introduce mtd->max_corruption? This would
mean the maximum number of bytes corruption may span in case of a power cut?
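
A rough sketch of how a file system might consume such a value after an
unclean reboot, just to show the idea. Everything here is hypothetical:
mtd->max_corruption is only a proposal, struct byte_range and suspect_range()
are made-up names, and the sketch assumes the corrupted window is aligned to
its own size, as discussed above for alignment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: half-open byte range [start, end) on the device. */
struct byte_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Hypothetical helper: given an offset that was being written when power
 * was cut, return the range that has to be treated as possibly corrupted.
 * Assumes the window is aligned to max_corruption.
 */
static struct byte_range suspect_range(uint64_t off,
				       uint64_t max_corruption,
				       uint64_t dev_size)
{
	struct byte_range r;

	assert(max_corruption > 0 && off < dev_size);

	/* Start of the aligned window that contains off. */
	r.start = off - off % max_corruption;
	r.end = r.start + max_corruption;

	/* Do not run past the end of the device. */
	if (r.end > dev_size)
		r.end = dev_size;

	return r;
}
```

With a 64-byte window, an interrupted write at offset 100 makes bytes 64..127
suspect. For the relaxed striping case the value would have to be very
large, which is exactly the problem Jamie describes.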

-- 
Best regards,
Artem Bityutskiy (Битюцкий Артём)