UBIFS: file data corruption during the power cut-off test

Steve deRosier derosier at gmail.com
Sun Jun 9 08:25:17 PDT 2019


Hi Sergei,

On Sun, Jun 9, 2019 at 1:32 AM Sergei Poselenov <sposelenov at emcraft.com> wrote:
>
> Hello Steve,
>
> Please see my comment below.
>

> > If I had to continue my guessing: the valid portion of the file
> > test2 that was successfully written is not a multiple of your
> > NAND's page size. Likely you've got 2 KiB pages with four 512-byte
> > subpages. The last page of that flash that was written for that
> > file wrote three of the four subpages. When you `dd` the file to
> > overwrite the existing file,
>
> Looks like you are right: what I'm seeing is that only 3 of the 4
> 512-byte subpages are written correctly.
>
> So, you are saying that the NAND controller (or the kernel device
> driver?) returned "success" for the "4K page write" operation, while
> that wasn't actually true?
>
>

No, that's not what I'm saying. I'm saying the NAND was written
exactly as you specified. You basically said: "write these bytes from
this page, ignoring the fact that the space I'm writing to is too
long". NAND flash is erased in eraseblocks and written in pages (or,
where the chip supports it, subpages). My suspicion is that your test
case itself causes the issue. Neither UBIFS nor any other filesystem
will protect a file from getting corrupted when the power goes out
while it is being written. It also can't protect you from purposely
corrupting a file with your test case. The purpose of UBIFS's
power-cut tolerance is to make sure the filesystem itself doesn't get
corrupted and the system can still boot.
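
To make the page/subpage arithmetic in the quoted guess concrete, here
is a minimal C sketch. It assumes the hypothetical geometry guessed
above (2 KiB pages, four 512-byte subpages) and treats the data as a
plain linear byte range starting on a page boundary. Real UBIFS writes
nodes with headers (and possibly compression), so this does not predict
where UBIFS actually places file data. `tail_subpages` is a
hypothetical helper, not a kernel API.

```c
#include <stddef.h>

/* Hypothetical geometry: the guessed 2 KiB page split into four
 * 512-byte subpages. Substitute the real values for your chip. */
#define NAND_PAGE_SIZE    2048u
#define NAND_SUBPAGE_SIZE 512u

/* In this simplified linear model, return how many subpages of the last,
 * partially filled page a byte range of length len (starting at a page
 * boundary) touches. Returns 0 when len is an exact multiple of the page
 * size, i.e. there is no partial page. */
static unsigned int tail_subpages(size_t len)
{
    size_t tail = len % NAND_PAGE_SIZE;   /* bytes spilling into the last page */

    /* Round up: a subpage holding even one byte of the range is touched. */
    return (unsigned int)((tail + NAND_SUBPAGE_SIZE - 1) / NAND_SUBPAGE_SIZE);
}
```

For example, 1536 bytes left over in the last page touch three of the
four subpages, which matches the pattern Sergei reports.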

While I'm still pretty sure your test case itself is the cause of the
corruption, I tried the basics (minus the power cut) to test one part
of my theory: namely, that the `dd` of a partial page with
`conv=notrunc` is the source of your problem. On my platform, in a
single test, I couldn't reproduce what I think is happening. However,
that's hardly conclusive, because I _know_ I'm using a different
hardware and software stack than you.
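
For readers less familiar with `dd conv=notrunc`, the following C
sketch shows roughly what it does to the output file: the existing file
is opened without `O_TRUNC` and bytes are overwritten in place, so the
file keeps its length and everything outside the written range keeps
its old contents. `overwrite_in_place` is a hypothetical helper for
illustration only; real `dd` also handles block sizes, `seek=`, file
creation and so on, and this sketch does not reproduce Sergei's actual
test.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: overwrite len bytes at offset in an existing file
 * without truncating it, roughly what `dd conv=notrunc` does to its output.
 * Bytes outside [offset, offset + len) keep their old contents.
 * Returns 0 on success, -1 on error (errno is set). */
static int overwrite_in_place(const char *path, const void *buf,
                              size_t len, off_t offset)
{
    /* No O_TRUNC: the file keeps its length. No O_CREAT: as a
     * simplification, the file must already exist. */
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, offset);
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* interrupted by a signal: retry */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;                        /* handle short writes */
        len -= (size_t)n;
        offset += n;
    }
    return close(fd);
}
```

Because the old tail stays in place, a power cut in the middle of such
an overwrite can leave the file with a mix of old and new bytes, and
nothing guarantees which.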

So, let's start back at the basics:

* What is your processor hardware? You said "based on i.MX 6ULL", but
let's be 100% specific.
* What is your NAND chip?
* What is the layout of the NAND chip (sectors, pages, subpages)?
* What ECC level are you using?
* What version of Linux are you using (the full x.y.z, preferably
with a reference to the actual git branch you're using, vendor or
stock)?
* What NAND controller driver are you using?
* What NAND chip driver are you using?

And finally:
* Do you see overall UBIFS corruption? In other words, when the device
boots, do you see UBIFS failing to recover from a problem caused by
the power cut?

Read the link Richard sent you. The basic rule is: "...applications
should not assume anything about the contents of files which were not
synchronized before a power-cut has happened." Having `-o sync` set
doesn't mean your file was synced if the power cut comes in the middle
of a write. It just means that the OS is going to do the sync
automatically (per the semantics), so you don't have to issue a sync
command.
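
The rule above comes down to this: only data that has been
synchronized is guaranteed to survive. As a minimal C sketch (assuming
a POSIX system; `write_durably` is a hypothetical helper, not part of
any UBIFS API), an application that needs that guarantee writes its
data and then calls `fsync()`, and treats the file as safe only once
`fsync()` has returned successfully:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: replace the contents of path with buf and report
 * success only after fsync() has returned. Until this returns 0, the
 * application must not assume anything about the file's contents after
 * a power cut. Returns 0 on success, -1 on error (errno is set). */
static int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* interrupted by a signal: retry */
            goto fail;
        }
        p += n;                        /* handle short writes */
        len -= (size_t)n;
    }

    /* The synchronization point: fsync() flushes the file's data and
     * metadata to the device before returning. A power cut before this
     * point leaves the file in an unspecified state. */
    if (fsync(fd) < 0)
        goto fail;

    return close(fd);

fail:
    {
        int saved = errno;
        close(fd);
        errno = saved;
    }
    return -1;
}
```

This does not make the update atomic: a power cut after `O_TRUNC` but
before `fsync()` returns can still leave a short or empty file. It only
marks the point after which the application may rely on the contents.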

I think UBIFS is behaving as designed and intended, and that your test
case and expectations are flawed. However, please provide the
requested details, actual logs, and data dumps; if there's a bug here,
it will get looked at.

- Steve



More information about the linux-mtd mailing list