[RFC] UBIFS recovery

Mon Feb 9 03:18:51 PST 2015

On Mon, 2015-02-09 at 18:38 +0800, hujianyang wrote:
> I think mount R/O is a good beginning. We don't need consider much about how
> to recover but can provide a usable(in some cases) file-system. And a R/O
> mount means we could do some cleanup to revert to this R/O state. This R/O
> mount should be provided by driver itself without any userspace tools.

I guess if we decompose the problem this way it will also be helpful (to
you and the readers).

1. There are types of corruption with which UBIFS mounts the file-system
just fine. For example, a committed data node is corrupted. You will
only notice this when you read the corresponding file, and this is the
point when the file-system becomes read-only.

2. There are types of corruption with which UBIFS refuses to mount.
These are related to the replay process. Whenever there is a corrupted
node which does not look like the result of a power cut, UBIFS refuses
to mount.

It appears to me that you are trying to nail down problem #2. You want
UBIFS to still mount the FS, and stay R/O. Is this correct?

I would like you to consider problem #1 too. Consider cases like: a data
node is corrupted, an inode is corrupted (both directory and
non-directory), a dentry is corrupted, an index node is corrupted, an
LPT area is corrupted.

What happens in each of these cases? Are you OK with that, or would you
like to change it? What does the product team do in these cases?

You do not have to answer these questions in this e-mail. You can, but
these are mostly for you, so that you see the bigger picture.

Now, regarding problem #2.

There are multiple cases here too: corrupted master nodes, a corruption
in the log, a corruption in the journal (buds), a corruption in the LPT
area, a corruption in the index.

I'd like you to think about all these cases. Again, just for yourself,
to understand the broader picture.

It looks like you are focusing on corruptions in buds, right? Is it
because this is the most probable situation, or is this something which
shows up as problems in the field/testing?

You suggest that in case of a corrupted bud, you just try to go back to
the previous committed state.

This sounds rational to me. As I described, though, the problem is that
'fsync()' does not mean 'commit'. So what this means is that, say, mysql
fsync()'s its database, and believes it is now on the media. But then
there is a problem in the journal, in some LEB which is not related to
the fsync()'ed mysql database at all, and you drop the database changes.
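
To make the concern concrete, here is a toy model of that situation. It
is only a sketch: 'struct jentry' and 'fsynced_lost_on_rollback()' are
made-up names for illustration, not UBIFS structures or functions. It
counts how many intact, fsync()'ed journal entries are thrown away if
any corruption makes us drop the whole journal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical journal entry; not a real UBIFS structure. */
struct jentry {
	int leb;        /* bud LEB that holds the node */
	bool fsynced;   /* the application got a successful fsync() for it */
	bool corrupted; /* the node fails validation during replay */
};

/*
 * Count intact fsync()'ed entries that are lost when any corruption
 * in the journal causes a rollback to the last committed state.
 */
static size_t fsynced_lost_on_rollback(const struct jentry *j, size_t n)
{
	bool any_corrupted = false;
	size_t intact_fsynced = 0;

	assert(j != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (j[i].corrupted)
			any_corrupted = true;
		else if (j[i].fsynced)
			intact_fsynced++;
	}

	/* No corruption means no rollback, so nothing is lost. */
	return any_corrupted ? intact_fsynced : 0;
}
```

With two fsync()'ed entries in LEB 10 and one corrupted entry in LEB 11,
the function reports that both fsync()'ed entries would be lost, even
though neither of them is damaged.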

So the better thing to do is to try dropping just the corrupted nodes,
not the entire journal. It does not sound too hard - you just keep
scanning and skip corrupted nodes. Replay as usual. Just mark the FS as
R/O if corruptions were not power-cut-related.
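
Roughly, the replay loop I have in mind looks like the sketch below.
This is not UBIFS code: 'enum node_state', 'struct replay_result' and
'replay_bud()' are hypothetical names, and I assume that the scanner has
already classified each node as good, corrupted by a power cut, or
corrupted in some other way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-node scan verdicts; not the real UBIFS scan codes. */
enum node_state {
	NODE_OK,            /* node passed validation */
	NODE_BAD_POWER_CUT, /* corruption looks like an interrupted write */
	NODE_BAD_OTHER,     /* corruption a power cut cannot explain */
};

struct replay_result {
	size_t applied; /* nodes that were replayed */
	size_t skipped; /* corrupted nodes that were dropped */
	bool read_only; /* the FS must stay R/O after mount */
};

/* Replay one bud: keep scanning, skip bad nodes, never abort. */
static struct replay_result replay_bud(const enum node_state *nodes, size_t n)
{
	struct replay_result r = { 0, 0, false };

	assert(nodes != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		switch (nodes[i]) {
		case NODE_OK:
			r.applied++; /* replay as usual */
			break;
		case NODE_BAD_POWER_CUT:
			r.skipped++; /* expected after a power cut */
			break;
		case NODE_BAD_OTHER:
			r.skipped++;
			r.read_only = true; /* unexplained corruption */
			break;
		}
	}

	return r;
}
```

The point of the design is that good nodes after a corrupted one are
still replayed, so unrelated fsync()'ed data survives, and the R/O flag
is set only for corruption which is not power-cut-related.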