[RFC] UBIFS recovery

Artem Bityutskiy dedekind1 at gmail.com
Fri Feb 6 09:21:10 PST 2015


On Thu, 2015-02-05 at 21:09 +0800, shengyong wrote:
> No matter how the fs is recovered, data is corrupted. For the default recovery
> mechanism, the recovery just drops the last node, and the lost data can be
> limited to the minimum range. For other situations, like data corrupted in
> the middle of the log area, it may be hard to figure out which nodes should be
> dropped. So we'd prefer to roll the whole fs back to the last checkpoint,
> rather than losing all data.

So are you focused on log corruptions only? Why is this case important
for you?

> Here is a simple recovery procedure; something could easily be missed in the
> procedure:
> 1. if the default recovery fails, we start to roll the whole filesystem
>    back to the last checkpoint.

Let's use the word "commit" instead, just for clarity.

> 2. scan all buds already in the replay_buds list; if the last commit in the bud
>    starts from the beginning of the LEB, then all nodes in the bud are new, and
>    we unmap it; if the last commit starts in the middle of the bud, we
>    leb_change the bud, keeping old nodes and dropping new nodes.

I do not really understand this. A bud is an uncommitted LEB; the journal
consists of buds. The log contains references to the buds, plus commit
start/end nodes.

Also, do you realize that if I fsync() a file, it does not mean a
commit? It just means writing all the data to the journal.

Are you suggesting that we just erase the entire journal LEBs which
contain pieces of a file I fsync()'ed?

We really need to step back, think, and come up with a good English
description of the specific problem we are trying to solve here.


> BTW, the current UBIFS will update the master node when mounting, no matter
> whether the mount succeeds or fails. So if need_recovery is detected, the
> master node should not be updated.

This sounds like a bug!





