UBIFS failed to recover master node

Mon Jun 28 04:21:10 EDT 2010

On 25.05.2010 07:06, Artem Bityutskiy wrote:
> On Mon, 2010-05-24 at 11:22 -0400, twebb wrote:
>> I've had several cases where our MLC NAND flash appears corrupted in
>> such a way that one of three UBIFS volumes can not be mounted due to
>> "failed to recover master node".  I haven't been able to reproduce the
>> problem, but we've had at least 5 incidents where this has occurred.
>> (A partial capture from one of the failures is below.)
>>
>> I'm starting to investigate this problem and don't know if this is a
>> UBIFS/UBI problem or a NAND driver problem.  I'm starting the process
>> of back-porting the latest UBIFS code to our 2.6.29 kernel - hoping
>> that new UBIFS code will solve the problem.  However, this may also be
>> a driver problem and I wonder if I also need to update that driver
>> (pxa3xx_nand).  Any suggestions for debugging this problem?
>>
>> Thanks,
>> twebb
>>
>>
>> capture:
>> [root@ESIedge mtd-utils]# mount -t ubifs ubi0_0 /mnt/
>> [  239.605869] UBI error: ubi_io_read: error -74 while reading 516096
>> bytes from PEB 4:8192, read 516096 bytes
>> [  239.616317] UBIFS error (pid 676): ubifs_scan: corrupt empty space
>> at LEB 2:268135
>> [  239.623996] UBIFS error (pid 676): ubifs_scanned_corruption:
>> corruption at LEB 2:268135
>> [  239.642101] UBIFS error (pid 676): ubifs_scan: LEB 2 scanning failed
>> [  239.976396] UBI error: ubi_io_read: error -74 while reading 516096
>> bytes from PEB 4:8192, read 516096 bytes
>> [  239.986742] UBIFS error (pid 676): ubifs_recover_master_node:
>> failed to recover master node
>> mount: mounting ubi0_0 on /mnt/ failed: Invalid argument
> And BTW, it is a good idea not to erase/re-flash this device if you want
> to fix this problem.
>
Our power-off tests also trigger this error sporadically (ubifs_recover_master_node:
failed to recover master node).
We use kernel 2.6.29 with the git patch (from 3/2010) on a 47 MB NOR flash partition.

I tried to find the cause of the error by debugging.
The recovery of the master node reads master_node1 and master_node2.
In our case master_node1 was empty.
The error was detected in:
int ubifs_recover_master_node(struct ubifs_info *c)
    ....
    if (mst1) {
       ......
    } else {
        if (!mst2)
            goto out_err;          
        /* 1st LEB was unmapped and about to be written, so there must
         * be no room left in 2nd LEB.         */
        offs2 = (void *)mst2 - buf2;
        if (offs2 + sz + sz <= c->leb_size)
            goto out_err;          /* <-- the error is raised here */
        mst = mst2;
    }
I checked the values in this comparison: "if (115712 + 512 + 512 (= 116736) <= 130944)".
The condition is true, so the code jumps to out_err.
For test purposes I skipped this check. The master node was then recovered and I saw no
problems with the FS. I could not follow the reasoning behind this check.

I was also able to provoke this error manually.
My UBIFS uses LEB 1 for the first master node and LEB 2 for the second.
I located LEB 1 on the flash and erased that sector.
The subsequent loading and mounting then triggers the error.
Ignoring the error results in a successful recovery.
I tried this with 15 MB and 47 MB NOR flash partitions.
On the 15 MB partition the failing comparison is "if (9216 + 512 + 512 (= 10240)
<= 130944)".
These values do not depend on which PEBs hold LEB 1 and LEB 2, nor on the free space
of the FS.
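
To make the arithmetic easier to follow, here is a small stand-alone sketch of the
comparison quoted above. The helper name second_leb_has_room() is my own and does not
exist in the kernel; it only mirrors the expression "offs2 + sz + sz <= c->leb_size",
and the values are the ones I observed on my two partitions.

```c
/*
 * Hypothetical helper, not a kernel function. It mirrors the check in the
 * "mst1 is missing" branch of ubifs_recover_master_node() quoted above.
 *
 * offs2    - offset of the last master node found in the second master LEB
 * sz       - size of one master node slot (512 in my case)
 * leb_size - size of a LEB (130944 in my case)
 *
 * Returns 1 if at least two more master node slots still fit behind offs2.
 * The quoted code treats that as an error, because its comment expects the
 * second LEB to be (nearly) full when the first LEB is found empty.
 */
static int second_leb_has_room(int offs2, int sz, int leb_size)
{
    return offs2 + sz + sz <= leb_size;
}
```

With my values, second_leb_has_room(115712, 512, 130944) and
second_leb_has_room(9216, 512, 130944) both evaluate to 1, so both cases end in
"goto out_err". The check would only pass with offs2 close to the end of the LEB,
for example 130432 (= 130944 - 512).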

Regards
Reinhold