ubifs: master area fails to recover when master node 1 is corrupted
Zhihao Cheng
chengzhihao1 at huawei.com
Thu Jan 25 18:20:18 PST 2024
On 2024/1/25 19:48, Ryder Wang wrote:
> Hi,
>
> I just found that the master area always fails to recover during mount when master node 1's CRC is corrupted but master node 2 is completely good. It can be reproduced 100% of the time on Kernel v5.4.233, but it seems to be a common issue.
>
According to the debug messages below, the mounting failure occurs as
follows:
      LEB 1                    LEB 2
|mst1 | 0xFF 0xFF ... |  |mst2 | 0xFF 0xFF ... |
^                        ^
offset 0                 offset 0
* mst1 has bad crc.
ubifs_recover_master_node
  get_master_node(UBIFS_MST_LNUM, &mst1)
    ubifs_scan_a_node(buf, lnum, offs=0) // SCANNED_A_CORRUPT_NODE
      ubifs_check_node // -EUCLEAN, caused by bad crc
    if (offs < c->leb_size) // true
      if (!is_empty(buf, min_t(int, len, sz))) // true
        dbg_rcvry("found corruption at %d:%d")
  get_master_node(UBIFS_MST_LNUM + 1, &buf2, &mst2)
    ubifs_scan_a_node // SCANNED_A_NODE
    *mst = buf // buf = sbuf
    buf2 = sbuf
  if (mst1) // false
  else {
    offs2 = (void *)mst2 - buf2; // offs2 = 0
    if (offs2 + sz + sz <= c->leb_size) // true, mst2 is the first node in LEB 2
      goto out_err
  }
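The final check in the trace can be sketched as a small C function. This is a simplified model, not kernel code: the name mst2_only_recovery_allowed() is hypothetical, and sz is assumed to be the master node size aligned to the minimal I/O unit. The example values are also assumptions: a 253952-byte LEB (from the "253952 bytes left" log line) and a 4096-byte aligned master node.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified model of the check in ubifs_recover_master_node()
 * for the case where only the second master node (mst2) was found.
 * offs2:    offset of mst2 inside LEB 2
 * sz:       master node size, assumed aligned to the min. I/O unit
 * leb_size: size of the LEB in bytes
 */
static bool mst2_only_recovery_allowed(int offs2, int sz, int leb_size)
{
	assert(sz > 0 && offs2 >= 0 && offs2 + sz <= leb_size);

	/*
	 * If LEB 2 still has room for another master node after mst2, the
	 * next master node write would have gone into LEB 1 without
	 * unmapping it, so a missing or corrupted mst1 cannot be explained
	 * by a power cut during that unmap.
	 */
	if (offs2 + sz + sz <= leb_size)
		return false; /* corresponds to "goto out_err" */

	return true; /* mst2 sits in the last slot of LEB 2 */
}
```

With these values, mst2 at offset 0 (the layout in the report) is rejected, while mst2 at offset 249856 (the last 4096-byte slot) is accepted.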
The above process matches one legitimate scenario for recovering master
nodes after a power cut: LEB 1 has been unmapped and is about to receive
the newest master node when power is lost:
ubifs_write_master
  lnum = UBIFS_MST_LNUM; // LEB 1
  if (offs + UBIFS_MST_NODE_SZ > c->leb_size) // true
    err = ubifs_leb_unmap(c, lnum);
    >> powercut <<
  err = ubifs_write_node_hmac(c->mst_node, lnum)
So the master node in LEB 2 can only be recovered if there is no room
left for new master nodes in LEB 2.
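The following toy C model of the write order shows why that power cut leaves LEB 1 empty and LEB 2 full. It is a sketch under stated assumptions, not the kernel implementation. Everything in it (struct leb, SLOTS, write_master(), leb_unmap(), leb_write()) is hypothetical. It counts slots instead of byte offsets and ignores CRCs, HMACs and the real UBI calls. It only assumes that the new master node goes to LEB 1 first and then to the same slot in LEB 2, with each LEB unmapped before slot 0 is reused.

```c
#include <assert.h>
#include <stdbool.h>

#define SLOTS 4 /* hypothetical number of master node slots per LEB */

/* A LEB holding master nodes, identified only by their sequence numbers. */
struct leb {
	int sqnum[SLOTS];
	int used; /* slots written since the last unmap */
};

static void leb_unmap(struct leb *l)
{
	l->used = 0;
}

static void leb_write(struct leb *l, int slot, int sqnum)
{
	assert(slot == l->used && slot < SLOTS); /* sequential writes only */
	l->sqnum[slot] = sqnum;
	l->used = slot + 1;
}

/*
 * Hypothetical model of ubifs_write_master(): write to LEB 1, then to the
 * same slot in LEB 2, wrapping to slot 0 (after unmapping) when full.
 * If power_cut is set and LEB 1 has to be unmapped, stop right after the
 * unmap. Returns false if the simulated power cut interrupted the write.
 */
static bool write_master(struct leb *leb1, struct leb *leb2, int *cur_slot,
			 int sqnum, bool power_cut)
{
	int slot = *cur_slot + 1;

	if (slot >= SLOTS) {
		slot = 0;
		leb_unmap(leb1);
		if (power_cut)
			return false; /* >> powercut << */
	}
	*cur_slot = slot;
	leb_write(leb1, slot, sqnum);
	if (slot == 0)
		leb_unmap(leb2);
	leb_write(leb2, slot, sqnum);
	return true;
}
```

Filling all SLOTS slots and then cutting power right after LEB 1 is unmapped leaves LEB 1 empty. The newest surviving master node is then in the last slot of LEB 2, the only layout in which mst2 alone is accepted.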
The problem here is that you corrupted mst1 to construct this situation.
UBIFS detects that the on-flash state does not match the expected
power-cut scenario, so it refuses to recover the master nodes.
> How to reproduce it:
> 1. Corrupt the CRC value of master node 1 on ubifs (keep master node 2 good).
> 2. Mount this ubifs.
>
> Mounting at step #2 always fails. From the log, it looks like master node recovery fails, but master node recovery is expected to succeed in this case.
Master node recovery is not expected to succeed in this situation. The two
master node copies are not meant to recover from arbitrary damage; they are
used to find a valid version of the master node. See the following section in [1]:
"The master node stores the position of all on-flash structures ... The
first is that there could be a loss of power at the same instant that
the master node is being written. The second is that there could be
degradation or corruption of the flash media itself. ... In the second
case, recovery is not possible because it cannot be determined reliably
what is a valid master node version."
[1] http://linux-mtd.infradead.org/doc/ubifs_whitepaper.pdf
>
> Below is the kernel log of this failure:
>
> ubifs_mount:2253: UBIFS DBG gen (pid 10770): name ubi0:test_volume, flags 0x0
> ubifs_mount:2274: UBIFS DBG gen (pid 10770): opened ubi0_0
> ubifs_read_node:1094: UBIFS DBG io (pid 10770): LEB 0:0, superblock node, length 4096
> UBIFS (ubi0:0): Mounting in unauthenticated mode
> ubifs_read_superblock:765: UBIFS DBG mnt (pid 10770): Auto resizing from 13 LEBs to 100 LEBs
> ubifs_start_scan:131: UBIFS DBG scan (pid 10770): scan LEB 1:0
> ubifs_scan:270: UBIFS DBG scan (pid 10770): look at LEB 1:0 (253952 bytes left)
> ubifs_scan_a_node:77: UBIFS DBG scan (pid 10770): scanning master node at LEB 1:0
> UBIFS error (ubi0:0 pid 10770): ubifs_scan [ubifs]: bad node
> ubifs_recover_master_node:234: UBIFS DBG rcvry (pid 10770): recovery
> ubifs_scan_a_node:77: UBIFS DBG scan (pid 10770): scanning master node at LEB 1:0
> get_master_node:163: UBIFS DBG rcvry (pid 10770): found corruption at 1:0
> ubifs_scan_a_node:77: UBIFS DBG scan (pid 10770): scanning master node at LEB 2:0
> get_master_node:152: UBIFS DBG rcvry (pid 10770): found a master node at 2:0
> UBIFS error (ubi0:0 pid 10770): ubifs_recover_master_node [ubifs]: failed to recover master node
> UBIFS error (ubi0:0 pid 10770): ubifs_recover_master_node [ubifs]: dumping second master node
> UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started, PID 10772
> magic 0x6101831
> crc 0x3a5c03b2
> node_type 7 (master node)
> group_type 0 (no node group)
> sqnum 9
> len 512
> highest_inum 65
> commit number 0
> flags 0x2
> log_lnum 3
> root_lnum 12
> root_offs 0
> root_len 108
> gc_lnum 11
> ihead_lnum 12
> ihead_offs 4096
> index_size 112
> lpt_lnum 7
> lpt_offs 44
> nhead_lnum 7
> nhead_offs 4096
> ltab_lnum 7
> ltab_offs 57
> lsave_lnum 0
> lsave_offs 0
> lscan_lnum 10
> leb_cnt 13
> empty_lebs 1
> idx_lebs 1
> total_free 753664
> total_dirty 7640
> total_used 440
> total_dead 0
> total_dark 16384
> UBIFS (ubi0:0): background thread "ubifs_bgt0_0" stops