UBIFS bug: failure to read NNODE while initializing LPT on mount
Zhihao Cheng
chengzhihao1 at huawei.com
Thu Mar 5 17:18:07 PST 2026
On 2026/3/5 22:16, Azar Gantus wrote:
>
>
> -----Original Message-----
> From: Zhihao Cheng <chengzhihao1 at huawei.com>
> Sent: Friday, February 27, 2026 10:35 AM
> To: Azar Gantus <Azar.Gantus at mobileye.com>; linux-mtd at lists.infradead.org
> Subject: Re: UBIFS bug: failure to read NNODE while initializing LPT on mount
>
> On 2026/2/27 3:33, Azar Gantus wrote:
>> Hi,
>>
>> I have a problem when mounting a UBIFS volume on a MIPS I6400-based board.
>> The Linux version we are running is 4.19.124.
>> I ran several MTD tests and no issues were found among multiple boards using the same HW/SW.
>>
>> The issue is that during mount, while initializing the LPT, ubifs_lpt_lookup attempts to read an NNODE
>> and the read fails.
> [...]
>>
>> I have not managed to replicate this issue, however, I think it is related to an early garbage collection of an LPT LEB that still contains live nodes.
>> One such flow is this:
>> 1. In ubifs_lpt_start_commit, if (c->check_lpt_free) evaluates to true and we begin clearing space.
> I think the dirty nnodes come from ubifs_lpt_post_commit->lpt_gc (which
> is called from the last do_commit()). The 'c->check_lpt_free' branch
> is taken only in the first do_commit(), where the number of dirty pnodes
> is guaranteed to be greater than 0.
> I agree with the analysis of the remaining steps.
>> 2. We call on lpt_gc(…) and it selects the victim LEB, LEB X. LEB X contains LSAVEs, LTABs, NNODEs and PNODEs.
>> In this LPT LEB X, only one NNODE is live, and the rest of the nodes are obsolete. We call lpt_gc_num(…, X).
>> This live NNODE is connected to other live NNODEs or PNODEs found on another LPT LEB.
>> 3. We run over every node in LPT LEB X and mark the single live NNODE as dirty. This does not increment c->dirty_pn_cnt.
>> 4. LPT LEB X gets marked for trivial GC in ubifs_lpt_start_commit -> lpt_tgc_start.
>> 5. Since we did not increment c->dirty_pn_cnt, we hit the early return in ubifs_lpt_start_commit,
>> and we don't proceed to do get_cnodes_to_commit, or layout_cnodes.
>> 6. Post commit, we have a live NNODE on LPT LEB X, despite it having been garbage collected.
>> On the next start up, when initializing the LPT, LPT LEB X is unmapped, but the master still thinks there's a live NNODE at some offset there.
>> Alternatively, it can be allocated again after garbage collection,
>> and we might be pointing to incorrect data (the middle of some other, live data), or in the worst case, an actual live NNODE that is not the original.
>>
>> Is there anything that stops this flow from happening on rare occasions, or can this flow not even happen at all?
>> I would appreciate your help regarding the issue, and the flow described above.
> I have started a test program to try to reproduce it, but I think the
> problem is hard to trigger: it is hard to get the dirty pnode count to
> be 0 while the dirty nnode count is non-zero:
> diff --git a/fs/ubifs/commit.c b/fs/ubifs/commit.c
> index 5b3a840098b0..505d5fb409d2 100644
> --- a/fs/ubifs/commit.c
> +++ b/fs/ubifs/commit.c
> @@ -105,6 +105,8 @@ static int nothing_to_commit(struct ubifs_info *c)
> * locked. Returns zero in case of success and a negative error code in case of
> * failure.
> */
> +#include <linux/delay.h>
> +int g_wait;
> static int do_commit(struct ubifs_info *c)
> {
> int err, new_ltail_lnum, old_ltail_lnum, i;
> @@ -203,6 +205,12 @@ static int do_commit(struct ubifs_info *c)
> if (err)
> goto out;
>
> + if (g_wait) {
> + dump_stack();
> + pr_err("dump corrupted image\n");
> + msleep(86400 * 1000);
> + pr_err("wait done\n");
> + }
> err = ubifs_log_post_commit(c, old_ltail_lnum);
> if (err)
> goto out;
> diff --git a/fs/ubifs/lpt_commit.c b/fs/ubifs/lpt_commit.c
> index f2cb214581fd..727f7e5e93a0 100644
> --- a/fs/ubifs/lpt_commit.c
> +++ b/fs/ubifs/lpt_commit.c
> @@ -1178,6 +1178,7 @@ static int lpt_gc(struct ubifs_info *c)
> * because they are not part of this commit. This function returns zero in case
> * of success and a negative error code in case of failure.
> */
> +extern int g_wait;
> int ubifs_lpt_start_commit(struct ubifs_info *c)
> {
> int err, cnt;
> @@ -1212,6 +1213,10 @@ int ubifs_lpt_start_commit(struct ubifs_info *c)
> lpt_tgc_start(c);
>
> if (!c->dirty_pn_cnt) {
> + if (c->dirty_nn_cnt || c->lpt_drty_flgs) {
> + pr_err("get nn %u flag %u\n", c->dirty_nn_cnt, c->lpt_drty_flgs);
> + g_wait = 1;
> + }
> dbg_cmt("no cnodes to commit");
> err = 0;
> goto out;
> @@ -1315,7 +1320,7 @@ int ubifs_lpt_post_commit(struct ubifs_info *c)
> err = lpt_tgc_end(c);
> if (err)
> goto out;
> - if (c->big_lpt)
> + if (c->big_lpt) {
> while (need_write_all(c)) {
> mutex_unlock(&c->lp_mutex);
> err = lpt_gc(c);
> @@ -1323,6 +1328,11 @@ int ubifs_lpt_post_commit(struct ubifs_info *c)
> return err;
> mutex_lock(&c->lp_mutex);
> }
> + pr_info("%d %d %d\n", c->dirty_nn_cnt, c->dirty_pn_cnt, c->lpt_drty_flgs);
> + if ((c->dirty_nn_cnt || c->lpt_drty_flgs) && !c->dirty_pn_cnt) {
> + pr_err("only gc non-pnode\n");
> + }
> + }
> out:
> mutex_unlock(&c->lp_mutex);
> return err;
>
> [root at localhost ~]# cat test.sh
> #!/bin/bash
>
> pkill fsstress > /dev/null 2>&1
> TMP=/root/temp
> umount $TMP 2>/dev/null || true
> mkdir -p $TMP
>
> modprobe -r ubifs 2>/dev/null || true
> for i in $(seq 0 1)
> do
> ubidetach -p /dev/mtd$i 2>/dev/null || true
> done
> modprobe -r ubi 2>/dev/null || true
> modprobe -r nandsim 2>/dev/null || true
>
> mtd=/dev/mtd0
> ubi=/dev/ubi0
>
> ID="0x20,0x78,0x00,0x00" # 128MB (16KB PEB, 512B page)
>
> modprobe nandsim id_bytes=$ID
> flash_eraseall /dev/mtd0
>
>
> modprobe ubi mtd="0,512"
> ubimkvol -N vol_a -m -n 0 /dev/ubi0
> modprobe ubifs
> mount -t ubifs /dev/ubi0_0 $TMP
> while true
> do
>     per=`df -Th | grep ubifs | awk '{print $6}'`;
>     if [[ ${per%?} -gt 95 ]]; then
>         rm -rf "$TMP/p$((RANDOM % 5))"
>         rm -rf "$TMP/p$((RANDOM % 5))"
>     fi
>     fsstress -d $TMP -l0 -p4 -n10000 &
>     sleep $((RANDOM % 5))
>     ps -e | grep -w fsstress > /dev/null 2>&1
>     while [ $? -eq 0 ]
>     do
>         pkill fsstress > /dev/null 2>&1
>         sleep 1
>         ps -e | grep -w fsstress > /dev/null 2>&1
>     done
>     sync &
>     sleep 1
>     sync &
>     msg=`dmesg | grep "dump corrupted image"`;
>     if [[ "$msg" != "" ]]
>     then
>         echo $msg
>         break
>     fi
> done
>
> dd if=$mtd of=disk bs=1M
> ubidetach -m0
> flash_eraseall $mtd
> nandwrite $mtd disk > /dev/null
> ubiattach -m0 -O512
>>
>> Thanks,
>> Azar
>>
>>
>> ______________________________________________________
>> Linux MTD discussion mailing list
>> http://lists.infradead.org/mailman/listinfo/linux-mtd/
>>
>
> Hi Zhihao,
>
> First of all, thanks for the response.
> I took your script and ran it for approximately 2 days and did not reproduce the issue. I am not sure it is possible to reproduce it with your methodology, since we need a sync without any PNODE changes, which seems unlikely to happen after the fsstress threads run. Am I wrong?
Same here: I have run it for 4 days without reproducing the problem.
fsstress will certainly update the pnodes. UBIFS never dirties nnodes
alone except in lpt_gc() (called by ubifs_lpt_post_commit), so we have to
generate many rounds of commits to trigger it; UBIFS may GC an
nnode-only LEB if we are lucky enough.
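
To see how often a commit round even gets close to that state, the counters printed by the debug pr_info("%d %d %d\n", ...) in the patch above could be classified from the kernel log. The following is only a rough sketch: classify_lpt_line is a hypothetical helper, not part of the patch or of test.sh. It assumes the three counters (dirty_nn_cnt, dirty_pn_cnt, lpt_drty_flgs) are the last three whitespace-separated fields of the log line, and any unrelated kernel message that happens to end in three integers would be misclassified.

```sh
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not from the patch).
# Input: one kernel log line, with or without a dmesg timestamp prefix.
# Output: "nnode-only" if nnodes or LPT flags are dirty while no pnode is,
#         "normal" for any other counter line,
#         "skip" if the line does not end in three unsigned integers.
classify_lpt_line() {
    printf '%s\n' "$1" | awk '
        NF < 3 { print "skip"; next }
        {
            nn = $(NF - 2); pn = $(NF - 1); fl = $NF
            if (nn !~ /^[0-9]+$/ || pn !~ /^[0-9]+$/ || fl !~ /^[0-9]+$/) {
                print "skip"; next
            }
            if ((nn + 0 > 0 || fl + 0 > 0) && pn + 0 == 0) {
                print "nnode-only"
            } else {
                print "normal"
            }
        }'
}

# Possible usage, summarising all rounds seen so far:
#   dmesg | while IFS= read -r line; do classify_lpt_line "$line"; done | sort | uniq -c
```

The condition mirrors the check added at the end of ubifs_lpt_post_commit in the patch, so a non-zero "nnode-only" count should coincide with the "only gc non-pnode" message.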
>
>