UBIFS bug: failure to read NNODE while initializing LPT on mount

Thu Mar 5 17:18:07 PST 2026

On 2026/3/5 22:16, Azar Gantus wrote:
> 
> 
> -----Original Message-----
> From: Zhihao Cheng <chengzhihao1 at huawei.com>
> Sent: Friday, February 27, 2026 10:35 AM
> To: Azar Gantus <Azar.Gantus at mobileye.com>; linux-mtd at lists.infradead.org
> Subject: Re: UBIFS bug: failure to read NNODE while initializing LPT on mount
> 
> On 2026/2/27 3:33, Azar Gantus wrote:
>> Hi,
>>
>> I have a problem when mounting a UBIFS volume on a MIPS I6400-based board.
>> The Linux version we are running is 4.19.124.
>> I ran several MTD tests and no issues were found among multiple boards using the same HW/SW.
>>
>> The issue is that during mounting, when initializing the LPT, during ubifs_lpt_lookup, we attempt to read an NNODE
>> And it cannot be read.
> [...]
>>
>> I have not managed to reproduce this issue; however, I think it is related to premature garbage collection of an LPT LEB that still contains live nodes.
>> One such flow is this:
>> 1. In ubifs_lpt_start_commit, if (c->check_lpt_free) evaluates to true and we begin clearing space.
> I think the dirty nnodes come from ubifs_lpt_post_commit->lpt_gc (called
> from the previous do_commit()), because the 'c->check_lpt_free' branch
> is taken only in the first do_commit(), which ensures that the number of
> dirty pnodes is greater than 0.
> I agree with the analysis of the remaining steps.
>> 2. We call lpt_gc(…) and it selects the victim LEB, LEB X. LEB X contains LSAVEs, LTABs, NNODEs and PNODEs.
>> In this LPT LEB X, only one NNODE is live, and the rest of the nodes are obsolete. We call lpt_gc_num(…, X).
>> This live NNODE is connected to other live NNODEs or PNODEs found on another LPT LEB.
>> 3. We walk every node in LPT LEB X and mark the single live NNODE as dirty. This does not increment c->dirty_pn_cnt.
>> 4. LPT LEB X gets marked for trivial GC in ubifs_lpt_start_commit -> lpt_tgc_start.
>> 5. Since we did not increment c->dirty_pn_cnt, we hit the early return in ubifs_lpt_start_commit,
>> and we don't proceed to do get_cnodes_to_commit, or layout_cnodes.
>> 6. Post commit, we have a live NNODE on a LPT LEB X, despite it having been garbage collected.
>> On the next start up, when initializing the LPT, LPT LEB X is unmapped, but the master still thinks there's a live NNODE at some offset there.
>> Alternatively, it can be allocated again after garbage collection,
>> and we might be pointing to incorrect data (the middle of some other, live data), or in the worst case, an actual live NNODE that is not the original.
>>
>> Is there anything that stops this flow from happening on rare occasions, or can this flow not even happen at all?
>> I would appreciate your help regarding the issue, and the flow described above.
> I have written a test to try to reproduce it. I think the problem is
> hard to trigger, because it is hard to end up with zero dirty pnodes
> and a non-zero number of dirty nnodes at the same time:
> diff --git a/fs/ubifs/commit.c b/fs/ubifs/commit.c
> index 5b3a840098b0..505d5fb409d2 100644
> --- a/fs/ubifs/commit.c
> +++ b/fs/ubifs/commit.c
> @@ -105,6 +105,8 @@ static int nothing_to_commit(struct ubifs_info *c)
>     * locked. Returns zero in case of success and a negative error code in case of
>     * failure.
>     */
> +#include <linux/delay.h>
> +int g_wait;
>    static int do_commit(struct ubifs_info *c)
>    {
>    	int err, new_ltail_lnum, old_ltail_lnum, i;
> @@ -203,6 +205,12 @@ static int do_commit(struct ubifs_info *c)
>    	if (err)
>    		goto out;
> 
> +	if (g_wait) {
> +		dump_stack();
> +		pr_err("dump corrupted image\n");
> +		msleep(86400 * 1000);
> +		pr_err("wait done\n");
> +	}
>    	err = ubifs_log_post_commit(c, old_ltail_lnum);
>    	if (err)
>    		goto out;
> diff --git a/fs/ubifs/lpt_commit.c b/fs/ubifs/lpt_commit.c
> index f2cb214581fd..727f7e5e93a0 100644
> --- a/fs/ubifs/lpt_commit.c
> +++ b/fs/ubifs/lpt_commit.c
> @@ -1178,6 +1178,7 @@ static int lpt_gc(struct ubifs_info *c)
>     * because they are not part of this commit. This function returns zero in case
>     * of success and a negative error code in case of failure.
>     */
> +extern int g_wait;
>    int ubifs_lpt_start_commit(struct ubifs_info *c)
>    {
>    	int err, cnt;
> @@ -1212,6 +1213,10 @@ int ubifs_lpt_start_commit(struct ubifs_info *c)
>    	lpt_tgc_start(c);
> 
>    	if (!c->dirty_pn_cnt) {
> +		if (c->dirty_nn_cnt || c->lpt_drty_flgs) {
> +			pr_err("get nn %u flag %u\n", c->dirty_nn_cnt, c->lpt_drty_flgs);
> +			g_wait = 1;
> +		}
>    		dbg_cmt("no cnodes to commit");
>    		err = 0;
>    		goto out;
> @@ -1315,7 +1320,7 @@ int ubifs_lpt_post_commit(struct ubifs_info *c)
>    	err = lpt_tgc_end(c);
>    	if (err)
>    		goto out;
> -	if (c->big_lpt)
> +	if (c->big_lpt) {
>    		while (need_write_all(c)) {
>    			mutex_unlock(&c->lp_mutex);
>    			err = lpt_gc(c);
> @@ -1323,6 +1328,11 @@ int ubifs_lpt_post_commit(struct ubifs_info *c)
>    				return err;
>    			mutex_lock(&c->lp_mutex);
>    		}
> +		pr_info("%d %d %d\n", c->dirty_nn_cnt, c->dirty_pn_cnt,
> +			c->lpt_drty_flgs);
> +		if ((c->dirty_nn_cnt || c->lpt_drty_flgs) && !c->dirty_pn_cnt) {
> +			pr_err("only gc non-pnode\n");
> +		}
> +	}
>    out:
>    	mutex_unlock(&c->lp_mutex);
>    	return err;
> 
> [root at localhost ~]# cat test.sh
> #!/bin/bash
> 
> pkill fsstress > /dev/null 2>&1
> TMP=/root/temp
> umount $TMP 2>/dev/null || true
> mkdir -p $TMP
> 
> modprobe -r ubifs 2>/dev/null || true
> for i in $(seq 0 1)
> do
> 	ubidetach -p /dev/mtd$i 2>/dev/null || true
> done
> modprobe -r ubi 2>/dev/null || true
> modprobe -r nandsim 2>/dev/null || true
> 
> mtd=/dev/mtd0
> ubi=/dev/ubi0
> 
> ID="0x20,0x78,0x00,0x00" # 128MB (16KB PEB, 512B page)
> 
> modprobe nandsim id_bytes=$ID
> flash_eraseall /dev/mtd0
> 
> 
> 	modprobe ubi mtd="0,512"
> 	ubimkvol -N vol_a -m -n 0 /dev/ubi0
> 	modprobe ubifs
> 	mount -t ubifs /dev/ubi0_0 $TMP
> 	while true
> 	do
> 		per=`df -Th | grep ubifs | awk '{print $6}'`;
> 		if [[ ${per%?} -gt 95 ]]; then
> 			rm -rf "$TMP/p$((RANDOM % 5))"
> 			rm -rf "$TMP/p$((RANDOM % 5))"
> 		fi
> 		fsstress -d $TMP -l0 -p4 -n10000 &
> 		sleep $((RANDOM % 5))
> 		ps -e | grep -w fsstress > /dev/null 2>&1
> 		while [ $? -eq 0 ]
> 		do
> 			pkill fsstress > /dev/null 2>&1
> 			sleep 1
> 			ps -e | grep -w fsstress > /dev/null 2>&1
> 		done
> 		sync &
> 		sleep 1
> 		sync &
> 		msg=`dmesg | grep "dump corrupted image"`;
> 		if [[ "$msg" != "" ]]
> 		then
> 			echo $msg
> 			break
> 		fi
> 	done
> 
> 	dd if=$mtd of=disk bs=1M
> 	ubidetach -m0
> 	flash_eraseall $mtd
> 	nandwrite $mtd disk > /dev/null
> 	ubiattach -m0 -O512
>>
>> Thanks,
>> Azar
>>
> 
> Hi Zhihao,
> 
> First of all, thanks for the response.
> I took your script and ran it for approximately 2 days and did not reproduce the issue. I am not sure it is possible to reproduce it with your methodology, as we need a sync without any PNODE changes, which seems unlikely to happen after the fsstress threads run. Am I wrong?

Same here: I have run it for 4 days without reproducing the issue.
fsstress will certainly update the pnodes. The only place where UBIFS
dirties nnodes without dirtying pnodes is lpt_gc() (called by
ubifs_lpt_post_commit()), so we have to generate many rounds of commits
to trigger it; UBIFS may GC a nnodes-only LEB if we are lucky enough.
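
To make the state we are trying to hit concrete, here is a minimal toy
model in plain C. It is not kernel code: struct lpt_model and the model_*
helpers are hypothetical names made up for illustration, and only the two
counter names are borrowed from struct ubifs_info. It assumes, as a
simplification, that a normal lprops update dirties one pnode and one
nnode, while the GC case from the analysis above dirties a single nnode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two LPT counters in struct ubifs_info. */
struct lpt_model {
	int dirty_pn_cnt;	/* number of dirty pnodes */
	int dirty_nn_cnt;	/* number of dirty nnodes */
};

/* Models lpt_gc() moving one live nnode: only the nnode counter grows. */
void model_gc_nnode_only(struct lpt_model *m)
{
	assert(m != NULL);
	m->dirty_nn_cnt++;
}

/*
 * Models an ordinary lprops update (e.g. caused by fsstress). Assumption:
 * it dirties one pnode and one parent nnode.
 */
void model_update_lprops(struct lpt_model *m)
{
	assert(m != NULL);
	m->dirty_pn_cnt++;
	m->dirty_nn_cnt++;
}

/*
 * Mirrors the early-return check "if (!c->dirty_pn_cnt)": returns true
 * when the commit would skip writing the LPT although dirty nnodes exist.
 */
bool model_commit_skips_dirty_nnodes(const struct lpt_model *m)
{
	assert(m != NULL);
	return m->dirty_pn_cnt == 0 && m->dirty_nn_cnt > 0;
}
```

Starting from a clean model, a single model_gc_nnode_only() call makes
model_commit_skips_dirty_nnodes() return true, and any following
model_update_lprops() call makes it return false again. That is why the
test has to reach a commit with no lprops changes in between.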
> 
>