UBIFS bug: failure to read NNODE while initializing LPT on mount

Thu Mar 5 06:16:41 PST 2026

-----Original Message-----
From: Zhihao Cheng <chengzhihao1 at huawei.com> 
Sent: Friday, February 27, 2026 10:35 AM
To: Azar Gantus <Azar.Gantus at mobileye.com>; linux-mtd at lists.infradead.org
Subject: Re: UBIFS bug: failure to read NNODE while initializing LPT on mount


On 2026/2/27 3:33, Azar Gantus wrote:
> Hi,
> 
> I have a problem when mounting a UBIFS volume on a MIPS I6400-based board.
> The Linux version we are running is 4.19.124.
> I ran several MTD tests and no issues were found among multiple boards using the same HW/SW.
> 
> The issue is that during mount, while initializing the LPT, ubifs_lpt_lookup attempts to read an NNODE
> and the read fails.
[...]
> 
> I have not managed to replicate this issue; however, I think it is related to an early garbage collection of an LPT LEB that still contains live nodes.
> One such flow is this:
> 1. In ubifs_lpt_start_commit, if (c->check_lpt_free) evaluates to true and we begin clearing space.
I think the dirty nnodes come from ubifs_lpt_post_commit->lpt_gc (which
is called from the previous do_commit()), because the 'c->check_lpt_free'
branch is only taken in the first do_commit(), which ensures that the
number of dirty pnodes is greater than 0.
I agree with your analysis of the rest of the flow.
> 2. We call lpt_gc(…), which selects the victim LEB, LEB X. LEB X contains LSAVEs, LTABs, NNODEs and PNODEs.
> In this LPT LEB X, only one NNODE is live, and the rest of the nodes are obsolete. We call lpt_gc_num(…, X).
> This live NNODE is connected to other live NNODEs or PNODEs found on another LPT LEB.
> 3. We iterate over every node in LPT LEB X and mark the single live NNODE as dirty. Because it is an NNODE, this does not increment c->dirty_pn_cnt.
> 4. LPT LEB X gets marked for trivial GC in ubifs_lpt_start_commit -> lpt_tgc_start.
> 5. Since c->dirty_pn_cnt was not incremented, we hit the early return in ubifs_lpt_start_commit
> and never reach get_cnodes_to_commit or layout_cnodes.
> 6. After the commit, a live NNODE remains on LPT LEB X even though LEB X has been garbage collected.
> On the next startup, when the LPT is initialized, LPT LEB X is unmapped, but the master still thinks there is a live NNODE at some offset in it.
> Alternatively, LEB X can be allocated again after garbage collection,
> and the reference may point to incorrect data (the middle of some other live data) or, in the worst case, to an actual live NNODE that is not the original.
> 
> Is there anything that prevents this flow from happening, even rarely, or can it not happen at all?
> I would appreciate your help with this issue and the flow described above.
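The six-step flow above can be condensed into a toy C model. This is a minimal sketch under heavy assumptions: struct lpt_model and the model_* functions are hypothetical and are not UBIFS code, only one live NNODE is tracked, and the post-commit lpt_gc() of the previous commit plus the start/post commit of the next one are collapsed into two calls. What it illustrates is the key point of the report: the early return tests c->dirty_pn_cnt alone, so a dirty NNODE on its own is never relaid out while its old LEB is still erased by trivial GC.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical size: LPT LEBs are numbered 0..MODEL_LPT_LEBS-1 in this toy model. */
#define MODEL_LPT_LEBS 4

/*
 * Toy model of the reported flow, NOT the UBIFS implementation: it keeps
 * only the dirty counters and the location of a single live NNODE.
 */
struct lpt_model {
	int dirty_pn_cnt;              /* stands in for c->dirty_pn_cnt */
	int dirty_nn_cnt;              /* stands in for c->dirty_nn_cnt */
	int nnode_lnum;                /* LEB holding the one live NNODE */
	int fully_dirty_lnum;          /* LEB with no live data left, or -1 */
	bool unmapped[MODEL_LPT_LEBS]; /* LEBs erased by trivial GC */
};

/* Steps 2-3: GC of 'victim' marks its only live node, an NNODE, dirty. */
static void model_lpt_gc(struct lpt_model *m, int victim)
{
	assert(victim >= 0 && victim < MODEL_LPT_LEBS);
	if (m->nnode_lnum == victim)
		m->dirty_nn_cnt++; /* an NNODE, so dirty_pn_cnt is untouched */
	m->fully_dirty_lnum = victim;
}

/* Steps 4-6: one LPT commit, start and post commit collapsed together. */
static void model_commit(struct lpt_model *m, int free_lnum)
{
	/* Step 4 (stand-in for lpt_tgc_start()): queue the fully dirty LEB. */
	int tgc_lnum = m->fully_dirty_lnum;

	/* Step 5: the early-return test looks at dirty pnodes only. */
	if (m->dirty_pn_cnt != 0) {
		/* Stand-in for get_cnodes_to_commit()/layout_cnodes():
		 * a dirty NNODE is rewritten to free_lnum. */
		if (m->dirty_nn_cnt != 0)
			m->nnode_lnum = free_lnum;
		m->dirty_pn_cnt = 0;
		m->dirty_nn_cnt = 0;
	}

	/* Step 6 (stand-in for lpt_tgc_end()): erase the queued LEB either way. */
	if (tgc_lnum >= 0)
		m->unmapped[tgc_lnum] = true;
	m->fully_dirty_lnum = -1;
}

/* Mount-time view: does the live NNODE sit in an erased LEB? */
static bool model_nnode_dangling(const struct lpt_model *m)
{
	return m->unmapped[m->nnode_lnum];
}
```

For example, starting from `struct lpt_model m = { .nnode_lnum = 1, .fully_dirty_lnum = -1 };`, calling `model_lpt_gc(&m, 1)` and then `model_commit(&m, 2)` leaves `model_nnode_dangling(&m)` true. Starting with `.dirty_pn_cnt = 1` instead moves the NNODE to LEB 2 and the check is false.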
I wrote a test program to try to reproduce it. I think the problem is
hard to trigger, because it is hard to reach a state where the dirty
pnode count is 0 while the dirty nnode count is non-zero:

diff --git a/fs/ubifs/commit.c b/fs/ubifs/commit.c
index 5b3a840098b0..505d5fb409d2 100644
--- a/fs/ubifs/commit.c
+++ b/fs/ubifs/commit.c
@@ -105,6 +105,8 @@ static int nothing_to_commit(struct ubifs_info *c)
   * locked. Returns zero in case of success and a negative error code in case of
   * failure.
   */
+#include <linux/delay.h>
+int g_wait;
  static int do_commit(struct ubifs_info *c)
  {
  	int err, new_ltail_lnum, old_ltail_lnum, i;
@@ -203,6 +205,12 @@ static int do_commit(struct ubifs_info *c)
  	if (err)
  		goto out;

+	if (g_wait) {
+		dump_stack();
+		pr_err("dump corrupted image\n");
+		msleep(86400 * 1000);
+		pr_err("wait done\n");
+	}
  	err = ubifs_log_post_commit(c, old_ltail_lnum);
  	if (err)
  		goto out;
diff --git a/fs/ubifs/lpt_commit.c b/fs/ubifs/lpt_commit.c
index f2cb214581fd..727f7e5e93a0 100644
--- a/fs/ubifs/lpt_commit.c
+++ b/fs/ubifs/lpt_commit.c
@@ -1178,6 +1178,7 @@ static int lpt_gc(struct ubifs_info *c)
   * because they are not part of this commit. This function returns zero in case
   * of success and a negative error code in case of failure.
   */
+extern int g_wait;
  int ubifs_lpt_start_commit(struct ubifs_info *c)
  {
  	int err, cnt;
@@ -1212,6 +1213,10 @@ int ubifs_lpt_start_commit(struct ubifs_info *c)
  	lpt_tgc_start(c);

  	if (!c->dirty_pn_cnt) {
+		if (c->dirty_nn_cnt || c->lpt_drty_flgs) {
+			pr_err("get nn %u flag %u\n", c->dirty_nn_cnt, c->lpt_drty_flgs);
+			g_wait = 1;
+		}
  		dbg_cmt("no cnodes to commit");
  		err = 0;
  		goto out;
@@ -1315,7 +1320,7 @@ int ubifs_lpt_post_commit(struct ubifs_info *c)
  	err = lpt_tgc_end(c);
  	if (err)
  		goto out;
-	if (c->big_lpt)
+	if (c->big_lpt) {
  		while (need_write_all(c)) {
  			mutex_unlock(&c->lp_mutex);
  			err = lpt_gc(c);
@@ -1323,6 +1328,11 @@ int ubifs_lpt_post_commit(struct ubifs_info *c)
  				return err;
  			mutex_lock(&c->lp_mutex);
  		}
+		pr_info("%d %d %d\n", c->dirty_nn_cnt, c->dirty_pn_cnt, c->lpt_drty_flgs);
+		if ((c->dirty_nn_cnt || c->lpt_drty_flgs) && !c->dirty_pn_cnt) {
+			pr_err("only gc non-pnode\n");
+		}
+	}
  out:
  	mutex_unlock(&c->lp_mutex);
  	return err;
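
The state that both debug hooks in the patch report can also be written as a standalone predicate. This is a minimal sketch with a hypothetical helper name (lpt_only_non_pnode_dirty() is not a kernel function), assuming its arguments carry the values of c->dirty_pn_cnt, c->dirty_nn_cnt and c->lpt_drty_flgs. It is true when LPT work is pending (dirty nnodes, or dirty ltab/lsave flags) while no pnode is dirty, i.e. when ubifs_lpt_start_commit() would still take its "no cnodes to commit" early return.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of UBIFS: mirrors the condition checked by
 * the debug hooks. Arguments are assumed to hold c->dirty_pn_cnt,
 * c->dirty_nn_cnt and c->lpt_drty_flgs respectively.
 */
static bool lpt_only_non_pnode_dirty(int dirty_pn_cnt, int dirty_nn_cnt,
				     int lpt_drty_flgs)
{
	/* Pending non-pnode LPT work, but the early return on pnodes fires. */
	return dirty_pn_cnt == 0 && (dirty_nn_cnt != 0 || lpt_drty_flgs != 0);
}
```

For instance, `lpt_only_non_pnode_dirty(0, 1, 0)` is true, while `lpt_only_non_pnode_dirty(1, 1, 0)` is false because a dirty pnode sends the commit down the full get_cnodes_to_commit()/layout_cnodes() path.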

[root@localhost ~]# cat test.sh
#!/bin/bash

pkill fsstress > /dev/null 2>&1
TMP=/root/temp
umount $TMP 2>/dev/null || true
mkdir -p $TMP

modprobe -r ubifs 2>/dev/null || true
for i in $(seq 0 1)
do
	ubidetach -p /dev/mtd$i 2>/dev/null || true
done
modprobe -r ubi 2>/dev/null || true
modprobe -r nandsim 2>/dev/null || true

mtd=/dev/mtd0
ubi=/dev/ubi0

ID="0x20,0x78,0x00,0x00" # 128MB (16KB PEB, 512B page)

modprobe nandsim id_bytes=$ID
flash_eraseall /dev/mtd0


	modprobe ubi mtd="0,512"
	ubimkvol -N vol_a -m -n 0 /dev/ubi0
	modprobe ubifs
	mount -t ubifs /dev/ubi0_0 $TMP
	while true
	do
		per=`df -Th | grep ubifs | awk '{print $6}'`;
		if [[ ${per%?} -gt 95 ]]; then
			rm -rf "$TMP/p$((RANDOM % 5))"
			rm -rf "$TMP/p$((RANDOM % 5))"
		fi
		fsstress -d $TMP -l0 -p4 -n10000 &
		sleep $((RANDOM % 5))
		ps -e | grep -w fsstress > /dev/null 2>&1
		while [ $? -eq 0 ]
		do
			pkill fsstress > /dev/null 2>&1
			sleep 1
			ps -e | grep -w fsstress > /dev/null 2>&1
		done
		sync &
		sleep 1
		sync &
		msg=`dmesg | grep "dump corrupted image"`;
		if [[ "$msg" != "" ]]
		then
			echo $msg
			break
		fi
	done

	dd if=$mtd of=disk bs=1M
	ubidetach -m0
	flash_eraseall $mtd
	nandwrite $mtd disk > /dev/null
	ubiattach -m0 -O512
> 
> Thanks,
> Azar

Hi Zhihao,

First of all, thanks for the response.
I ran your script for approximately 2 days and the issue did not reproduce. I am not sure it can be reproduced with this approach, since we need a sync without any PNODE changes, which seems unlikely to happen after the fsstress threads have run. Am I wrong?