UBIFS bug: failure to read NNODE while initializing LPT on mount
Azar Gantus
Azar.Gantus at mobileye.com
Thu Mar 5 06:16:41 PST 2026
-----Original Message-----
From: Zhihao Cheng <chengzhihao1 at huawei.com>
Sent: Friday, February 27, 2026 10:35 AM
To: Azar Gantus <Azar.Gantus at mobileye.com>; linux-mtd at lists.infradead.org
Subject: Re: UBIFS bug: failure to read NNODE while initializing LPT on mount
On 2026/2/27 3:33, Azar Gantus wrote:
> Hi,
>
> I have a problem when mounting a UBIFS volume on a MIPS I6400-based board.
> The Linux version we are running is 4.19.124.
> I ran several MTD tests and no issues were found among multiple boards using the same HW/SW.
>
> The issue is that during mount, while initializing the LPT, ubifs_lpt_lookup attempts to read an NNODE
> and the read fails.
[...]
>
> I have not managed to replicate this issue. However, I think it is related to premature garbage collection of an LPT LEB that still contains live nodes.
> One such flow is this:
> 1. In ubifs_lpt_start_commit, if (c->check_lpt_free) evaluates to true and we begin clearing space.
I think the dirty nnodes come from ubifs_lpt_post_commit->lpt_gc (which
is called from the last do_commit()), because the 'c->check_lpt_free' branch
is taken only in the first do_commit(), which ensures that the number of
dirty pnodes is greater than 0.
I agree with the analysis of the remaining steps.
> 2. We call lpt_gc(…) and it selects the victim LEB, LEB X. LEB X contains LSAVEs, LTABs, NNODEs and PNODEs.
> In this LPT LEB X, only one NNODE is live, and the rest of the nodes are obsolete. We call lpt_gc_num(…, X).
> This live NNODE is connected to other live NNODEs or PNODEs found on another LPT LEB.
> 3. We run over every node in LPT LEB X and mark the single live NNODE as dirty. This does not increment c->dirty_pn_cnt.
> 4. LPT LEB X gets marked for trivial GC in ubifs_lpt_start_commit -> lpt_tgc_start.
> 5. Since we did not increment c->dirty_pn_cnt, we hit the early return in ubifs_lpt_start_commit,
> and we don't proceed to do get_cnodes_to_commit, or layout_cnodes.
> 6. Post commit, we still have a live NNODE on LPT LEB X, despite the LEB having been garbage collected.
> On the next startup, when initializing the LPT, LPT LEB X is unmapped, but the master still thinks there is a live NNODE at some offset there.
> Alternatively, it can be allocated again after garbage collection,
> and we might be pointing to incorrect data (the middle of some other, live data), or in the worst case, an actual live NNODE that is not the original.
>
> Is there anything that stops this flow from happening on rare occasions, or can this flow not even happen at all?
> I would appreciate your help regarding the issue, and the flow described above.
I have started a test program to try to reproduce it. I think the problem
is hard to trigger, because it is hard to end up with zero dirty pnodes and
a non-zero number of dirty nnodes at the same time:
diff --git a/fs/ubifs/commit.c b/fs/ubifs/commit.c
index 5b3a840098b0..505d5fb409d2 100644
--- a/fs/ubifs/commit.c
+++ b/fs/ubifs/commit.c
@@ -105,6 +105,8 @@ static int nothing_to_commit(struct ubifs_info *c)
* locked. Returns zero in case of success and a negative error code in case of
* failure.
*/
+#include <linux/delay.h>
+int g_wait;
static int do_commit(struct ubifs_info *c)
{
int err, new_ltail_lnum, old_ltail_lnum, i;
@@ -203,6 +205,12 @@ static int do_commit(struct ubifs_info *c)
if (err)
goto out;
+ if (g_wait) {
+ dump_stack();
+ pr_err("dump corrupted image\n");
+ msleep(86400 * 1000);
+ pr_err("wait done\n");
+ }
err = ubifs_log_post_commit(c, old_ltail_lnum);
if (err)
goto out;
diff --git a/fs/ubifs/lpt_commit.c b/fs/ubifs/lpt_commit.c
index f2cb214581fd..727f7e5e93a0 100644
--- a/fs/ubifs/lpt_commit.c
+++ b/fs/ubifs/lpt_commit.c
@@ -1178,6 +1178,7 @@ static int lpt_gc(struct ubifs_info *c)
* because they are not part of this commit. This function returns zero in case
* of success and a negative error code in case of failure.
*/
+extern int g_wait;
int ubifs_lpt_start_commit(struct ubifs_info *c)
{
int err, cnt;
@@ -1212,6 +1213,10 @@ int ubifs_lpt_start_commit(struct ubifs_info *c)
lpt_tgc_start(c);
if (!c->dirty_pn_cnt) {
+ if (c->dirty_nn_cnt || c->lpt_drty_flgs) {
+ pr_err("get nn %u flag %u\n", c->dirty_nn_cnt, c->lpt_drty_flgs);
+ g_wait = 1;
+ }
dbg_cmt("no cnodes to commit");
err = 0;
goto out;
@@ -1315,7 +1320,7 @@ int ubifs_lpt_post_commit(struct ubifs_info *c)
err = lpt_tgc_end(c);
if (err)
goto out;
- if (c->big_lpt)
+ if (c->big_lpt) {
while (need_write_all(c)) {
mutex_unlock(&c->lp_mutex);
err = lpt_gc(c);
@@ -1323,6 +1328,11 @@ int ubifs_lpt_post_commit(struct ubifs_info *c)
return err;
mutex_lock(&c->lp_mutex);
}
+ pr_info("%d %d %d\n", c->dirty_nn_cnt, c->dirty_pn_cnt, c->lpt_drty_flgs);
+ if ((c->dirty_nn_cnt || c->lpt_drty_flgs) && !c->dirty_pn_cnt) {
+ pr_err("only gc non-pnode\n");
+ }
+ }
out:
mutex_unlock(&c->lp_mutex);
return err;
[root at localhost ~]# cat test.sh
#!/bin/bash
pkill fsstress > /dev/null 2>&1
TMP=/root/temp
umount $TMP 2>/dev/null || true
mkdir -p $TMP
modprobe -r ubifs 2>/dev/null || true
for i in $(seq 0 1)
do
ubidetach -p /dev/mtd$i 2>/dev/null || true
done
modprobe -r ubi 2>/dev/null || true
modprobe -r nandsim 2>/dev/null || true
mtd=/dev/mtd0
ubi=/dev/ubi0
ID="0x20,0x78,0x00,0x00" # 128MB (16KB PEB, 512B page)
modprobe nandsim id_bytes=$ID
flash_eraseall /dev/mtd0
modprobe ubi mtd="0,512"
ubimkvol -N vol_a -m -n 0 /dev/ubi0
modprobe ubifs
mount -t ubifs /dev/ubi0_0 $TMP
while true
do
per=`df -Th | grep ubifs | awk '{print $6}'`;
if [[ ${per%?} -gt 95 ]]; then
rm -rf "$TMP/p$((RANDOM % 5))"
rm -rf "$TMP/p$((RANDOM % 5))"
fi
fsstress -d $TMP -l0 -p4 -n10000 &
sleep $((RANDOM % 5))
ps -e | grep -w fsstress > /dev/null 2>&1
while [ $? -eq 0 ]
do
pkill fsstress > /dev/null 2>&1
sleep 1
ps -e | grep -w fsstress > /dev/null 2>&1
done
sync &
sleep 1
sync &
msg=`dmesg | grep "dump corrupted image"`;
if [[ "$msg" != "" ]]
then
echo $msg
break
fi
done
dd if=$mtd of=disk bs=1M
ubidetach -m0
flash_eraseall $mtd
nandwrite $mtd disk > /dev/null
ubiattach -m0 -O512
>
> Thanks,
> Azar
Hi Zhihao,
First of all, thanks for the response.
I took your script and ran it for approximately 2 days and did not see the issue reproduce. I am not sure it can be reproduced with your methodology, since we need a sync without any PNODE changes, which seems unlikely to happen after the fsstress threads run. Am I wrong?