UBIFS: problem report: lpt LEB scanning failure (no issue)

Tue Jun 4 20:25:05 PDT 2024

Problem description

Recently I was testing UBIFS with fsstress on a NOR flash (simulated by
mtdram: 64M size, 16K PEB, which means UBIFS uses big lpt mode). One CPU
(running the fsstress program) reached 100% utilization, and the fsstress
program could not be killed. The fsstress program was stuck in an endless
loop:

do_commit -> ubifs_lpt_start_commit:

	while (need_write_all(c)) {
		mutex_unlock(&c->lp_mutex);
		err = lpt_gc(c);
		if (err)
			return err;
		mutex_lock(&c->lp_mutex);
	}

Then I found that lpt_gc_lnum() handles the same LEB (lnum 8) every time,
and that c->ltab[i].dirty for LEB 8 is still not equal to c->leb_size
after lpt_gc_lnum() returns. After analyzing the lpt nodes on LEB 8, I
found that lpt_gc_lnum() returns early, before it has scanned all lpt
nodes. LEB 8 of the lpt area looks like this (partial):

[  104.740309] LEB 8:14383 len 13, nnode num 31,
[  104.740689] dirty 1
[  104.740905] LEB 8:14396 len 13, nnode num 7,
[  104.741277] dirty 1
[  104.741486] LEB 8:14409 len 13, nnode num 1,
[  104.741870] dirty 1
[  104.742078] LEB 8:14422 len 16, pnode num 745
[  104.742475] dirty 1
[  104.742682] B type 8 0
[  104.742925] LEB 8:14438, pad 2 bytes min_io_size 8
[  104.743301] LEB 8:14440, free 1368 bytes
               // Actually, the remaining 1368 bytes are not 0xff; the
               // scanning function (dump_lpt_leb) parses the lpt nodes
               // incorrectly.
[  104.743674] (pid 1095) finish dumping LEB 8

The binary image of LEB 8 is (partial):

0x3840 = 14400

  00003840: 6a e4 60 cf 91 b1 f3 82 03 17 59 11 40 ac b9 fc 99 11 83 c3 83 03 ff ff 90 6e c3 ec 04 f3 26 a1  j.`.......Y.@............n....&.
  00003860: bf 09 41 a2 6f 94 15 09 58 ee 5f ce 97 7e 09 b8 86 a0 d8 2c 62 3b 47 37 62 e5 e8 59 86 be 82 fe  ..A.o...X._..~.....,b;G7b..Y....
  00003880: 17 6d 63 95 ce 80 76 6e ad e6 44 af f6 43 06 ab 41 28 04 99 72 1f 31 91 cb 96 b1 ef 43 6e 22 2c  .mc...vn..D..C..A(..r.1.....Cn",
  000038a0: 26 57 d0 9c b5 76 8b 08 1d fc 41 07 8c ba 26 3b 45 e1 7b 23 de d5 19 63 f3 6c e8 95 b7 02 5a 89  &W...v....A...&;E.{#...c.l....Z.
  000038c0: 83 81 0e 72 7c 4b 59 a3 c4 c0 e1 e5 22 7c 27 8d 85 ad c2 93 25 ac 5b 32 c8 02 07 2f 24 f9 e0 f6  ...r|KY....."|'.....%.[2.../$...
  000038e0: e3 87 f2 bb 62 23 d5 e4 2e b7 8c 41 61 43 2a a4 2f ce 92 4f 62 47 88 a2 11 a6 51 1f da 51 e7 a4  ....b#.....AaC*./..ObG....Q..Q..

Let's parse the above data the same way lpt_gc_lnum() does.

The nnode (num 1) is at 8:14409~14421, and its data is '17 59 11 40 ac
b9 fc 99 11 83 c3 83 03'. According to ubifs_pack_nnode(), the type field
is the lower UBIFS_LPT_TYPE_BITS (4) bits of '0x11'; the data looks good
and can be parsed as an nnode. The next 2 bytes (8:14422~14423) are 0xff,
which means that the lpt data was written to flash with an alignment of 8
bytes (see write_cnodes()). After I modified lpt_gc_lnum() to let UBIFS
skip these 2 bytes (0xff), UBIFS could parse all lpt nodes in LEB 8.
Without that change, however, UBIFS parses these 2 bytes (0xff) as the
crc field of a pnode (8:14422~14437). The lower 4 bits of the following
byte '0x90' are 0 (UBIFS_LPT_PNODE), and the crc16 of the pnode body
happens to be exactly 0xffff. So the range 8:14422~14437 is parsed as a
pnode, and the remaining lpt nodes cannot be parsed because the parsing
offset is wrong.

Why can it happen?

The root cause is that the on-disk layout of the lpt area is simple: an
LPT node has no length field (its length is implied by its 4-bit type),
so a scanner can only validate a candidate node by that type and its
16-bit crc. It would be better if LPT nodes had a length field. Without
one, the crc16 result can be correct both for offset_A~offset_B (node X)
and for offset_A+2~offset_C (node Y).

Will it happen on NAND flash?

In theory, I would say 'yes', but I never hit it after testing for a
whole day. My guess is that since the min_io_size of NAND is at least
512, the padding (0xff) is rarely shorter than 3 bytes. With 3 or more
0xff bytes, the type field reads as 0xf (UBIFS_LPT_NOT_A_NODE), which is
never mistaken for a node. So it is hard to reproduce a case where the
crc16 result is correct both for offset_A~offset_B (node X) and for
offset_A+2~offset_C (node Y).
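
A rough estimate of why a larger min_io_size helps. It rests on two assumptions that do not come from the measurements above. The first is that the padding length p at the end of a commit is uniform over 0..m-1, with m = min_io_size. The second is that a crc over unrelated bytes matches with probability 2^-16. Only p = 1 or p = 2 can be misread, since longer padding yields type 0xf. The type check is otherwise ignored here because it applies equally for every m.

```latex
\begin{aligned}
P_{\text{misparse}}(m) &\approx \Pr[p \in \{1, 2\}] \cdot 2^{-16} = \frac{2}{m} \cdot 2^{-16} \\
P_{\text{misparse}}(8) &\approx \frac{2}{8} \cdot 2^{-16} \approx 3.8 \times 10^{-6} \\
P_{\text{misparse}}(512) &\approx \frac{2}{512} \cdot 2^{-16} \approx 6.0 \times 10^{-8}
\end{aligned}
```

Under these assumptions, an 8-byte min_io_size is about 64 times more exposed per padded gap than a 512-byte one. That is consistent with the problem showing up on the mtdram setup but not in a day of NAND testing.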

How to reproduce it?

You can generate a problem image with the script test.sh (when you see a
hung task warning, or the utilization of one CPU reaches 100%, the
problem has occurred).

#!/bin/bash
# Needs bash: uses 'function', [[ ]] and $RANDOM.

DEV=/dev/ubi0_0
KEY_FILE=/tmp/key
MNT=/root/temp
mtdram_patt="mtdram test device"

function fatal()
{
     echo "Error: $1" 1>&2
     exit 1
}

function find_mtd_device()
{
     printf "%s" "$(grep "$1" /proc/mtd | sed -e "s/^mtd\([0-9]\+\):.*$/\1/")"
}

# Load mtdram with specified size and PEB size
# Usage: load_mtdram <flash size> <PEB size>
# 1. Flash size is specified in MiB
# 2. PEB size is specified in KiB
function load_mtdram()
{
     local size="$1";     shift
     local peb_size="$1"; shift

     size="$(($size * 1024))"
     modprobe mtdram total_size="$size" erase_size="$peb_size"
}

function run_test()
{
     local size="$1";
     local peb_size="$2";
     local page_size="$3";

     echo "======================================================================"
     printf "%s" "MTDRAM ${size}MiB PEB size ${peb_size}KiB"
     echo ""

     load_mtdram "$size" "$peb_size" || echo "cannot load mtdram"
     mtdnum="$(find_mtd_device "$mtdram_patt")"

     flash_eraseall /dev/mtd$mtdnum
     modprobe ubi mtd="$mtdnum,$page_size" || fatal "modprobe ubi fail"
     ubimkvol -N vol_test -m -n 0 /dev/ubi0 || fatal "mkvol fail"
     modprobe ubifs || fatal "modprobe ubifs fail"
     mount -t ubifs $DEV $MNT || fatal "mount ubifs fail"

     fsstress -d $MNT -l0 -p4 -n10000 &
     sleep $((RANDOM % 120))

     ps -e | grep -w fsstress > /dev/null 2>&1
     while [ $? -eq 0 ]
     do
         killall -9 fsstress > /dev/null 2>&1
         sleep 1
         ps -e | grep -w fsstress > /dev/null 2>&1
     done

     while true
     do
         res=`mount | grep "$MNT"`
         if [[ "$res" == "" ]]
         then
             break;
         fi
         umount $MNT
         sleep 0.1
     done

     modprobe -r ubifs
     modprobe -r ubi
     modprobe -r mtdram

     echo "----------------------------------------------------------------------"
}

while true
do
     run_test "64" "16" "512"
done

https://bugzilla.kernel.org/show_bug.cgi?id=218935

Or you can mount the problem image (disk.tar.gz) directly with the
following script:
#!/bin/bash
DEV=/dev/ubi0_0
KEY_FILE=/tmp/key
MNT=/root/temp
mtdram_patt="mtdram test device"

function fatal()
{
	echo "Error: $1" 1>&2
	exit 1
}

function find_mtd_device()
{
	printf "%s" "$(grep "$1" /proc/mtd | sed -e "s/^mtd\([0-9]\+\):.*$/\1/")"
}

# Load mtdram with specified size and PEB size
# Usage: load_mtdram <flash size> <PEB size>
# 1. Flash size is specified in MiB
# 2. PEB size is specified in KiB
function load_mtdram()
{
	local size="$1";     shift
	local peb_size="$1"; shift

	size="$(($size * 1024))"
	modprobe mtdram total_size="$size" erase_size="$peb_size"
}

function run_test()
{
	local size="$1";
	local peb_size="$2";
	local page_size="$3";

	echo "======================================================================"
	printf "%s" "MTDRAM ${size}MiB PEB size ${peb_size}KiB"
	echo ""

	load_mtdram "$size" "$peb_size" || echo "cannot load mtdram"
	mtdnum="$(find_mtd_device "$mtdram_patt")"

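	flash_eraseall /dev/mtd$mtdnum
	tar xvzf disk.tar.gz
	dd if=disk of=/dev/mtd$mtdnum bs=1M
	modprobe ubi mtd="$mtdnum,$page_size" || fatal "modprobe ubi fail"
	modprobe ubifs || fatal "modprobe ubifs fail"
	mount -t ubifs $DEV $MNT || fatal "mount ubifs fail"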
}

run_test "64" "16" "512"

PS: I report the problem as "no issue" because I don't think it can be
fixed without modifying the disk layout. I think it's just a design nit
that doesn't need to be fixed; I just want people to know about the
problem in case someone runs into it one day.