Help with UBI failure analysis

Sergei Poselenov sposelenov at emcraft.com
Mon Nov 23 02:37:06 EST 2020


Hello Richard,

Sorry for a long delay and thanks a lot for your reply. Could you please look at the crash dump below?

We set up a simple write-compare test on that device (please see the original message in the threa), on an empty UBI partition:
DIR=/mnt/data

rm -rf $DIR/*
touch $DIR/a1 $DIR/a2

while :; do
        for sz in 1 10 3 5; do
                cs1=`md5sum $DIR/a1 | cut -f1 -d' '`
                cs2=`md5sum $DIR/a2 | cut -f1 -d' '`
                if [ "$cs1" = "$cs2" ]; then
                        echo -n "." 
                else
                        echo "ERROR: $DIR/a1 and $DIR/a2 differ" 
                        exit 1
                fi
                mv $DIR/a1 $DIR/b1
                mv $DIR/a2 $DIR/b2
                dd if=/dev/urandom of=/tmp/rnd count=$sz bs=1M 2> /dev/null
                cp /tmp/rnd $DIR/a1
                cp /tmp/rnd $DIR/a2
                sync; echo 3 > /proc/sys/vm/drop_caches > /dev/null
        done
done


After about ~3days, the test failed with the following messages on the board console:
=================================
...........................................................................................................
...........................................................................................................
.......................................ubi1 warning: ubi_eba_copy_leb: read data back from PEB 895 and it is different
ubi1 error: wear_leveling_worker: error -22 while moving PEB 1 to PEB 895
ubi1 warning: ubi_ro_mode: switch to read-only mode
CPU: 0 PID: 181 Comm: ubi_bgt1d Not tainted 4.5.0 #4
Hardware name: Freescale i.MX6 SoloX (Device Tree)
[<800160f4>] (unwind_backtrace) from [<80013400>] (show_stack+0x10/0x14)
[<80013400>] (show_stack) from [<8029c6ec>] (dump_stack+0x88/0x9c)
[<8029c6ec>] (dump_stack) from [<8039fac8>] (wear_leveling_worker+0x808/0x858)
[<8039fac8>] (wear_leveling_worker) from [<8039e724>] (do_work+0x84/0x104)
[<8039e724>] (do_work) from [<803a02c8>] (ubi_thread+0xd4/0x170)
[<803a02c8>] (ubi_thread) from [<8003c9c0>] (kthread+0xdc/0xf4)
[<8003c9c0>] (kthread) from [<8000f7b8>] (ret_from_fork+0x14/0x3c)
ubi1 error: do_work: work failed with error code -22
ubi1 error: ubi_io_write: read-only mode
ubi1 error: ubi_write_fastmap: unable to write vid_hdr to fastmap SB!
ubi1 error: ubi_thread: ubi_bgt1d: work failed with error code -22
ubi1 warning: ubi_update_fastmap: Unable to write new fastmap, err=-30
ubi1 error: ubi_io_write: read-only mode
ubi1 warning: ubi_eba_write_leb: failed to write VID header to LEB 0:2476, PEB 885
UBIFS error (ubi1:0 pid 22714): ubifs_leb_map: mapping LEB 2476 failed, error -30
UBIFS warning (ubi1:0 pid 22714): ubifs_ro_mode: switched to read-only mode, error -30
CPU: 0 PID: 22714 Comm: cp Not tainted 4.5.0 #4
Hardware name: Freescale i.MX6 SoloX (Device Tree)
[<800160f4>] (unwind_backtrace) from [<80013400>] (show_stack+0x10/0x14)
[<80013400>] (show_stack) from [<8029c6ec>] (dump_stack+0x88/0x9c)
[<8029c6ec>] (dump_stack) from [<802057f8>] (ubifs_leb_map+0x118/0x120)
[<802057f8>] (ubifs_leb_map) from [<8020ef00>] (ubifs_add_bud_to_log+0x1d0/0x330)
[<8020ef00>] (ubifs_add_bud_to_log) from [<801f9b94>] (make_reservation+0x378/0x4ac)
[<801f9b94>] (make_reservation) from [<801fa5bc>] (ubifs_jnl_write_data+0xe4/0x264)
[<801fa5bc>] (ubifs_jnl_write_data) from [<801fc1ec>] (do_writepage+0x78/0x1cc)
[<801fc1ec>] (do_writepage) from [<800aafc0>] (__writepage+0x14/0x40)
[<800aafc0>] (__writepage) from [<800ab86c>] (write_cache_pages+0x170/0x3c0)
[<800ab86c>] (write_cache_pages) from [<800abafc>] (generic_writepages+0x40/0x60)
[<800abafc>] (generic_writepages) from [<800a2f68>] (__filemap_fdatawrite_range+0x78/0x9c)
[<800a2f68>] (__filemap_fdatawrite_range) from [<800a30a8>] (filemap_write_and_wait_range+0x34/0x74)
[<800a30a8>] (filemap_write_and_wait_range) from [<801fc934>] (ubifs_fsync+0x40/0xb4)
[<801fc934>] (ubifs_fsync) from [<800a3e70>] (generic_file_write_iter+0x1b0/0x264)
[<800a3e70>] (generic_file_write_iter) from [<801fde28>] (ubifs_write_iter+0xe8/0x178)
[<801fde28>] (ubifs_write_iter) from [<800e3820>] (__vfs_write+0xa8/0xd8)
[<800e3820>] (__vfs_write) from [<800e451c>] (vfs_write+0x90/0x164)
[<800e451c>] (vfs_write) from [<800e4f64>] (SyS_write+0x44/0x9c)
[<800e4f64>] (SyS_write) from [<8000f700>] (ret_fast_syscall+0x0/0x3c)
CPU: 0 PID: 22714 Comm: cp Not tainted 4.5.0 #4
Hardware name: Freescale i.MX6 SoloX (Device Tree)
[<800160f4>] (unwind_backtrace) from [<80013400>] (show_stack+0x10/0x14)
[<80013400>] (show_stack) from [<8029c6ec>] (dump_stack+0x88/0x9c)
[<8029c6ec>] (dump_stack) from [<802057e4>] (ubifs_leb_map+0x104/0x120)
[<802057e4>] (ubifs_leb_map) from [<8020ef00>] (ubifs_add_bud_to_log+0x1d0/0x330)
[<8020ef00>] (ubifs_add_bud_to_log) from [<801f9b94>] (make_reservation+0x378/0x4ac)
[<801f9b94>] (make_reservation) from [<801fa5bc>] (ubifs_jnl_write_data+0xe4/0x264)
[<801fa5bc>] (ubifs_jnl_write_data) from [<801fc1ec>] (do_writepage+0x78/0x1cc)
[<801fc1ec>] (do_writepage) from [<800aafc0>] (__writepage+0x14/0x40)
[<800aafc0>] (__writepage) from [<800ab86c>] (write_cache_pages+0x170/0x3c0)
[<800ab86c>] (write_cache_pages) from [<800abafc>] (generic_writepages+0x40/0x60)
[<800abafc>] (generic_writepages) from [<800a2f68>] (__filemap_fdatawrite_range+0x78/0x9c)
[<800a2f68>] (__filemap_fdatawrite_range) from [<800a30a8>] (filemap_write_and_wait_range+0x34/0x74)
[<800a30a8>] (filemap_write_and_wait_range) from [<801fc934>] (ubifs_fsync+0x40/0xb4)
[<801fc934>] (ubifs_fsync) from [<800a3e70>] (generic_file_write_iter+0x1b0/0x264)
[<800a3e70>] (generic_file_write_iter) from [<801fde28>] (ubifs_write_iter+0xe8/0x178)
[<801fde28>] (ubifs_write_iter) from [<800e3820>] (__vfs_write+0xa8/0xd8)
[<800e3820>] (__vfs_write) from [<800e451c>] (vfs_write+0x90/0x164)
[<800e451c>] (vfs_write) from [<800e4f64>] (SyS_write+0x44/0x9c)
[<800e4f64>] (SyS_write) from [<8000f700>] (ret_fast_syscall+0x0/0x3c)
UBIFS error (ubi1:0 pid 22714): make_reservation: cannot reserve 4144 bytes in jhead 2, error -30
UBIFS error (ubi1:0 pid 22714): do_writepage: cannot write page 299 of inode 58400, error -30
cp: write error: Read-only file system
cp: can't create '/mnt/data/a2': Read-only file system
UBIFS error (ubi1:0 pid 6): make_reservation: cannot reserve 4144 bytes in jhead 2, error -30
UBIFS error (ubi1:0 pid 6): do_writepage: cannot write page 300 of inode 58400, error -30
UBIFS error (ubi1:0 pid 6): make_reservation: cannot reserve 4144 bytes in jhead 2, error -30
UBIFS error (ubi1:0 pid 6): do_writepage: cannot write page 301 of inode 58400, error -30
md5sum: can't open '/mnt/data/a2': No such file or directory
ERROR: /mnt/data/a1 and /mnt/data/a2 differ
~ #
~ # UBIFS error (ubi1:0 pid 6): make_reservation: cannot reserve 4144 bytes in jhead 2, error -30
UBIFS error (ubi1:0 pid 6): make_reservation: cannot reserve 4144 bytes in jhead 2, error -30
UBIFS error (ubi1:0 pid 6): do_writepage: cannot write page 303 of inode 58400, error -30
========================

Could you please share you thoughts on what could cause such errors? Does it confirm your initial suggestion that something happened on the MTD layer?

--
Regards,
Sergei Poselenov,
Emcraft Systems

On Thu, 24 Sep 2020 12:42:46 +0200 Richard Weinberger <richard.weinberger at gmail.com> wrote:

> On Thu, Sep 24, 2020 at 10:41 AM Sergei Poselenov
> <sposelenov at emcraft.com> wrote:
> > Freeing unused kernel memory: 280K (80898000 - 808de000)
> > UBIFS error (ubi0:0 pid 1): ubifs_read_node: bad node type (0 but expected 9)
> > UBIFS error (ubi0:0 pid 1): ubifs_read_node: bad node at LEB 12:100352, LEB mapping status 1
> > Not a node, first 24 bytes:
> > 00000000: 55 42 49 23 01 00 00 00 00 00 00 00 00 00 00 02 00 00 08 00 00 00 10 00                          UBI#....................  
> 
> So, UBIFS reads from a node and finds a UBI EC header, this must not happen.
> UBI headers are invisible to UBIFS. If UBIFS reads from a LEB and a
> given offset, the offset is translated
> to read data after the page(s) where EC/VID headers reside.
> 
> *unless* LEB 12 offset 100352 contains an UBIFS data node for a
> nanddump you did before
> and the index tree is corrupted. But I don't think so.
> 
> Looks more like a problem on the mtd side.
> 




More information about the linux-mtd mailing list