UBIFS failure & stable page writes

Prins Anton (ST-CO/ENG1.1) Anton.Prins at nl.bosch.com
Mon May 27 09:13:49 EDT 2013


Hi Artem and/or Adrian,

As you can see, we ran into an issue while looking at UBIFS.
(We use a custom-built Linux-2.6.37.6 kernel/distro.)

Here is a part of our log output:
UBIFS DBG (pid 767): ubifs_bg_thread: background thread "ubifs_bgt1_1" started, PID 767
UBIFS: recovery needed
UBIFS: recovery completed
UBIFS: mounted UBI device 1, volume 1, name "datafs"
UBIFS: file system size:   640475136 bytes (625464 KiB, 610 MiB, 1241 LEBs)
UBIFS: journal size:       10452992 bytes (10208 KiB, 9 MiB, 21 LEBs)
UBIFS: media format:       w4/r0 (latest is w4/r0)
UBIFS: default compressor: lzo
UBIFS: reserved for root:  0 bytes (0 KiB)
UBIFS DBG (pid 1): mount_ubifs: compiled on:         May 27 2013 at 13:08:24
UBIFS DBG (pid 1): mount_ubifs: min. I/O unit size:  4096 bytes
UBIFS DBG (pid 1): mount_ubifs: max. write size:     4096 bytes
UBIFS DBG (pid 1): mount_ubifs: LEB size:            516096 bytes (504 KiB)
UBIFS DBG (pid 1): mount_ubifs: data journal heads:  1
UBIFS DBG (pid 1): mount_ubifs: UUID:                1D1AD726-53AB-4623-8D86-B7773B5EFA83
UBIFS DBG (pid 1): mount_ubifs: big_lpt              0
UBIFS DBG (pid 1): mount_ubifs: log LEBs:            4 (3 - 6)
UBIFS DBG (pid 1): mount_ubifs: LPT area LEBs:       2 (7 - 8)
UBIFS DBG (pid 1): mount_ubifs: orphan area LEBs:    1 (9 - 9)
UBIFS DBG (pid 1): mount_ubifs: main area LEBs:      1241 (10 - 1250)
UBIFS DBG (pid 1): mount_ubifs: index LEBs:          31
UBIFS DBG (pid 1): mount_ubifs: total index bytes:   702944 (686 KiB, 0 MiB)
UBIFS DBG (pid 1): mount_ubifs: key hash type:       0
UBIFS DBG (pid 1): mount_ubifs: tree fanout:         8
UBIFS DBG (pid 1): mount_ubifs: reserved GC LEB:     166
UBIFS DBG (pid 1): mount_ubifs: first main LEB:      10
UBIFS DBG (pid 1): mount_ubifs: max. znode size      240
UBIFS DBG (pid 1): mount_ubifs: max. index node size 192
UBIFS DBG (pid 1): mount_ubifs: node sizes:          data 48, inode 160, dentry 56
UBIFS DBG (pid 1): mount_ubifs: node sizes:          trun 56, sb 4096, master 512
UBIFS DBG (pid 1): mount_ubifs: node sizes:          ref 64, cmt. start 32, orph 32
UBIFS DBG (pid 1): mount_ubifs: max. node sizes:     data 4144, inode 4256 dentry 312, idx 188
UBIFS DBG (pid 1): mount_ubifs: dead watermark:      4096
UBIFS DBG (pid 1): mount_ubifs: dark watermark:      8192
UBIFS DBG (pid 1): mount_ubifs: LEB overhead:        2240
UBIFS DBG (pid 1): mount_ubifs: max. dark space:     10166272 (9928 KiB, 9 MiB)
UBIFS DBG (pid 1): mount_ubifs: maximum bud bytes:   8388608 (8192 KiB, 8 MiB)
UBIFS DBG (pid 1): mount_ubifs: BG commit bud bytes: 6815744 (6656 KiB, 6 MiB)
UBIFS DBG (pid 1): mount_ubifs: current bud bytes    405504 (396 KiB, 0 MiB)
UBIFS DBG (pid 1): mount_ubifs: max. seq. number:    8295846
UBIFS DBG (pid 1): mount_ubifs: commit number:       1434
UBIFS error (pid 1): ubifs_iget: failed to read inode 1, error -2
UBIFS DBG (pid 767): ubifs_bg_thread: background thread "ubifs_bgt1_1" stops

Looking into the code, I see the "failed to read inode 1, error -2" is caused by:
"ubifs_lookup_level0 -> -ENOENT" located in tnc.c

Looking inside the ubifs_lookup_level0 function, it exits through:
        if (exact || !is_hash_key(c, key) || *n != -1) {
                dbg_tnc("found %d, lvl %d, n %d", exact, znode->level, *n);
                return exact;
        }

Where: 
	exact = 0
	*n = -1
	So: "!is_hash_key(c, key)" must be true.

'exact' is interpreted in the calling function as 'found' (where !found generates the error); this seems a little bit curious to me.
But at first sight it is complex code, so maybe you can give some clue?
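
For completeness, this is roughly how the caller turns 'exact' into the -ENOENT we see; paraphrased from ubifs_tnc_locate() in fs/ubifs/tnc.c, so details may differ slightly in 2.6.37.6:

	found = ubifs_lookup_level0(c, key, &znode, &n);
	if (!found) {
		err = -ENOENT;		/* this becomes the "error -2" above */
		goto out;
	} else if (found < 0) {
		err = found;
		goto out;
	}
	/* found == 1: read the node from the flash location in the zbranch */

So exact == 0 just means the key for inode 1 is not present in the (freshly recovered) index.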

I suppose a piece of the 'tree node cache' has been built up during recovery... could the problem be there?

Thanks in advance,

Anton

-----Original Message-----
From: Jan Kara [mailto:jack at suse.cz] 
Sent: Monday, 27 May 2013 14:18
To: Prins Anton (ST-CO/ENG1.1)
Cc: jack at suse.cz; Artem Bityutskiy; Adrian Hunter; linux-mtd at lists.infradead.org
Subject: Re: UBIFS failure & stable page writes

  Dear Prins,

On Mon 27-05-13 13:17:42, Prins Anton (ST-CO/ENG1.1) wrote:
> We are doing experiments with file systems on NAND flashes and experience
> some problems with UBIFS.  If required, it recovers correctly, but at a
> certain moment (at boot) it seems it cannot find its root node anymore
> (recovery succeeded).
> 
> While reading the logs on the kernel.org I came across your patch:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/ubifs?id=182dcfd648aef0705fd04c42cbe507c09e00c25d
> 
> Do you have some information on what you experienced in case of failure? Is
> it possible to (consequently) lose a root node because of unstable page
> writes?
  No, the stable page writes work shouldn't have any impact on this. NAND flash
isn't the kind of storage that requires stable pages. I think it's best
if you ask the UBIFS maintainers directly - I've CCed them. BTW, you didn't
mention which kernel version you are using.

								Honza
-- 
Jan Kara <jack at suse.cz>
SUSE Labs, CR


