UBI/UBIFS: md5sum of files go bad while running (no powercut)
Ralph Erdmann
rerdmann at bittailors.com
Thu Dec 17 02:03:04 PST 2015
Dear all,
We are seeing the following problem on some of our embedded boards in the field:
Some files "go bad" during runtime, which causes our application to crash.
A reboot "heals" this error.
Some background:
- Kernel 3.12.1
- Freescale i.MX28 CPU
- Micron MT29F1G08ABADAH4IT:D 128MiB NAND SLC Flash
- Filesystem is mounted read only
- ~100 boards in the field
- The error appears randomly, usually after some days of uptime (I have seen between 2 and 7 days)
- I can't say for certain whether the error appears on all devices or only on some
I have created a list of md5sums of all files in the filesystem that do not change (so no config files, symlinks, temp data or dev, only 'find -type f' files). After our application crashed I checked the md5sums and found that some of our libraries had changed (md5 mismatch).
- I observed this 3 or 4 times
- it was always our libraries, but not always the same files (one time only one file was bad, the next time two, the next time one again)
- the libraries are the largest files in the filesystem (~1 MB to 5 MB each)
- filesystem was still mounted read only at this point
- I was able to back up the damaged files
- there was no power cut
- no output / messages in dmesg
- after a reboot the files were fine again
- I remounted the filesystem rw and the md5sums were still bad; I then called 'sync' and rebooted, after which the files were fine again
- according to 'ubinfo -a' there are no bad blocks in flash (see full output at the end of this message)
- I have not found a reliable method to reproduce the problem
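A minimal sketch of such an md5 check, assuming Python is available on the board or on a host with the root filesystem mounted. This is not the script actually used; the function names and the list of skipped top-level directories are illustrative assumptions.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=64 * 1024):
    """Return the hex MD5 digest of one file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, skip_dirs=("dev", "proc", "sys", "tmp")):
    """Map relative path -> MD5 for regular files under root (like 'find -type f')."""
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune top-level directories holding volatile data (assumed list).
        if dirpath == root:
            dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks, device nodes, FIFOs and sockets.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            manifest[os.path.relpath(full, root)] = md5_of_file(full)
    return manifest


def changed_files(reference, current):
    """Return sorted paths whose digest differs from, or is missing in, the reference."""
    return sorted(p for p, md5 in reference.items() if current.get(p) != md5)
```

The reference manifest would be built once on a known-good system and compared with a fresh one after a crash. Reads on a running system may be served from the page cache, so a mismatch alone does not show whether the data on flash itself has changed.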
I screened the subjects on the mailing list from the last six months and read a lot about 'The unstable bits issue'. I think our problem is different, because it happens after a long runtime and not after a power cut.
Has anyone observed something like that?
I would be grateful for any ideas about what causes this behaviour.
I am a bit stuck now.
Thank you all for your effort and help!
Kind regards
Ralph
PS: some logs:
The 'ubinfo -a' log:
UBI version: 1
Count of UBI devices: 1
UBI control device major/minor: 10:59
Present UBI devices: ubi0
ubi0
Volumes count: 4
Logical eraseblock size: 126976 bytes, 124.0 KiB
Total amount of logical eraseblocks: 880 (111738880 bytes, 106.6 MiB)
Amount of available logical eraseblocks: 0 (0 bytes)
Maximum count of volumes 128
Count of bad physical eraseblocks: 0
Count of reserved physical eraseblocks: 20
Current maximum erase counter value: 14
Minimum input/output unit size: 2048 bytes
Character device major/minor: 247:0
Present volumes: 0, 1, 2, 4
Volume ID: 0 (on ubi0)
Type: dynamic
Alignment: 1
Size: 462 LEBs (58662912 bytes, 55.9 MiB)
State: OK
Name: filesystem1
Character device major/minor: 247:1
-----------------------------------
Volume ID: 1 (on ubi0)
Type: dynamic
Alignment: 1
Size: 327 LEBs (41521152 bytes, 39.6 MiB)
State: OK
Name: filesystem2
Character device major/minor: 247:2
-----------------------------------
Volume ID: 2 (on ubi0)
Type: dynamic
Alignment: 1
Size: 17 LEBs (2158592 bytes, 2.1 MiB)
State: OK
Name: certs
Character device major/minor: 247:3
-----------------------------------
Volume ID: 4 (on ubi0)
Type: dynamic
Alignment: 1
Size: 50 LEBs (6348800 bytes, 6.1 MiB)
State: OK
Name: cfg2
Character device major/minor: 247:5
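As a sanity check of the numbers in the 'ubinfo -a' log, the byte sizes can be recomputed from the LEB counts. This is only a sketch; the constants are copied from the log, and the helper names are made up for illustration. What the LEBs left over after volumes and the reserved pool are used for is not visible in this output (presumably UBI-internal use, but that is an assumption).

```python
# Values copied from the 'ubinfo -a' output above.
LEB_SIZE = 126976        # "Logical eraseblock size" in bytes
TOTAL_LEBS = 880         # "Total amount of logical eraseblocks"
RESERVED_PEBS = 20       # "Count of reserved physical eraseblocks"
VOLUME_LEBS = {"filesystem1": 462, "filesystem2": 327, "certs": 17, "cfg2": 50}


def leb_bytes(count, leb_size=LEB_SIZE):
    """Convert a number of logical eraseblocks into bytes."""
    return count * leb_size


def unaccounted_lebs():
    """LEBs that are neither in a volume nor in the reserved pool."""
    return TOTAL_LEBS - sum(VOLUME_LEBS.values()) - RESERVED_PEBS


if __name__ == "__main__":
    print("device:", leb_bytes(TOTAL_LEBS), "bytes")
    for name, lebs in VOLUME_LEBS.items():
        print(name, leb_bytes(lebs), "bytes")
    print("unaccounted LEBs:", unaccounted_lebs())
```

The recomputed sizes agree with the log, and volumes plus the reserved pool leave no free LEBs, which is consistent with "Amount of available logical eraseblocks: 0".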
Some kernel output:
[ 1.659580] ONFI param page 0 valid
[ 1.663260] ONFI flash detected
[ 1.666468] NAND device: Manufacturer ID: 0x2c, Chip ID: 0xf1 (Micron MT29F1G08ABADAH4), 128MiB, page size: 2048, OOB size: 64
[ 1.678182] Scanning device for bad blocks
[ 2.140204] 9 cmdlinepart partitions found on MTD device gpmi-nand
[ 2.146439] Creating 9 MTD partitions on "gpmi-nand":
[ 2.151681] 0x000000000000-0x000000300000 : "uboot"
[ 2.165122] 0x000000300000-0x000000380000 : "ubootenv"
[ 2.175378] 0x000000380000-0x000000400000 : "ubootenv_redundant"
[ 2.186065] 0x000000400000-0x000000a00000 : "kernel1"
[ 2.196459] 0x000000a00000-0x000000a80000 : "fdt1"
[ 2.206002] 0x000000a80000-0x000001080000 : "kernel2"
[ 2.215806] 0x000001080000-0x000001100000 : "fdt2"
[ 2.225309] 0x000001100000-0x000007f00000 : "filesystem"
[ 2.235492] 0x000007f00000-0x000008000000 : "reserved"
[ 2.245441] gpmi-nand 8000c000.gpmi-nand: driver registered.
[ 2.257569] pinctrl-mxs: name=<mac0> fsl,voltage=1 (0=1V8, 1=3V3)
[ 2.265962] of_get_named_gpio_flags exited with status 141
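The partition table in the kernel output can be cross-checked against the UBI figures with a small sketch like the one below. The offsets are copied from the log; the physical eraseblock size is not printed anywhere above and is assumed here to be the LEB size plus two 2048-byte pages for UBI headers (128 KiB). The helper names are illustrative only.

```python
# (name, start, end) copied from the "Creating 9 MTD partitions" lines above.
PARTITIONS = [
    ("uboot", 0x000000000000, 0x000000300000),
    ("ubootenv", 0x000000300000, 0x000000380000),
    ("ubootenv_redundant", 0x000000380000, 0x000000400000),
    ("kernel1", 0x000000400000, 0x000000a00000),
    ("fdt1", 0x000000a00000, 0x000000a80000),
    ("kernel2", 0x000000a80000, 0x000001080000),
    ("fdt2", 0x000001080000, 0x000001100000),
    ("filesystem", 0x000001100000, 0x000007f00000),
    ("reserved", 0x000007f00000, 0x000008000000),
]

# Assumption: PEB = LEB (126976 bytes) + 2 pages of 2048 bytes for UBI headers.
PEB_SIZE = 126976 + 2 * 2048


def partition_size(name, partitions=PARTITIONS):
    """Return the size in bytes of the named partition."""
    for part_name, start, end in partitions:
        if part_name == name:
            return end - start
    raise KeyError(name)


def is_contiguous(partitions=PARTITIONS):
    """True if every partition starts exactly where the previous one ends."""
    return all(prev[2] == cur[1] for prev, cur in zip(partitions, partitions[1:]))


if __name__ == "__main__":
    size = partition_size("filesystem")
    print("filesystem partition:", size, "bytes =", size // PEB_SIZE, "PEBs")
    print("contiguous layout:", is_contiguous())
```

Under that assumption the "filesystem" partition holds 880 eraseblocks, which matches the total LEB count reported by 'ubinfo -a', and the nine partitions cover the full 128 MiB without gaps.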