UBIFS Corruption

Reginald Perrin reggyperrin at yahoo.com
Fri Jul 8 09:12:45 EDT 2011


Hi folks,

We're using UBIFS in an embedded uClinux system (based on ADI's Blackfin BF524).
It's been working great for us for a while.  Kernel is 2.6.34.7 (uClinux).

However, we just saw 2 corruptions within the past 48h that we can't explain.
We've been doing the same basic operations (flashing, reading, and writing
images) for quite some time, and have reflashed our units many times (across
thousands of different hardware units).

Device #1 failure:
* Device was running out of a partition mounted to /home (a 93MB partition from 
a 128MB NAND device)
* Our app was running normally and then locked up (not sure why).  Our code may
have been updating a sqlite database located in that partition
* When we power cycled, the partition failed to mount with the corruption error
shown in the boot log below.

Device #1 boot log:
UBI device number 1, total 750 LEBs (96768000 bytes, 92.3 MiB), available 0 LEBs (0 bytes), LEB size 129024 bytes (126.0 KiB)
[ 5.228000] UBIFS: recovery needed
[ 5.320000] UBIFS error (pid 363): ubifs_scanned_corruption: corruption at LEB 172:45056
[ 5.348000] UBIFS error (pid 363): ubifs_recover_leb: LEB 172 scanning failed
mount: mounting ubi1:home on /home failed: Structure needs cleaning
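
As a sanity check on the numbers in the log above, here is a minimal Python
sketch that parses the UBI geometry line and the corruption location, then
checks that they are self-consistent.  The function names are hypothetical
(this is not part of any UBI tooling), and the 2048-byte NAND page size is an
assumption; the log does not state it.

```python
import re

# Lines copied from the Device #1 boot log (wrapped lines rejoined).
UBI_LINE = ("UBI device number 1, total 750 LEBs (96768000 bytes, 92.3 MiB), "
            "available 0 LEBs (0 bytes), LEB size 129024 bytes (126.0 KiB)")
ERR_LINE = ("[ 5.320000] UBIFS error (pid 363): ubifs_scanned_corruption: "
            "corruption at LEB 172:45056")

# Assumption: 2048-byte NAND pages; the log does not state the page size.
ASSUMED_PAGE_SIZE = 2048


def parse_geometry(line):
    """Return (total_lebs, total_bytes, leb_size) from a UBI summary line."""
    m = re.search(r"total (\d+) LEBs \((\d+) bytes.*LEB size (\d+) bytes", line)
    if m is None:
        raise ValueError("not a UBI geometry line")
    return tuple(int(g) for g in m.groups())


def parse_corruption(line):
    """Return (leb, offset) from a 'corruption at LEB n:off' line."""
    m = re.search(r"corruption at LEB (\d+):(\d+)", line)
    if m is None:
        raise ValueError("not a corruption line")
    return int(m.group(1)), int(m.group(2))


total_lebs, total_bytes, leb_size = parse_geometry(UBI_LINE)
leb, offset = parse_corruption(ERR_LINE)

# The reported byte count should equal LEB count times LEB size.
geometry_ok = total_lebs * leb_size == total_bytes
# The corrupted location should lie inside the volume.
offset_in_range = 0 <= leb < total_lebs and 0 <= offset < leb_size
# Byte offset of the corruption counted from the start of the volume.
volume_offset = leb * leb_size + offset
# Page index within the LEB, under the assumed page size.
page_index, page_remainder = divmod(offset, ASSUMED_PAGE_SIZE)

print("geometry consistent:", geometry_ok)
print("corruption inside volume:", offset_in_range)
print("byte offset in volume:", volume_offset)
print("page in LEB (assumed page size):", page_index, "remainder", page_remainder)
```

If the 2048-byte assumption holds, the reported offset falls exactly on a page
boundary inside LEB 172.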

Device #2 failure:
* Device was running normally.
* We upgraded our application (which involved updating executables on that 
partition)
* After the successful upgrade, we powered the unit down and stored it
* Days later, we powered up the device and got the invalid CRC error shown in
the boot log below

Device #2 boot log:
UBI device number 1, total 750 LEBs (96768000 bytes, 92.3 MiB), available 0 LEBs (0 bytes), LEB size 129024 bytes (126.0 KiB)
[ 5.488000] UBIFS: recovery needed
[ 5.492000] UBIFS error (pid 365): check_lpt_crc: invalid crc in LPT node: crc a0 calc 9013
mount: mounting ubi1:home on /home failed: Invalid argument
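
Since these failures are rare and spread over many units, it may help to pull
the UBIFS error lines out of captured boot logs automatically.  Below is a
minimal Python sketch of that idea; `ubifs_errors` is a hypothetical helper
name, and it assumes each log message sits on a single line (i.e. the email
line wrapping has been undone).

```python
import re

# Matches lines such as:
#   [ 5.492000] UBIFS error (pid 365): check_lpt_crc: invalid crc in ...
ERROR_RE = re.compile(r"UBIFS error \(pid (\d+)\): (\w+): (.*)")


def ubifs_errors(log_text):
    """Return a list of (pid, function, message) for each UBIFS error line."""
    return [(int(pid), func, msg.strip())
            for pid, func, msg in ERROR_RE.findall(log_text)]


# Device #2 boot log, with the wrapped error line rejoined.
DEVICE_2_LOG = """\
[ 5.488000] UBIFS: recovery needed
[ 5.492000] UBIFS error (pid 365): check_lpt_crc: invalid crc in LPT node: crc a0 calc 9013
mount: mounting ubi1:home on /home failed: Invalid argument
"""

errors = ubifs_errors(DEVICE_2_LOG)
for pid, func, msg in errors:
    print(f"pid={pid} function={func} message={msg}")
```

Grouping the results by reporting function (`check_lpt_crc` here,
`ubifs_scanned_corruption` and `ubifs_recover_leb` on Device #1) makes it easy
to see that the two units failed in different places.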


So, what is concerning is the sheer randomness of these failures.  In neither
case were we doing anything new (vs. standard operations we have been performing
for over a year on many devices per day).  There's also no further logging
available, because this *never* happens.  We have never needed (after we got
UBIFS working) to have the debug output enabled in the driver.  To make matters
worse, if you ask me to reproduce this, I don't know any way of doing it.  We
have automated tests that run continually, and they never see these issues.

One corruption could be written off as a fluke, but 2 happening within 48h is 
very unusual.  

Can anybody give me any insight into this?  

TIA
RP


