Need help with NAND flash corruption problem using JFFS2

osterluk at yahoo.com
Tue May 3 02:18:41 EDT 2011


Basically, I get file holes -- 4KB chunks go missing from files not actively 
being written.  The holes occur at various places in the files -- not a certain 
offset.  I need some ideas on how to troubleshoot the problem.


I fill a jffs2 partition with about 180 MiB of ‘ballast’ files and start a file 
writer process that consumes about 120 pages/second – to force eraseblocks to 
get cycled.  After about half an hour, the writer will have written as many 
pages as the partition holds.  I run md5sum to check the files and get about 
one or two failures on an overnight run.  The bad files are exactly 4 KB 
shorter than when the test started.
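
To pin down where a hole sits, a damaged file can be compared against a known 
good copy.  The following is only a minimal sketch in Python, assuming a 
reference copy of each ballast file is kept somewhere off the partition; the 
name find_hole is made up for illustration and is not part of any existing tool.

```python
CHUNK = 4096  # size of the missing piece observed in the bad files


def find_hole(reference: bytes, damaged: bytes, chunk: int = CHUNK):
    """Return an offset at which removing `chunk` bytes from `reference`
    yields `damaged`, or None if a single hole does not explain the damage."""
    if len(reference) - len(damaged) != chunk:
        return None

    # First byte position where the two files disagree; if the damaged file
    # is a pure prefix of the reference, the hole is at the very end.
    first_diff = next(
        (i for i, (a, b) in enumerate(zip(reference, damaged)) if a != b),
        len(damaged),
    )

    # Prefer a chunk-aligned answer when one fits (assumption: holes tend to
    # fall on 4 KiB boundaries), otherwise fall back to the first mismatch.
    aligned = first_diff - first_diff % chunk
    for start in (aligned, first_diff):
        if reference[start + chunk:] == damaged[start:]:
            return start
    return None


if __name__ == "__main__":
    # Synthetic data: five 4 KiB chunks, each filled with a distinct byte.
    good = b"".join(bytes([i]) * CHUNK for i in range(5))
    bad = good[: 2 * CHUNK] + good[3 * CHUNK:]  # drop the third chunk
    print(hex(find_hole(good, bad)))
```

Run against each file that fails md5sum, this would show whether the holes are 
4 KB aligned and whether each bad file has exactly one hole.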

No power cycling or reboots are happening.


I'm using a kernel originally developed/sponsored by Samsung for the s3c2413.  
The kernel is based on 2.6.26.6 with edits from Samsung and others.  I realize 
this processor is supported from upstream sources now, but I'm not in a position 
to move to it yet.  The kernel configuration is a lot like smdk2412_defconfig, 
but I have an x16 NAND.


I don't think I'm having SDRAM memory problems; if I were, things would be much 
worse.  I can run the nand tests from the snapshot mtd-utils-3c19d07 with no 
problems indicated.  I suspect some problem during garbage collection.

I tried turning up the verbosity on jffs2, but the target slows to a crawl -- I 
need to try to log to the network instead.  I tried to force clean blocks to get 
used more often, but I overshot that too and slowed the system too much -- and 
right about that time I found I could get a few errors overnight.


Here is a clip from /var/log/messages showing a case where a file was modified 
overnight:

Apr 18 04:04:06 nbox-3A56 -- MARK --
Apr 18 04:08:08 nbox-3A56 kernel: jffs2_flush_wbuf(): Write failed with -5
Apr 18 04:08:08 nbox-3A56 kernel: Write of 4164 bytes at 0x03955058 failed. returned -5, retlen 0
Apr 18 04:08:08 nbox-3A56 kernel: Not marking the space at 0x03955058 as dirty because the flash driver returned retlen zero
Apr 18 04:24:06 nbox-3A56 -- MARK --
Apr 18 04:44:06 nbox-3A56 -- MARK --
Apr 18 05:04:07 nbox-3A56 -- MARK --
Apr 18 05:24:07 nbox-3A56 -- MARK --
Apr 18 05:44:07 nbox-3A56 -- MARK --
Apr 18 06:04:07 nbox-3A56 -- MARK --
Apr 18 06:24:07 nbox-3A56 -- MARK --
Apr 18 06:44:08 nbox-3A56 -- MARK --
Apr 18 07:04:08 nbox-3A56 -- MARK --
Apr 18 07:24:08 nbox-3A56 -- MARK --
Apr 18 07:44:08 nbox-3A56 -- MARK --
Apr 18 08:04:08 nbox-3A56 -- MARK --
Apr 18 08:24:09 nbox-3A56 -- MARK --
Apr 18 08:44:09 nbox-3A56 -- MARK --
Apr 18 09:04:09 nbox-3A56 -- MARK --
Apr 18 09:24:09 nbox-3A56 -- MARK --
Apr 18 09:44:09 nbox-3A56 -- MARK --
Apr 18 10:04:10 nbox-3A56 -- MARK --
Apr 18 10:12:24 nbox-3A56 kernel: Node CRC failed on REF_PRISTINE data node at 0x022e1058: Read 0x1038da5e, calculated 0x10b8da5e
Apr 18 10:12:24 nbox-3A56 kernel: Node CRC 1038da5e != calculated CRC 10b8da5e for node at 022e1058
Apr 18 10:12:24 nbox-3A56 kernel: Node CRC 1038da5e != calculated CRC 10b8da5e for node at 022e1058
Apr 18 10:24:10 nbox-3A56 -- MARK -
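
The numbers in the log can be checked with a little arithmetic.  The Python 
sketch below is only illustrative (the helper name differing_bits is made up): 
it XORs the CRC that was read against the calculated one, and looks at where 
the two node addresses fall within a 4 KiB-aligned region.  How to interpret 
the result is a guess, not a diagnosis.

```python
def differing_bits(a: int, b: int, width: int = 32):
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [bit for bit in range(width) if (diff >> bit) & 1]


# Values copied from the log messages above.
read_crc = 0x1038DA5E
calc_crc = 0x10B8DA5E
failed_write_addr = 0x03955058
bad_crc_addr = 0x022E1058

bits = differing_bits(read_crc, calc_crc)
print(f"xor = 0x{read_crc ^ calc_crc:08x}, differing bits = {bits}")

# Offset of each node inside a 4 KiB-aligned region of the flash.
for addr in (failed_write_addr, bad_crc_addr):
    print(f"0x{addr:08x} -> offset 0x{addr % 4096:03x}")

# The failed write was 4164 bytes: 4096 bytes plus this many extra bytes
# (presumably the node header, but that is an assumption).
print(4164 - 4096)
```

The two CRC values differ in exactly one bit (bit 23), which looks more like a 
single flipped bit somewhere in the read or write path than like a node that 
was overwritten with unrelated data.  Both nodes also sit at the same offset, 
0x058, past a 4 KiB boundary.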

Any help would be greatly appreciated.
