UBIFS Corrupt during power failure

Eric Holmberg Eric_Holmberg at Trimble.com
Mon May 18 13:30:40 EDT 2009


Hi Stefan,

I am still seeing corruption even with the write buffer size limited to
8 bytes, but it's greatly reduced.  Unfortunately, our schedule doesn't
allow me to work on this full-time for the immediate future, so I can
only spend small chunks of time on it for now.  Let me know if there is
anything I can do to assist or share, since it looks like we both need
to fix this.

At this point, I believe I have characterized the interrupted erase and
interrupted write patterns that are causing the problems, so the next
step I may take is to add the failure conditions into the NOR MTD device
simulator mtdram and see if I can get the same failures.
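
As a rough illustration of what such injected failures could look like,
here is a minimal userspace sketch.  It is not mtdram code and does not
reproduce the exact patterns characterized above; the names nor_write,
nor_erase, first_dirty and BLOCK_SIZE are hypothetical, and the model is
deliberately simplified (a real interrupted erase can also leave
unstable bits, which this does not attempt to model).  It assumes only
that NOR programming clears bits (1 -> 0), that an erase sets bytes to
0xFF, and that empty space is expected to read back as 0xFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64	/* hypothetical erase-block size for this model */

/*
 * NOR programming can only clear bits, so a write is modelled as AND.
 * If cut_after < len, power is "lost" after cut_after bytes and the
 * rest of the buffer is never programmed.
 */
static void nor_write(uint8_t *blk, size_t off, const uint8_t *src,
		      size_t len, size_t cut_after)
{
	size_t n = len < cut_after ? len : cut_after;
	size_t i;

	assert(off + len <= BLOCK_SIZE);
	for (i = 0; i < n; i++)
		blk[off + i] &= src[i];
}

/*
 * A complete erase sets every byte to 0xFF.  An interrupted erase is
 * modelled by erasing only the first cut_after bytes and leaving the
 * old contents in the remainder of the block.
 */
static void nor_erase(uint8_t *blk, size_t cut_after)
{
	size_t n = cut_after < BLOCK_SIZE ? cut_after : BLOCK_SIZE;

	memset(blk, 0xFF, n);
}

/*
 * Return the offset of the first non-0xFF byte at or after 'from',
 * or -1 if the rest of the block is empty.
 */
static long first_dirty(const uint8_t *blk, size_t from)
{
	size_t i;

	for (i = from; i < BLOCK_SIZE; i++)
		if (blk[i] != 0xFF)
			return (long)i;
	return -1;
}
```

Calling nor_erase() or nor_write() with a cut_after value smaller than
the full length leaves stale or partial data behind, and first_dirty()
then reports the offset where the supposedly empty space stops being
0xFF.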

Let me know if you have any other ideas or approaches.

Here's the patch to change the maximum write buffer size to 8 bytes
(2^3).

Index: drivers/mtd/chips/cfi_probe.c
===================================================================
--- drivers/mtd/chips/cfi_probe.c	(revision 4477)
+++ drivers/mtd/chips/cfi_probe.c	(working copy)
@@ -18,7 +18,7 @@
 #include <linux/mtd/cfi.h>
 #include <linux/mtd/gen_probe.h>
 
-//#define DEBUG_CFI
+#define DEBUG_CFI
 
 #ifdef DEBUG_CFI
 static void print_cfi_ident(struct cfi_ident *);
@@ -251,6 +251,18 @@
 	cfi->cfiq->InterfaceDesc = le16_to_cpu(cfi->cfiq->InterfaceDesc);
 	cfi->cfiq->MaxBufWriteSize = le16_to_cpu(cfi->cfiq->MaxBufWriteSize);
 
+	//DEBUG - BEGIN - force max write size to 8 bytes (2^3)
+	if (cfi->cfiq->MaxBufWriteSize)
+	{
+		printk("Warning:  Overriding MaxBufWriteSize from 2^%d to 2^%d\n",
+				cfi->cfiq->MaxBufWriteSize,
+				3
+				);
+		cfi->cfiq->MaxBufWriteSize = 3;
+	}
+	//DEBUG - END
+
+
 #ifdef DEBUG_CFI
 	/* Dump the information therein */
 	print_cfi_ident(cfi->cfiq);
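
For reference, MaxBufWriteSize is a power-of-two exponent rather than a
byte count, which is why the patch stores 3 to get 8 bytes.  The sketch
below only illustrates that arithmetic.  The helper name cfi_wbuf_bytes
is hypothetical, and the interleave factor is an assumption based on how
cfi_cmdset_0002.c appears to derive its write buffer size (interleave
shifted left by MaxBufWriteSize); check your kernel source before
relying on it.

```c
/*
 * Hypothetical helper: convert the CFI MaxBufWriteSize exponent into a
 * byte count.  'interleave' is the number of chips sharing the bus
 * (1 for a single chip); this factor is an assumption, see above.
 */
static unsigned int cfi_wbuf_bytes(unsigned int exponent,
				   unsigned int interleave)
{
	return interleave << exponent;
}
```

With a single chip, an exponent of 6 gives the 64-byte buffer the flash
reports, and the overridden exponent of 3 gives 8 bytes.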

Best Regards,

Eric Holmberg
Senior Firmware Engineer
Trimble Construction Services
Westminster, Colorado

> -----Original Message-----
> From: Stefan Roese [mailto:sr at denx.de] 
> Sent: Friday, May 15, 2009 1:17 AM
> To: Eric Holmberg
> Cc: linux-mtd at lists.infradead.org; dedekind at infradead.org; 
> Jamie Lokier; Urs Muff; Adrian Hunter
> Subject: Re: UBIFS Corrupt during power failure
> 
> Hi Eric,
> 
> On Saturday 18 April 2009 01:49:52 Eric Holmberg wrote:
> > > Yeah, let's wait for Eric's results and then will work on
> > > extending MTD device model with this parameter.
> >
> > As suggested, I patched my 2.6.27 kernel with the latest from
> > http://git.infradead.org/users/dedekind/ubifs-v2.6.27.git (includes all
> > updates up to and including the fix-recovery bug,
> > http://git.infradead.org/users/dedekind/ubifs-v2.6.27.git?a=commit;h=e144c1c037f1c6f7c687de5a2cd375cb40dfe71e).
> >
> > I have the unit running with a maximum write buffer of 8 bytes (the NOR
> > flash chip is capable of 64 bytes).
> 
> How exactly did you do this? In cfi_cmdset_0002.c?
> 
> > I was seeing 4 different failure scenarios with the base 2.6.27 code,
> > but now I am only seeing one remaining failure after 30+ hours of power
> > cycling.  I added a stack dump this afternoon that will let me pinpoint
> > exactly what is happening, but haven't seen the failure yet.
> >
> > The failure happens when I get two corrupt empty LEBs.  I believe the
> > scenario is that an erase is interrupted and on the next boot, while the
> > file system is being recovered, another power failure occurs.
> >
> > I can erase one of the LEBs manually in U-Boot and the file system
> > recovers properly.
> >
> > I'm going to leave the units running over the weekend and see what is
> > waiting for me Monday morning.
> 
> Do you have an update for this? What's the current status on your system now?
> Which patches did you apply to work reliably with the Spansion FLASH?
> 
> I'm asking since we are seeing a similar issue on one of our boards equipped
> with the S29GL512P. This simple script triggers problems upon the next mount:
> 
> ---
> mount -t ubifs ubi0:testvolume /mnt
> sync
> reboot -n -f
> ---
> 
> The next mount will result most of the time in this:
> 
> UBIFS: recovery needed
> UBIFS error (pid 406): ubifs_scan: corrupt empty space at LEB 3:130320
> UBIFS error (pid 406): ubifs_scanned_corruption: corrupted data at LEB 3:130320
> UBIFS error (pid 406): ubifs_scan: LEB 3 scanning failed
> UBIFS error (pid 406): ubifs_recover_leb: corrupt empty space at LEB 3:32
> UBIFS error (pid 406): ubifs_scanned_corruption: corrupted data at LEB 3:32
> UBIFS error (pid 406): ubifs_recover_leb: LEB 3 scanning failed
> mount: Structure needs cleaning
> 
> This is without the patch from this thread included (in recovery.c). With this
> patch included the recovery is successful all the time, as far as we can see
> right now. But I'm wondering if we really need to disable the write buffer in
> the CFI driver or reduce the write buffer to 8 bytes.
> 
> Thanks.
> 
> Best regards,
> Stefan
> 
> =====================================================================
> DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: +49-8142-66989-0 Fax: +49-8142-66989-80  Email: office at denx.de
> =====================================================================
> 


