[PATCH] JFFS2 appears to "freeze" during erase, version 2

Tue Jun 12 10:07:05 EDT 2007

On Mon, 2007-06-11 at 16:15 +0200, Joakim Tjernlund wrote:
> On Mon, 2007-06-11 at 10:34 +0200, Joakim Tjernlund wrote:
> > On Tue, 2007-06-05 at 09:51 +0200, Joakim Tjernlund wrote:
> > > Here is an updated version that contains everything in one patch. I hope
> > > this one can be applied as is.
> > > 
> > >  Jocke
> > > 
> > > >From 87cd93db5895e1506a6abb0dbc891587e96d8547 Mon Sep 17 00:00:00 2001
> > > From: Joakim Tjernlund <Joakim.Tjernlund at transmode.se>
> > > Date: Tue, 5 Jun 2007 09:38:53 +0200
> > > Subject: [PATCH] JFFS2 appears to "freeze" during erase
> > > Radoslaw Bisewski <radbis at googlemail.com> writes:
> > > With the current design, erase_free_sem is locked every time a flash
> > > block is being erased. For NOR flash, ~1 second is needed to erase a
> > > single flash block. In the worst case, erase_free_sem may be locked
> > > for several seconds while a number of blocks are being erased (e.g.
> > > after a large file was removed). While erase_free_sem is locked, all
> > > read/write operations on the given JFFS2 partition are blocked too,
> > > so from time to time access to the JFFS2 partition is blocked for a
> > > number of seconds. This fix makes the critical section in the flash
> > > erasing procedure shorter: erase_free_sem is now locked only around
> > > the erase_completion_lock spinlock.
> > > 
> > > Additional bug fixes by me.
> > > 
> > > Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund at transmode.se>
> > > ---
> > [SNIP patch]
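The locking change described in the patch can be illustrated with a small user-space sketch. This is not the JFFS2 code: erase_free_sem is modelled as a pthread mutex, and `struct block`, `erase_pending`, `free_list` and `slow_erase()` are hypothetical names. The only point shown is that the lock is held while the block lists are updated, never across the slow erase itself, so other readers/writers can take it in between.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins: erase_free_sem is modelled as a pthread mutex
 * and each eraseblock as a node on a singly linked list. */
struct block {
    struct block *next;
    int erased;
};

static pthread_mutex_t erase_free_sem = PTHREAD_MUTEX_INITIALIZER;
static struct block *erase_pending;   /* blocks waiting to be erased */
static struct block *free_list;       /* blocks ready for reuse */

/* Placeholder for the slow (~1 s on NOR) hardware erase. */
static void slow_erase(struct block *b)
{
    b->erased = 1;
}

/* Erase every pending block, holding the lock only while the lists
 * are manipulated, never across slow_erase(). */
static void erase_pending_blocks(void)
{
    for (;;) {
        struct block *b;

        /* Short critical section: detach one block from the pending list. */
        pthread_mutex_lock(&erase_free_sem);
        b = erase_pending;
        if (b)
            erase_pending = b->next;
        pthread_mutex_unlock(&erase_free_sem);

        if (!b)
            break;

        /* Slow part runs unlocked; other users may take the lock meanwhile. */
        slow_erase(b);

        /* Short critical section: move the erased block to the free list. */
        pthread_mutex_lock(&erase_free_sem);
        b->next = free_list;
        free_list = b;
        pthread_mutex_unlock(&erase_free_sem);
    }
}
```

The trade-off of this pattern is that list state can change between the two critical sections, so any real implementation has to tolerate that; the sketch assumes a single eraser thread.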
> > 
> > After running our boards for a few days with the above patch, we have
> > seen a new hang during a SW upgrade of the board.
> > 
> > A simple script that is run to do some validation of the new SW
> > sometimes takes about 2 minutes to complete instead of 1 second.
> > 
> > The FS I/O part of that script is:
> >   BOARD=`fw_printenv genBoardType | sed 's/^.*=//' | sed 's/^cu.*/cu/'`
> >   TMP_FILE=/tmp/.swu_operations.log
> >   echo "Some text 1" >> $TMP_FILE
> >   echo "Some text 2" >> $TMP_FILE
> >   echo `date` >> $TMP_FILE
> >   echo "Some text 3" >> $TMP_FILE
> > 
> > fw_printenv is a utility that reads the U-Boot environment
> > via /dev/mtd.
> > /tmp is mounted as a tmpfs:
> > none            /tmp            tmpfs   defaults                        0 0
> > 
> > This is on 2.6.20 with the cfi_cmdset_0001 driver.
> > 
> > Any ideas about what is going on are most welcome.
> > 
> >  Jocke
> 
> Looking into the /dev/mtd driver, mtdchar.c, I wonder about something in
> mtd_close():
> 	if (mtd->sync)
> 		mtd->sync(mtd);
> 
> Should the mtd->sync call be here? I think this might be the cause of
> the long stalls we see from time to time. If a long erase is ongoing,
> this call won't return until all sectors are erased, I think.
> 
>  Jocke
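One possible direction, sketched in user space under stated assumptions: skip the sync when the device was opened read-only, since a reader has nothing buffered to flush. The kernel's mtd_close() works on a `struct file` and would have to test its mode; here `struct mtd_model`, `model_sync()` and `model_close()` are hypothetical stand-ins, and the open flags are passed in explicitly. Whether skipping the sync is safe for every MTD driver is an assumption, not something established in this thread.

```c
#include <fcntl.h>

/* Hypothetical user-space model of an MTD device; only the sync hook
 * matters here. */
struct mtd_model {
    void (*sync)(struct mtd_model *mtd);
    int sync_calls;
};

/* Stand-in for a driver sync hook; a real one could block until any
 * pending erase has finished. Here it only counts invocations. */
static void model_sync(struct mtd_model *mtd)
{
    mtd->sync_calls++;
}

/* Sketch of a close path that only syncs when the device was opened
 * with write access, so a read-only close never waits on an erase. */
static int model_close(struct mtd_model *mtd, int open_flags)
{
    int accmode = open_flags & O_ACCMODE;

    if (accmode != O_RDONLY && mtd->sync)
        mtd->sync(mtd);
    return 0;
}
```

With this guard, fw_printenv's `open("/dev/mtd1", O_RDONLY)` followed by `close()` would not reach the sync hook at all.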

I managed to get an strace -T of fw_printenv, and there I could see that
the stalls occur when closing a /dev/mtd device:
open("/dev/mtd1", O_RDONLY)             = 3 <0.000143>
lseek(3, 0, SEEK_SET)                   = 0 <0.000063>
read(3, "\344\tU\214\0", 5)             = 5 <0.000086>
read(3, "bootcmd=setenv bootargs root=/de"..., 8187) = 8187 <0.002510>
close(3)                                = 0 <115.221334>
open("/dev/mtd2", O_RDONLY)             = 3 <0.000124>
lseek(3, 0, SEEK_SET)                   = 0 <0.000057>
read(3, "X\226\255\325\1", 5)           = 5 <0.000078>
read(3, "bootcmd=setenv bootargs root=/de"..., 8187) = 8187 <0.002502>
close(3)                                = 0 <34.448065>

115 and 34 seconds, respectively, to close a read-only device!
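To check that the delay is in close() itself and not in the reads, the same open/lseek/read/close sequence that fw_printenv performs can be timed directly. This is a minimal sketch; `timed_read_close()` is a hypothetical helper, and pointing it at /dev/mtd1 while JFFS2 is erasing on the same chip is the assumed use.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Open `path` read-only, read up to `len` bytes into `buf`, then return
 * how long close() took in seconds, or -1.0 on error. */
static double timed_read_close(const char *path, char *buf, size_t len)
{
    struct timespec t0, t1;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1.0;
    if (lseek(fd, 0, SEEK_SET) < 0 || read(fd, buf, len) < 0) {
        close(fd);
        return -1.0;
    }
    /* Only the close() call is inside the timed window. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (close(fd) < 0)
        return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

CLOCK_MONOTONIC is used so that a wall-clock adjustment during a multi-second stall does not distort the measurement.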

What's going on?