a UBIFS image makes task pdflush blocked > 120 seconds

Norbert van Bolhuis nvbolhuis at aimvalley.nl
Wed Nov 11 10:54:35 EST 2009


>> ok, will do that. I'll track commit_sem, io_mutex and ubifs_garbage_collect().
>> I'll report back.
>>
>> Btw. stackdump is the same (2 out of 2 times).
> 
> OK. I really do not have any idea off the top of my head now, sorry.
> Try to investigate this a beet deeper.
> 


I investigated this. It turn out it's some kind of race condition between the pdflush
deamon and other tasks (e.g. events). It has everything to do the the commit_sem
and io_mutex. It's not related to my preprogrammed UBIFS image (i've seen it also after
a big file was written to UBIFS and a sync was done "simultaneously") nor is it related
to ubifs_garbage_collect().

Here's what I did and how I conclude the above.

I logged (in memory) all commit_sem, io_mutex and ubifs_garbage_collect actions
and parse the results to human readable text afterwards.
"Req" means mutex/semaphore requested. If there's no "Req" the operation
has been done.


The race condition occurs for the below UBIFS pseaudo code (which is done by many
UBIFS functions):


  err = make_reservation () {
   down_read(&c->commit_sem)				/*** commit_sem, make_reservation, DOWN 1 ***/
   err = reserve_space(c, jhead, len) {
    ...
  again:
    mutex_lock_nested(&wbuf->io_mutex, wbuf->jhead);	/*** io_mutex, reserve_space, LOCK 1 ***/
    ...
    avail = c->leb_size - wbuf->offs - wbuf->used;
    if (wbuf->lnum != -1 && avail >= len)
     return 0;
   }
   if (!err)
  	return 0;
  }
  /* the journal head stays locked and later has to be unlocked using release_head() */
  if (err)
   goto out_free;
  ...
  err = write_head(c, BASEHD, dent, len, &lnum, &dent_offs, sync);
  if (err)
   goto out_release;
  ...
  release_head(c, BASEHD) {
   mutex_unlock(&c->jheads[jhead].wbuf.io_mutex);	/*** io_mutex, release_head, UNLOCK ***/
  }
  ...
  finish_reservation(c) {
   up_read(&c->commit_sem);				/*** commit_sem, finish_reservation, UP ***/
  }




Normally (for one "update" I guess) the logs shows:

TIME=0000.000, M(t=8,xy=06)=commit_sem, make_reservation, Req DOWN 1
TIME=0000.000, M(t=0,xy=06)=commit_sem, make_reservation, DOWN 1
TIME=0000.000, M(t=9,xy=07)=io_mutex, reserve_space, Req LOCK 1
TIME=0000.000, M(t=1,xy=07)=io_mutex, reserve_space, LOCK 1
TIME=0000.480, M(t=9,xy=8a)=io_mutex, release_head, Req UNLOCK
TIME=0000.480, M(t=1,xy=8a)=io_mutex, release_head, UNLOCK
TIME=0000.480, M(t=8,xy=88)=commit_sem, finish_reservation, Req UP
TIME=0000.480, M(t=0,xy=88)=commit_sem, finish_reservation, UP



when "another task" is also updating the log shows:

TIME=0304.252, M(t=8,xy=06)=commit_sem, make_reservation, Req DOWN 1
TIME=0304.252, M(t=0,xy=06)=commit_sem, make_reservation, DOWN 1
TIME=0304.252, M(t=9,xy=07)=io_mutex, reserve_space, Req LOCK 1
TIME=0304.252, M(t=1,xy=07)=io_mutex, reserve_space, LOCK 1
TIME=0304.292, M(t=8,xy=06)=commit_sem, make_reservation, Req DOWN 1
TIME=0304.292, M(t=0,xy=06)=commit_sem, make_reservation, DOWN 1	<- 2nd DOWN !?
TIME=0304.292, M(t=9,xy=07)=io_mutex, reserve_space, Req LOCK 1
TIME=0304.736, M(t=9,xy=8a)=io_mutex, release_head, Req UNLOCK
TIME=0304.736, M(t=1,xy=8a)=io_mutex, release_head, UNLOCK
TIME=0304.736, M(t=8,xy=88)=commit_sem, finish_reservation, Req UP
TIME=0304.736, M(t=0,xy=88)=commit_sem, finish_reservation, UP
TIME=0304.736, M(t=1,xy=07)=io_mutex, reserve_space, LOCK 1
TIME=0304.736, M(t=9,xy=8a)=io_mutex, release_head, Req UNLOCK
TIME=0304.736, M(t=1,xy=8a)=io_mutex, release_head, UNLOCK
TIME=0304.736, M(t=8,xy=88)=commit_sem, finish_reservation, Req UP
TIME=0304.736, M(t=0,xy=88)=commit_sem, finish_reservation, UP

I really don't understand the commit_sem here. Is there a 2nd copy of commit_sem ?



the very first time pdflush interfers the pdflush error report shows
(for which there's also a log entry):

TIME=0021.644, M(t=8,xy=06)=commit_sem, make_reservation, Req DOWN 1
TIME=0021.644, M(t=0,xy=06)=commit_sem, make_reservation, DOWN 1
TIME=0021.644, M(t=9,xy=07)=io_mutex, reserve_space, Req LOCK 1
TIME=0021.644, M(t=1,xy=07)=io_mutex, reserve_space, LOCK 1
TIME=0034.316, M(t=8,xy=06)=commit_sem, make_reservation, Req DOWN 1
TIME=0034.316, M(t=0,xy=06)=commit_sem, make_reservation, DOWN 1	<- 2nd DOWN !?
TIME=0034.316, M(t=9,xy=07)=io_mutex, reserve_space, Req LOCK 1		<- pdflush blocks here
TIME=0168.868, M(t=3,xy=00)=pdflush error report
TIME=0302.012, M(t=3,xy=00)=pdflush error report
TIME=0303.280, M(t=9,xy=8a)=io_mutex, release_head, Req UNLOCK
TIME=0303.280, M(t=1,xy=8a)=io_mutex, release_head, UNLOCK		<- pdflush can continue
TIME=0303.280, M(t=8,xy=88)=commit_sem, finish_reservation, Req UP
TIME=0303.280, M(t=0,xy=88)=commit_sem, finish_reservation, UP
TIME=0303.284, M(t=1,xy=07)=io_mutex, reserve_space, LOCK 1		<- pdflush continues here
TIME=0303.284, M(t=9,xy=8a)=io_mutex, release_head, Req UNLOCK
TIME=0303.284, M(t=1,xy=8a)=io_mutex, release_head, UNLOCK
TIME=0303.284, M(t=8,xy=88)=commit_sem, finish_reservation, Req UP
TIME=0303.284, M(t=0,xy=88)=commit_sem, finish_reservation, UP



I must admit I dont't see the root cause. I don't understand why the commit_sem
doesn't seem to work. Nor do I understand why the "other task" needs to hold the
io_mutex for more than 280 seconds.

If you see the reason or have any idea please let me know, otherwise I'll
investigate this more thorough.

thanks,
Norbert.




More information about the linux-mtd mailing list