jffs2 deadlock on alloc_sem in jffs2_reserve_space

Tue Jul 27 22:16:38 EDT 2004

We're having very intermittent problems with lockups while deleting
files under JFFS2.  We're running the 20040603 CVS snapshot applied to
the 2.4.26-vrs1 arm-linux tree.  It's hard to reproduce, so we've not
been able to collect much debugging information.  In particular, it never
happens when we have debugging traces on.  We did manage to capture
one case where it locked up and we were able to get process stack
traces of all of the processes in the system.  From those traces it
appears that only the process doing the unlink was in JFFS2 at the
time, so it doesn't appear to be a simple deadlock.  The process that
was doing the unlink was stuck doing a down on alloc_sem in
jffs2_reserve_space.  Since no one else appeared to be holding the
semaphore (although it could perhaps be held across calls), it seemed
possible that some previous caller had failed to release it, perhaps
on an error path.  The only obvious candidate we found is at the end
of jffs2_garbage_collect_pass:

	f = jffs2_gc_fetch_inode(c, inum, nlink);
	if (IS_ERR(f))
		return PTR_ERR(f);
	if (!f)
		return 0;

	ret = jffs2_garbage_collect_live(c, jeb, raw, f);

	jffs2_gc_release_inode(c, f);

  release_sem:
	up(&c->alloc_sem);

It seems that if jffs2_gc_fetch_inode returns an error (or NULL), the
function returns early without ever reaching release_sem, and so
without releasing the semaphore.  Is this a bug, or is there more to
this error case than meets the eye?  And is it at all likely that we
could have hit this error case?
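
To make the concern concrete, here is a minimal user-space sketch of the
pattern, not the JFFS2 code itself.  It assumes the semaphore is still held
at the point of the early return.  A POSIX semaphore stands in for
c->alloc_sem, and gc_pass_leaky, gc_pass_fixed, alloc_sem_held and run_demo
are hypothetical names made up for illustration.  The "fixed" variant only
shows one way the error path could be routed through release_sem; it is not
meant as the actual patch.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for c->alloc_sem. */
static sem_t alloc_sem;

/* True if the semaphore is currently held (count is zero or below). */
static bool alloc_sem_held(void)
{
	int val = 0;

	sem_getvalue(&alloc_sem, &val);
	return val <= 0;
}

/* Suspected pattern: the early return skips the release. */
static int gc_pass_leaky(int fetch_err)
{
	sem_wait(&alloc_sem);
	if (fetch_err)
		return fetch_err;	/* semaphore is still held here */
	sem_post(&alloc_sem);
	return 0;
}

/* Assumed alternative: every exit goes through release_sem. */
static int gc_pass_fixed(int fetch_err)
{
	int ret = 0;

	sem_wait(&alloc_sem);
	if (fetch_err) {
		ret = fetch_err;
		goto release_sem;
	}
	/* the real garbage-collection work would go here */
release_sem:
	sem_post(&alloc_sem);
	return ret;
}

/* Returns 0 if both variants behave as described above. */
static int run_demo(void)
{
	int rc = sem_init(&alloc_sem, 0, 1);

	assert(rc == 0);
	(void)rc;

	/* Fixed variant: error is propagated, semaphore is released. */
	if (gc_pass_fixed(-EIO) != -EIO || alloc_sem_held())
		return 1;
	/* Leaky variant: error is propagated, semaphore stays held. */
	if (gc_pass_leaky(-EIO) != -EIO || !alloc_sem_held())
		return 2;

	sem_destroy(&alloc_sem);
	return 0;
}
```

After gc_pass_leaky returns an error, the next sem_wait on alloc_sem would
block forever, which is the same symptom as our unlink stuck in down() with
no other process inside JFFS2.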

-- 
Ben Gamsa             ben at somanetworks.com
SOMA Networks, Inc.   312 Adelaide St. W. Suite 700 Toronto, Ontario, M5V1R2