JFFS2 deadlock with alloc_sem

Dave Kleikamp shaggy at linux.vnet.ibm.com
Tue Jul 31 09:23:49 EDT 2007


On Tue, 2007-07-31 at 13:10 +0100, David Woodhouse wrote:
> On Mon, 2007-07-30 at 11:45 -0500, Dave Kleikamp wrote:
> > Thus we conclude that the root cause of the problem is that jffs2 is not 
> > conforming to the strict order of acquiring multiple locks, i.e., all code 
> > paths resulting in acquiring multiple locks must do so in the same order. 
> > In this case, gc thread requests first the file lock, then the page lock, 
> > however jffs2_readpage function requests the page lock first, then the file 
> > lock. Another potential deadlock source is in jffs2_prepare_write, in which it 
> > requests page lock, then the file lock. 
> 
> If that's the explanation, then the patch which Nathan tried (dropping
> f->sem before jffs2_gc_fetch_page(), followed by your cleanups¹) ought
> to have fixed the problem. And I'd be happier with that version rather
> than introducing a new read_cache_page_async_trylock() solely for JFFS2.
> 
> It's actually OK to drop f->sem in jffs2_garbage_collect_dnode(). We
> hold the alloc_sem anyway -- nobody's going to be _changing_ the file
> under us. In fact, the garbage collector probably doesn't need to grab
> f->sem until it's actually going to _change_ something.

We had tried a similar patch, attached here, but it caused problems.
Maybe our patch is missing something.
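
To make the ordering argument concrete, here is a minimal userspace sketch
of the "drop the file lock before taking the page lock" idea discussed
above. It is not the JFFS2 code and not the attached patch: file_sem and
page_lock are pthread mutexes standing in for f->sem and the page lock,
and reader_path, gc_path and run_lock_order_demo are hypothetical names
made up for illustration. It does not model alloc_sem or any state that
might change while the file lock is dropped.

```c
#include <pthread.h>
#include <stdio.h>

/* Assumed stand-ins: file_sem ~ f->sem, page_lock ~ the page lock. */
static pthread_mutex_t file_sem = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter;   /* only modified with file_sem held */

#define ITERATIONS 100000

/* Read path: page lock first, then file lock (the order the thread
 * describes for jffs2_readpage). */
static void *reader_path(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&page_lock);
        pthread_mutex_lock(&file_sem);
        shared_counter++;
        pthread_mutex_unlock(&file_sem);
        pthread_mutex_unlock(&page_lock);
    }
    return NULL;
}

/* GC path: takes the file lock for its initial work, but releases it
 * before asking for the page lock, so it never waits on page_lock
 * while holding file_sem. It then retakes file_sem in page -> file
 * order, matching the read path. */
static void *gc_path(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&file_sem);
        /* ... inspect file state here ... */
        pthread_mutex_unlock(&file_sem);   /* drop before fetching the page */

        pthread_mutex_lock(&page_lock);
        pthread_mutex_lock(&file_sem);
        shared_counter++;
        pthread_mutex_unlock(&file_sem);
        pthread_mutex_unlock(&page_lock);
    }
    return NULL;
}

/* Runs both paths concurrently. Returns 0 if both threads finished and
 * every increment was counted, -1 otherwise. */
int run_lock_order_demo(void)
{
    pthread_t reader, gc;

    shared_counter = 0;
    if (pthread_create(&reader, NULL, reader_path, NULL) != 0)
        return -1;
    if (pthread_create(&gc, NULL, gc_path, NULL) != 0) {
        pthread_join(reader, NULL);
        return -1;
    }
    pthread_join(reader, NULL);
    pthread_join(gc, NULL);

    return shared_counter == 2L * ITERATIONS ? 0 : -1;
}
```

If gc_path instead kept file_sem held while calling
pthread_mutex_lock(&page_lock), the two threads could each end up holding
one lock and waiting for the other, which is the AB-BA pattern described
in the quoted analysis. Build with -pthread.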

From the bug report:
-----------------------------
Built and ran the 2nd patch (attachment 28493).  Results are similar
to before: jffs2 runs for a little while, but soon complains that there is
already data at the point where it intends to write.

ARGH. About to write node to 0x00140010 on flash, but there's data already 
there:
0x00140010: 19 85 e0 02 00 00 00 ac 0a 3e 48 74 00 00 00 74
argh. node added in wrong place
Node totlen on flash (0x00000004) != totlen in node ref (0x000000ac)
ARGH. About to write node to 0x00140010 on flash, but there's data already 
there:
0x00140010: 19 85 e0 02 00 00 00 04 08 34 00 74 00 00 00 74
argh. node added in wrong place
ARGH. About to write node to 0xc01a4600 on flash, but there's data already 
there:
0xc01a4600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Write of 324 bytes at 0xc01a4600 failed. returned 0, retlen 0
Not marking the space at 0xc01a4600 as dirty because the flash driver returned 
retlen zero

It appeared to deadlock here, and after a few minutes the unit check timer 
stepped in and rebooted the system. 
-- 
David Kleikamp
IBM Linux Technology Center
-------------- next part --------------
A non-text attachment was scrubbed...
Name: up_down_3.patch
Type: text/x-patch
Size: 1553 bytes
Desc: not available
Url : http://lists.infradead.org/pipermail/linux-mtd/attachments/20070731/f2c1fd1d/attachment-0001.bin 

