hard hang in jffs2 detected

Wed May 30 15:26:07 EDT 2007

Subject: deadlock in jffs2
Date: May 30, 2007

Hi y'all,

I have a kernel crash dump that shows a deadlock in jffs2. Two processes
acquire the same two locks in opposite order. Here is the time line:

1. Process B is reading a file. Because a VFS page-cache page is missing or
dirty, generic_file_read() takes the page lock. Switch.

2. Process A is the jffs2 GC thread. It has found a valid dnode, so it
calls iget() for the inode and then takes (downs) the jffs2_inode_info.sem
(f->sem). Switch.

3. Process B's generic VFS code now calls into jffs2_readpage(), which
blocks in down() on f->sem, held by Process A. Switch.

4. Process A, in jffs2_garbage_collect_dnode(), calls read_cache_page()
and blocks on the page lock held by Process B. Deadlock.

Thus Process B acquires the page lock, then f->sem, while Process A
acquires f->sem, then the page lock.
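To make the inversion concrete, here is a minimal userspace model in C (not kernel code). It records which lock was already held when another lock was taken, and it flags any acquisition that reverses an order seen earlier. PAGE_LOCK, F_SEM, struct task, acquire() and release_all() are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock identifiers for this model only (not kernel names). */
enum lock_id { PAGE_LOCK, F_SEM, NLOCKS };

/* seen_before[a][b] becomes true once some task takes b while holding a. */
static bool seen_before[NLOCKS][NLOCKS];

/* A task's currently held locks, in acquisition order. */
struct task {
    enum lock_id held[NLOCKS];
    int nheld;
};

static void reset_order_graph(void)
{
    memset(seen_before, 0, sizeof(seen_before));
}

/*
 * Record that task t is acquiring lock l while holding t->held[].
 * Returns false if l was previously taken *before* one of the locks
 * t already holds: that is the ABBA pattern that can deadlock.
 */
static bool acquire(struct task *t, enum lock_id l)
{
    bool ok = true;

    for (int i = 0; i < t->nheld; i++) {
        enum lock_id h = t->held[i];

        if (seen_before[l][h])
            ok = false;            /* opposite order already observed */
        seen_before[h][l] = true;  /* record the order h -> l */
    }
    t->held[t->nheld++] = l;
    return ok;
}

/* Drop every lock the task holds (the model ignores unlock order). */
static void release_all(struct task *t)
{
    t->nheld = 0;
}
```

When the time line is replayed, Process B's acquire(&b, PAGE_LOCK) and then acquire(&b, F_SEM) both succeed. Process A's acquire(&a, F_SEM) succeeds, and its following acquire(&a, PAGE_LOCK) returns false. The check does not depend on the unlucky interleaving actually happening. It is enough that both orders occur at some point, so the hazard is visible before a hang ever occurs.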

This deadlock has been discussed before, but I don't see a fix upstream.

As a fix, I'm leaning toward removing the read_cache_page() call from
jffs2_garbage_collect_dnode() and replacing it with a more direct read
from flash. Would that work?
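The proposed change can be modelled in the same way. This sketch assumes the GC path can get the node data directly from flash without touching the page cache. It is a userspace model built on pthreads, not kernel code. page_lock and f_sem stand in for the page lock and f->sem. read_flash_direct() and the flash[] buffer are hypothetical. Process B keeps its page lock, then f->sem order. The GC thread takes only f->sem, so no cycle can form between the two locks.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* Stand-ins for a page-cache page lock and the per-inode f->sem. */
static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t f_sem = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical raw flash contents backing one node. */
static const char flash[] = "node-data";

/* Hypothetical direct read: copies from "flash", takes no page lock. */
static void read_flash_direct(char *buf, size_t len)
{
    memcpy(buf, flash, len);
}

/* Process B: VFS read path, page lock then f->sem (order unchanged). */
static void *reader(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&page_lock);   /* generic_file_read */
        pthread_mutex_lock(&f_sem);       /* jffs2_readpage */
        pthread_mutex_unlock(&f_sem);
        pthread_mutex_unlock(&page_lock);
    }
    return NULL;
}

/* Process A: GC thread, f->sem then a direct flash read, no page lock. */
static void *gc_thread(void *arg)
{
    char *buf = arg;

    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&f_sem);
        read_flash_direct(buf, sizeof(flash));
        pthread_mutex_unlock(&f_sem);
    }
    return NULL;
}

/* Run both paths concurrently; returns true once both have finished. */
static bool run_both(char *buf)
{
    pthread_t a, b;

    if (pthread_create(&b, NULL, reader, NULL) != 0)
        return false;
    if (pthread_create(&a, NULL, gc_thread, buf) != 0) {
        pthread_join(b, NULL);
        return false;
    }
    pthread_join(b, NULL);
    pthread_join(a, NULL);
    return true;
}
```

A completed run only illustrates the point. The real argument is that a single acquisition order (page lock, then f->sem) remains. The sketch does not address whether a direct flash read returns the same data the page cache would.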

I would greatly appreciate a fix and/or suggestions for a fix. Comments
please. Thanks.

Monte Copeland
Austin TX
catboat at texas.net
copelanm at us.ibm.com