[JFFS2]The patch "jffs2: Fix lock acquisition order bug in jffs2_write_begin" introduces another dead lock bug.

Thomas.Betker at rohde-schwarz.com Thomas.Betker at rohde-schwarz.com
Fri Aug 23 17:42:49 EDT 2013


Hello Chao:

> Yes, my kernel actually hit this deadlock.
> 
> My kernel's version is 2.6.32. After I merged the patch into my 
> kernel, I ran my own filesystem test suite "fstess" on the kernel. 
> About 9 hours later, I found that the kernel didn't respond to any file
> operation at all. So I used the magic SysRq key to show the state of all 
> tasks. Finally I found these two threads' stack traces in the SysRq 
> output, shown below, which prove the deadlock:

When creating the patch, I was assuming that jffs2_write_begin() and 
jffs2_write_end() could not be called at the same time (for the same 
page). Obviously, I was wrong.

Taking also into account the report by Ming Liu, I have come to the 
conclusion that lock_page() must be called _before_ c->alloc_sem or f->sem 
is acquired. In other words, my patch was a mistake; it's not 
jffs2_write_begin() that should be fixed, but 
jffs2_garbage_collect_live().

The short way out is to revert my patch, which is fine by me. However, 
this still leaves us with the original issue, i.e., the deadlock between 
jffs2_garbage_collect_live() and jffs2_write_begin(). Unfortunately, 
jffs2_garbage_collect_live() grabs f->sem right at the beginning, while 
lock_page() or __set_page_locked() is called several layers deep within 
the code, so I don't think reordering the locks there will be easy.
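
To make the ordering argument concrete, here is a minimal userspace sketch. It is not JFFS2 code: two pthread mutexes stand in for the page lock and f->sem, and every name in it (page_lock_m, f_sem_m, inverted_order_blocks, consistent_order_run) is hypothetical. The first function shows why the inverted order gets stuck, using trylock so that nothing actually hangs. The second runs two threads that both take the page lock first and then f->sem, which is the order argued for above.

```c
/* Userspace illustration only; pthread mutexes stand in for kernel locks.
 * All names here are hypothetical and do not exist in JFFS2. */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t page_lock_m = PTHREAD_MUTEX_INITIALIZER; /* "page lock" */
static pthread_mutex_t f_sem_m = PTHREAD_MUTEX_INITIALIZER;     /* "f->sem" */
static long shared_counter;

/* One thread plays both roles: the "GC" side holds f->sem, the "write" side
 * holds the page lock, and each then tries to take the other lock.
 * Returns 1 if both attempts fail with EBUSY, i.e. neither could proceed. */
int inverted_order_blocks(void)
{
    int gc_wants_page, writer_wants_sem;

    pthread_mutex_lock(&f_sem_m);     /* GC path: f->sem first */
    pthread_mutex_lock(&page_lock_m); /* write path: page lock first */

    gc_wants_page = pthread_mutex_trylock(&page_lock_m);
    writer_wants_sem = pthread_mutex_trylock(&f_sem_m);

    pthread_mutex_unlock(&page_lock_m);
    pthread_mutex_unlock(&f_sem_m);
    return gc_wants_page == EBUSY && writer_wants_sem == EBUSY;
}

/* Both threads use the same order: page lock, then f->sem. */
static void *locked_worker(void *arg)
{
    long iterations = *(long *)arg;

    for (long i = 0; i < iterations; i++) {
        pthread_mutex_lock(&page_lock_m);
        pthread_mutex_lock(&f_sem_m);
        shared_counter++;
        pthread_mutex_unlock(&f_sem_m);
        pthread_mutex_unlock(&page_lock_m);
    }
    return NULL;
}

/* Runs a "GC" thread and a "write" thread with the consistent order and
 * returns the final counter value once both have finished. */
long consistent_order_run(long iterations)
{
    pthread_t gc, writer;
    int rc;

    shared_counter = 0;
    rc = pthread_create(&gc, NULL, locked_worker, &iterations);
    assert(rc == 0);
    rc = pthread_create(&writer, NULL, locked_worker, &iterations);
    assert(rc == 0);
    pthread_join(gc, NULL);
    pthread_join(writer, NULL);
    return shared_counter;
}
```

With a consistent order, consistent_order_run(n) finishes and returns 2 * n. The sketch says nothing about how to restructure jffs2_garbage_collect_live() itself, where the page lock is taken far below the point at which f->sem is grabbed.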

Unfortunately, I won't be in the office for the next three weeks, so there 
isn't much I can do at the moment. If you are going to provide a fix, I 
will gladly test it when I am back.

Best regards,
Thomas




More information about the linux-mtd mailing list