[JFFS2] Commit "jffs2: Fix lock acquisition order bug in jffs2_write_begin" introduces another deadlock

Thu Aug 22 07:18:59 EDT 2013

Hi, all:

I've been working with the 2.6.34 stable kernel and recently ran into an
AB-BA deadlock in jffs2. The scenario is:

Run two scripts at the same time:

Script 1:

    #!/bin/bash
    while [ 1 ]
    do
        cp /mnt/mtd-folder/region_a/xxx.tar.gz /mnt/mtd-folder/region_b
        usleep 10
    done

Script 2:

    #!/bin/bash
    while [ 1 ]
    do
        tar -zxvf /mnt/mtd-folder/region_b/.tar.gz -C /dev/shm
    done

After several hours, the "cp", "tar" and "jffs2_gcd_mtd" processes all
end up in "D" state. After some investigation, I found that the problem
was introduced by commit "jffs2: Fix lock acquisition order bug in
jffs2_write_begin", which tried to fix the following AB-BA deadlock:

jffs2_garbage_collect_live
    mutex_lock(&f->sem)                         (A)
    jffs2_garbage_collect_dnode
        jffs2_gc_fetch_page
            read_cache_page_async
                do_read_cache_page
                    lock_page(page)             (B)

jffs2_write_begin
    grab_cache_page_write_begin
        find_lock_page
            lock_page(page)                     (B)
    mutex_lock(&f->sem)                         (A)

But do_generic_file_read() first acquires the page lock and then
f->sem. This causes another AB-BA deadlock with the patched
jffs2_write_begin(), which now first acquires f->sem and then the page
lock:

jffs2_write_begin
    mutex_lock(&f->sem)                         (A)
    grab_cache_page_write_begin
        find_lock_page
            lock_page(page)                     (B)

do_generic_file_read
    lock_page_killable(page)                    (B)
    jffs2_readpage
        mutex_lock(&f->sem)                     (A)
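
To make the two orderings easier to compare, here is a small userspace
sketch in C (not kernel code). It reduces each code path named in this
mail to the order in which it takes the two locks and flags any pair of
paths that take them in opposite order. The type and function names
(lock_path, paths_conflict, count_conflicts) are made up for this
illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The two locks from the traces: f->sem (A) and the page lock (B). */
enum lock_id { LOCK_F_SEM, LOCK_PAGE };

/* One code path, reduced to the order in which it takes the two locks. */
struct lock_path {
    const char *name;
    enum lock_id first;
    enum lock_id second;
};

/* Two paths can deadlock AB-BA when each takes first what the other
 * takes second. */
static bool paths_conflict(const struct lock_path *a,
                           const struct lock_path *b)
{
    return a->first == b->second && a->second == b->first;
}

/* Count the pairs of paths with opposite lock order. */
static size_t count_conflicts(const struct lock_path *paths, size_t n)
{
    size_t conflicts = 0;

    assert(paths != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (paths_conflict(&paths[i], &paths[j]))
                conflicts++;
    return conflicts;
}

/* Lock orders as described in this mail, before the commit ... */
static const struct lock_path before_commit[] = {
    { "jffs2_garbage_collect_live", LOCK_F_SEM, LOCK_PAGE  },
    { "jffs2_write_begin",          LOCK_PAGE,  LOCK_F_SEM },
};

/* ... and after it, with jffs2_write_begin taking f->sem first. */
static const struct lock_path after_commit[] = {
    { "jffs2_write_begin",    LOCK_F_SEM, LOCK_PAGE  },
    { "do_generic_file_read", LOCK_PAGE,  LOCK_F_SEM },
};
```

count_conflicts() finds one conflicting pair in each table, which is the
point of this report: the commit removes the inversion against the
garbage collector but creates a new one against the read path.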

I also noticed another thread discussing a similar deadlock related to
the same commit, titled "[JFFS2]The patch "jffs2: Fix lock acquisition
order bug in jffs2_write_begin" introduces another dead lock bug.",
posted by Deng Chao. Deng proposed introducing a function
"read_cache_page_async_trylock" to be used instead of
"read_cache_page_async". Has anybody implemented that?
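
I have not seen an implementation of that proposal, so the following is
only a userspace analogy of the general trylock idea, written with
pthread mutexes. It is not the proposed kernel function and says nothing
about how the page cache would have to be handled. The helper name
lock_pair_with_backoff is made up. The idea: while holding the first
lock, only try the second one, and if it is busy, drop the first lock
and start over instead of sleeping with it held.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/*
 * Hypothetical helper: take 'first', then try to take 'second'.
 * If 'second' is busy, release 'first', yield and retry, so this
 * thread never sleeps on 'second' while holding 'first'.
 * Returns with both locks held; the return value is the number of
 * times it had to back off.
 */
static int lock_pair_with_backoff(pthread_mutex_t *first,
                                  pthread_mutex_t *second)
{
    int backoffs = 0;

    for (;;) {
        int rc = pthread_mutex_lock(first);
        assert(rc == 0);
        (void)rc;

        if (pthread_mutex_trylock(second) == 0)
            return backoffs;        /* got both locks */

        /* 'second' is held by someone else: back off completely. */
        pthread_mutex_unlock(first);
        sched_yield();
        backoffs++;
    }
}
```

A caller would use lock_pair_with_backoff(&a, &b) in the path that takes
the locks in the "wrong" order and unlock both mutexes afterwards. The
cost is a retry loop that can spin under contention, so this trades the
deadlock for possible livelock or wasted CPU time.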

Thank you, and best regards