[JFFS2] Revision "jffs2: Fix lock acquisition order bug in jffs2_write_begin" introduces another dead lock.
Ming Liu
liu.ming50 at gmail.com
Fri Aug 23 05:04:01 EDT 2013
Hi, all:
I've been working with the 2.6.34 stable kernel and recently hit an
AB-BA deadlock in jffs2. To reproduce it, run the following two scripts
at the same time:
Script 1:

    #!/bin/bash
    while [ 1 ]
    do
        cp /mnt/mtd-folder/region_a/xxx.tar.gz /mnt/mtd-folder/region_b
        usleep 10
    done

Script 2:

    #!/bin/bash
    while [ 1 ]
    do
        tar -zxvf /mnt/mtd-folder/region_b/.tar.gz -C /dev/shm
    done

After several hours, the "cp", "tar" and "jffs2_gcd_mtd" processes all
end up in "D" state. After some investigation, I found that this was
introduced by commit "jffs2: Fix lock acquisition order bug in
jffs2_write_begin", which tried to fix the following AB-BA deadlock:

    jffs2_garbage_collect_live
      mutex_lock(&f->sem)                (A)
      jffs2_garbage_collect_dnode
        jffs2_gc_fetch_page
          read_cache_page_async
            do_read_cache_page
              lock_page(page)            (B)

    jffs2_write_begin
      grab_cache_page_write_begin
        find_lock_page
          lock_page(page)                (B)
      mutex_lock(&f->sem)                (A)

But do_generic_file_read() first acquires the page lock and then
f->sem, which causes another AB-BA deadlock with jffs2_write_begin(),
which now first acquires f->sem and then the page lock:

    jffs2_write_begin
      mutex_lock(&f->sem)                (A)
      grab_cache_page_write_begin
        find_lock_page
          lock_page(page)                (B)

    do_generic_file_read
      lock_page_killable(page)           (B)
      jffs2_readpage
        mutex_lock(&f->sem)              (A)

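
To make the ordering argument concrete, here is a small userspace sketch that reduces each path in the traces above to the order in which it takes the two locks, then checks pairs of paths for an inversion. It is only a model: the struct, the function names and the "before/after" path tables are my own shorthand for the traces in this mail, not kernel code, and the check is a much-simplified cousin of what lockdep does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The two locks that appear in the traces: f->sem (A) and the page lock (B). */
enum lock_id { LOCK_F_SEM = 0, LOCK_PAGE = 1, LOCK_COUNT };

/* A code path, reduced to the order in which it acquires locks. */
struct lock_path {
    const char *name;
    enum lock_id order[LOCK_COUNT];
    size_t depth;
};

/* Position of `id` in the path's acquisition order, or -1 if never taken. */
int lock_pos(const struct lock_path *p, enum lock_id id)
{
    assert(p->depth <= LOCK_COUNT);
    for (size_t i = 0; i < p->depth; i++)
        if (p->order[i] == id)
            return (int)i;
    return -1;
}

/* True if some pair of locks is taken in opposite order by the two paths. */
bool paths_can_deadlock(const struct lock_path *x, const struct lock_path *y)
{
    for (int a = 0; a < LOCK_COUNT; a++) {
        for (int b = a + 1; b < LOCK_COUNT; b++) {
            int xa = lock_pos(x, (enum lock_id)a), xb = lock_pos(x, (enum lock_id)b);
            int ya = lock_pos(y, (enum lock_id)a), yb = lock_pos(y, (enum lock_id)b);
            if (xa < 0 || xb < 0 || ya < 0 || yb < 0)
                continue;               /* one path never holds both locks */
            if ((xa < xb) != (ya < yb))
                return true;            /* AB in one path, BA in the other */
        }
    }
    return false;
}

/* Lock orders as read off the traces in this mail (illustrative tables). */
const struct lock_path gc_path =
    { "jffs2_garbage_collect_live", { LOCK_F_SEM, LOCK_PAGE }, 2 };
const struct lock_path write_begin_before =
    { "jffs2_write_begin (before the commit)", { LOCK_PAGE, LOCK_F_SEM }, 2 };
const struct lock_path write_begin_after =
    { "jffs2_write_begin (after the commit)", { LOCK_F_SEM, LOCK_PAGE }, 2 };
const struct lock_path read_path =
    { "do_generic_file_read", { LOCK_PAGE, LOCK_F_SEM }, 2 };
```

With these tables, paths_can_deadlock() reports a conflict for gc_path against write_begin_before (the deadlock the commit fixed) and for write_begin_after against read_path (the one described here), but not for gc_path against write_begin_after.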
I also noticed another thread discussing a similar deadlock related to
the same commit, titled "[JFFS2]The patch "jffs2: Fix lock acquisition
order bug in jffs2_write_begin" introduces another dead lock bug.",
posted by Deng Chao. Deng proposed using a function
"read_cache_page_async_trylock" in place of "read_cache_page_async".
Has anybody implemented that?
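
I don't have the details of Deng's proposal, so the following is only a userspace sketch of the general try-lock-and-back-off pattern that such a function name suggests: the GC side takes f->sem, only tries the page lock, and drops everything if the page is busy so it can retry later. All names here (page_model, inode_model, gc_pass_trylock and so on) are hypothetical, the locks are modelled with C11 atomic_flag rather than kernel primitives, and taking f->sem with a try-lock is a simplification of the blocking mutex_lock() in the real GC path.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for a page lock: flag set means "locked". */
struct page_model {
    atomic_flag locked;
};

/* Stand-in for the inode's f->sem, plus a count of GC passes that backed off. */
struct inode_model {
    atomic_flag sem;
    unsigned int gc_retries;
};

/* Non-blocking acquire: returns true only if the lock was free. */
bool model_trylock(atomic_flag *l) { return !atomic_flag_test_and_set(l); }
void model_unlock(atomic_flag *l) { atomic_flag_clear(l); }

/*
 * Hypothetical GC step: take f->sem (A), then only *try* the page lock (B).
 * Returns 0 on success, or -EAGAIN with no lock held if either lock is busy,
 * so the caller never sleeps on B while holding A.
 */
int gc_pass_trylock(struct inode_model *f, struct page_model *pg)
{
    if (!model_trylock(&f->sem))
        return -EAGAIN;
    if (!model_trylock(&pg->locked)) {
        f->gc_retries++;            /* page is busy: note it and back off */
        model_unlock(&f->sem);
        return -EAGAIN;
    }
    /* ... the node would be garbage-collected here ... */
    model_unlock(&pg->locked);
    model_unlock(&f->sem);
    return 0;
}

/* Scenario: a reader holds the page lock while GC runs, then releases it. */
int demo_gc_backoff(void)
{
    struct inode_model f = { ATOMIC_FLAG_INIT, 0 };
    struct page_model pg = { ATOMIC_FLAG_INIT };

    model_trylock(&pg.locked);              /* reader took the page lock (B) */
    int busy = gc_pass_trylock(&f, &pg);    /* GC backs off instead of blocking */
    model_unlock(&pg.locked);               /* reader is done with the page */
    int done = gc_pass_trylock(&f, &pg);    /* the retry can now proceed */

    return busy == -EAGAIN && done == 0 && f.gc_retries == 1;
}
```

The point of the pattern is that the side using the try-lock can never be one half of an AB-BA wait, at the cost of having to retry. Whether that is acceptable for jffs2 GC is exactly what I would like to hear opinions on.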
Best regards, and thank you.