JFFS2 deadlock with alloc_sem
Josh Boyer
jwboyer at linux.vnet.ibm.com
Thu Jun 7 10:29:59 EDT 2007
On Sat, 2007-06-02 at 18:42 +0100, David Woodhouse wrote:
> On Mon, 2007-04-30 at 15:41 -0400, Roberts Nathan-mcg31137 wrote:
> > Has anyone seen this deadlock before? It seems to be a classic deadlock
> > situation so I'm not sure if maybe I'm misinterpreting things or
> > the use case (several postmark tests running in parallel on a
> > preemptible kernel) is especially vulnerable.
>
> I think Josh has spotted the real problem here. Does this help? If so,
> a better fix will be forthcoming....
Hm... so the deadlock I was discussing looks a bit different from the
one Nathan reported. In mine, we were thinking that the race was between
page locking and the locking of f->sem by GC. Below are the stack
traces:
PID: 75 TASK: c22b0000 CPU: 0 COMMAND: "jffs2_gcd_mtd17"
#0 [c22b1dc0] crash_save_current_state at c00190d4
#1 [c22b1e00] __lock_page at c0038be0
#2 [c22b1e30] read_cache_page at c003b5dc
#3 [c22b1e60] jffs2_garbage_collect_dnode at c00960fc
#4 [c22b1f10] jffs2_garbage_collect_pass at c00951b0
#5 [c22b1f50] jffs2_garbage_collect_thread at c0097994
#6 [c22b1ff0] original_kernel_thread at c00055ac
PID: 17169 TASK: c171a000 CPU: 0 COMMAND: "application"
#0 [c171bdf0] crash_save_current_state at c00190d4
#1 [c171be30] __down at c0008150
#2 [c171be60] jffs2_readpage at c008f658
#3 [c171be80] do_generic_file_read at c00394fc
#4 [c171bed0] generic_file_read at c0039b28
#5 [c171bf10] sys_read at c0048bd0
#6 [c171bf30] ret_from_syscall_1 at c0002b48
syscall [c00] exception frame:
R0: 00000003 R1: 7f5ff1b0 R2: 00000000 R3: 0000000e
R4: 3002a008 R5: 00020000 R6: 1002d8a0 R7: 00000001
R8: 0000002c R9: 0fdcd918 R10: 7f5ffc00 R11: 7f5fffff
R12: 28428884 R13: 1001ba28 R14: 0000439a R15: 00fbce5c
R16: 80a00701 R17: 00000000 R18: 7f5ff3d0 R19: 7f5ff71c
R20: 7f5ff390 R21: 00000000 R22: 3002a008 R23: 7f5ff370
R24: 7f5ff2f0 R25: 3002a008 R26: 00020000 R27: 00020000
R28: 3002a008 R29: 0000000e R30: 0fb8ef24 R31: 00000000
NIP: 0fb1e310 MSR: 0002d030 OR3: 0000000e CTR: 0faccd0c
LR: 0fdb48cc XER: 20000000 CCR: 28424884 MQ: 00000000
DAR: 3002a000 DSISR: 00800000 Syscall Result: 00000000
Switching to user space stack (no more symbol info).
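To make the suspected ordering problem in the two traces concrete, here is a
small sketch (Python, purely illustrative, not kernel code). It assumes the GC
thread already holds f->sem by the time it reaches read_cache_page(), and that
the reader holds the page lock when jffs2_readpage() calls down() on f->sem.
The helper lock_order_inversions() and the lock labels are hypothetical names
made up for this example; they only model the order in which each task takes
its locks.

```python
from itertools import combinations


def lock_order_inversions(paths):
    """Return (task1, task2, lock_a, lock_b) for every pair of locks that
    task1 takes in the order a -> b while task2 takes them b -> a."""
    # For each task, collect every (earlier, later) lock pair on its path.
    ordered = {
        task: set(combinations(locks, 2)) for task, locks in paths.items()
    }
    inversions = []
    for (t1, p1), (t2, p2) in combinations(ordered.items(), 2):
        for a, b in sorted(p1):
            if (b, a) in p2:
                inversions.append((t1, t2, a, b))
    return inversions


# Assumed acquisition order for the two tasks in the traces above.
paths = {
    # GC: f->sem held, then read_cache_page() -> __lock_page()
    "jffs2_gcd_mtd17": ["f->sem", "page lock"],
    # reader: page locked by do_generic_file_read(), then
    # jffs2_readpage() -> __down() on f->sem
    "application": ["page lock", "f->sem"],
}

print(lock_order_inversions(paths))
```

If those assumptions hold, the sketch reports one inversion between the two
tasks, which is the classic ABBA pattern: each side holds the lock the other
is waiting for.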
What I can't immediately determine is if it's just a general race
between GC and reading/writing threads when it comes to locking the
pages before locking alloc_sem and/or f->sem, or if these are two
separate races.
The patch you had should fix the above condition, but will it aggravate
the condition Nathan reported?
/me is now confused
josh