JFFS2 deadlock

Joakim Tjernlund Joakim.Tjernlund at infinera.com
Wed Jan 27 08:05:35 PST 2016


On Wed, 2016-01-27 at 16:36 +0100, Szabó Tamás wrote:
> Hello all,
> 
> I work on an embedded system running Linux 3.10 and found a deadlock
> situation between jffs2_readpage and jffs2_write.
> The problem is present on the latest 4.4 kernel too and occurs when
> two tasks want to access the same file, one reads and the other writes it.
> 
> The kernel stack traces for writer and reader in deadlock:
> 
> __switch_to+0x4c/0x98
> sleep_on_page+0x10/0x24
> __lock_page+0x8c/0x9c
> find_lock_page+0x7c/0x94
> grab_cache_page_write_begin+0x64/0xd8
> jffs2_write_begin+0x6c/0x2ec
> generic_file_buffered_write+0x188/0x258
> __generic_file_aio_write+0x1e0/0x484
> generic_file_aio_write+0x70/0xfc
> do_sync_write+0x7c/0xd4
> vfs_write+0xc8/0x1b0
> SyS_write+0x4c/0xa8
> ret_from_syscall+0x0/0x38
> 
> __switch_to+0x4c/0x98
> jffs2_readpage+0x28/0x5c
> generic_file_aio_read+0x22c/0x7a0
> do_sync_read+0x7c/0xd4
> vfs_read+0xb0/0x170
> SyS_read+0x4c/0xa8
> ret_from_syscall+0x0/0x38
> 
> The root cause here is the locking order of f->sem mutex and pagelock.
> jffs2_readpage function gets the page in locked state and then locks
> the f->sem mutex, while jffs2_write_begin does it in reverse order.
> 
> I found a commit that brought in this bug.
> That was a fix for another deadlock issue:
> https://github.com/torvalds/linux/commit/5ffd3412ae5536a4c57469cb8ea31887121dcb2e
> 
> According to this commit and my code inspections the lock orders may be
> the following:
> readpage: page lock, f->sem
> writepage_begin: f->sem, page lock
> writepage_end: page lock, f->sem
> GC: f->sem, page lock

I am not sure whether this is the first time I have heard of this or whether someone
else has reported a similar issue. Based on your observations above, you could revert
the referenced commit and fix the locking order in GC to be page lock, then f->sem.
Does that seem doable?
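
To illustrate the idea of a single locking order, here is a minimal userspace
sketch, not JFFS2 code. It assumes two pthread mutexes as stand-ins for the page
lock and f->sem; the names page_lock, f_sem, locked_update and
run_lock_order_demo are hypothetical. Every path takes page_lock first and f_sem
second, so two tasks can never each hold one lock while waiting for the other.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins: NOT the kernel's page lock or f->sem. */
static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t f_sem = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter;

#define ITERATIONS 1000

/* Lock a mutex and check the return code without side effects in assert(). */
static void must_lock(pthread_mutex_t *m)
{
    int rc = pthread_mutex_lock(m);
    assert(rc == 0);
    (void)rc;
}

/* Every caller takes page_lock first, then f_sem, and unlocks in reverse. */
static void locked_update(void)
{
    must_lock(&page_lock);
    must_lock(&f_sem);
    shared_counter++;
    pthread_mutex_unlock(&f_sem);
    pthread_mutex_unlock(&page_lock);
}

/* Thread body shared by the "reader" and the "writer" task. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++)
        locked_update();
    return NULL;
}

/*
 * Run two tasks against the same pair of locks.
 * Returns the final counter value, or -1 if a thread could not be created.
 */
long run_lock_order_demo(void)
{
    pthread_t reader, writer;

    shared_counter = 0;
    if (pthread_create(&reader, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_create(&writer, NULL, worker, NULL) != 0) {
        pthread_join(reader, NULL);
        return -1;
    }
    pthread_join(reader, NULL);
    pthread_join(writer, NULL);
    return shared_counter;
}
```

If one of the two paths took f_sem before page_lock instead, the same program
could hang in the way the stack traces above show. Build with -pthread and call
run_lock_order_demo() from a test program to try it.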

 Jocke



