Another JFFS2 deadlock, kernel 3.4.11
deng.chao1 at zte.com.cn
Wed Nov 11 18:26:12 PST 2015
Hi all:
My patch makes jffs2_garbage_collect_pass return 0, not an error, when it cannot get the page lock. A return value of 0 means "try again".
No matter where jffs2_garbage_collect_pass is called from, the caller will keep looping until it reaches its goal.
However, wangzaiwei's concern is reasonable. jffs2_garbage_collect_pass is called not only by the gc thread but also by jffs2_reserve_space, and this can introduce a livelock in a rare situation.
Consider this:
The disk is almost full, so jffs2_reserve_space may call jffs2_garbage_collect_pass to free up space when performing a write operation.
Thread A has low priority and has acquired the page lock while writing.
Thread B is an RT thread with a higher priority than A, and it is also writing. If B preempts A while A is holding the page lock, and B then calls jffs2_reserve_space->jffs2_garbage_collect_pass to acquire the page lock, a livelock occurs: B loops forever waiting for A to release the page lock, but A cannot run because B itself has preempted it.
To solve this, I make jffs2_reserve_space sleep for a while when it finds that jffs2_garbage_collect_pass cannot get the page lock.
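As a rough illustration of the retry-with-sleep idea above, here is a small userspace model in C. It is only a sketch, not the actual patch: page_lock_model, free_blocks_model, gc_pass_model, reserve_space_model, the lock_busy out-parameter, the max_tries bound and the 1 ms pause are all hypothetical names and values chosen for the example, and an atomic flag stands in for the real page lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the page lock; NOT the kernel's page lock. */
static atomic_flag page_lock_model = ATOMIC_FLAG_INIT;
static int free_blocks_model;   /* pretend count of reclaimed blocks */

/*
 * Model of the patched GC pass. It returns 0 both when it made progress
 * and when the page lock was busy ("try again"). The lock_busy
 * out-parameter is an assumption of this sketch, used to tell the two
 * cases apart.
 */
static int gc_pass_model(bool *lock_busy)
{
    if (atomic_flag_test_and_set(&page_lock_model)) {
        *lock_busy = true;      /* someone else holds the lock */
        return 0;               /* not an error: caller should retry */
    }
    *lock_busy = false;
    free_blocks_model++;        /* pretend one block was reclaimed */
    atomic_flag_clear(&page_lock_model);
    return 0;
}

/*
 * Model of the reserve-space loop. When the GC pass reports a busy lock,
 * sleep briefly so that a lower-priority lock holder can run, instead of
 * spinning. max_tries only bounds the model; it is not from the patch.
 */
static int reserve_space_model(int wanted, int max_tries)
{
    const struct timespec pause = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */

    assert(wanted >= 0 && max_tries >= 0);
    for (int i = 0; i < max_tries; i++) {
        bool lock_busy;
        int ret;

        if (free_blocks_model >= wanted)
            return 0;
        ret = gc_pass_model(&lock_busy);
        if (ret)
            return ret;         /* a real error would be passed up */
        if (lock_busy)
            nanosleep(&pause, NULL);
    }
    return free_blocks_model >= wanted ? 0 : -EAGAIN;
}
```

In this model, a caller that finds the lock busy gives up the CPU on every failed attempt, which is the property that lets a preempted low-priority holder make progress. A caller would invoke reserve_space_model(wanted, max_tries) and treat -EAGAIN as "still no space after the bounded number of tries".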
Still, I agree with Thomas that my patch is too heavy. It would be much better if we could find a way to modify only jffs2_garbage_collect_pass to avoid the original deadlock.
But that fix is too tricky for me, and I do not have any ideas yet.
Thanks
Dengchao
From: Thomas.Betker at rohde-schwarz.com
Date: 2015-11-11 20:20
To: wangzaiwei <wangzaiwei at top-vision.cn>
Cc: Deng Chao <deng.chao1 at zte.com.cn>, Joakim Tjernlund <joakim.tjernlund at transmode.se>, 'Li Jiaxin' <lijiaxin at top-vision.cn>, linux-mtd <linux-mtd at lists.infradead.org>, linux-mtd <linux-mtd-bounces at lists.infradead.org>, Ming Liu <liu.ming50 at gmail.com>, lizhenwei <lizhenwei at top-vision.cn>
Subject: Re: Another JFFS2 deadlock, kernel 3.4.11
Hello wangzaiwei:
> > Deng Chao has created a patch which a) removes the deadlock I wanted to
> > get rid of originally, without b) introducing the new deadlocks; see
> > http://lists.infradead.org/pipermail/linux-mtd/2013-August/048352.html .
> > However, his patch modifies mm/filemap.c, and we were hoping to find a
> > more light-weight solution -- which never came to be.
>
> > I do use his patch here around, though, and so far, it has worked fine. I
> > will try to run your test scripts on one of our devices, and see if it
> > holds up.
I have run your scripts on my device (with Deng Chao's patch) for three
hours. None of the scripts got into state 'D', and the system is still
alive (2 CPUs, so if a deadlock had occurred, the system would have
stopped dead).
> Though I didn't know that Deng Chao and Ming Liu had reported the issue,
> I had the same idea for a patch.
>
> Yes, the deadlock issues which we have found always occurred between
> the gc thread (which may be activated by the sync system call) and other
> user tasks.
>
> The gc thread looks like this:
> > for [sync_supers]
> > jffs2_garbage_collect_live
> > mutex_lock(&f->sem) (A)
> > jffs2_garbage_collect_dnode
> > jffs2_gc_fetch_page
> > read_cache_page_async
> > do_read_cache_page
> > lock_page(page) (B)
>
>
> If we change lock_page(page) above to lock_page_try(page), the deadlock
> will go away.
>
> But I worry about this workaround. The behavior of
> jffs2_garbage_collect_live will change,
> and jffs2_garbage_collect_live is called not only by the gc thread.
> Is it OK to return an error rather than blocking?
> Can the 'sync' syscall still reach its goal?
I think that Deng Chao has discussed this in his patch. We have been using
the patch for almost two years now, and I haven't seen any bad effects yet.
Best regards,
Thomas Betker